Learners' Languages

Home , Free category, Monoidal functor

arXiv:2103.01189v1 [math.CT] 1 Mar 2021 morphism ec eainadisedcnev of conceive instead and relation lence https://www.youtube.com/watch?v=ji8MHKlQZ9w of composite concep be learning—can functor monoidal deep in used backpropagation—as and n akrpgto.Here, explicitly backpropagation. developed and category a learners, of that to spaces aeois tsends It categories. nteppr“akrpa uco”[ functor” as “Backprop paper the In Introduction 1 hr h aaeeiigobject parameterizing the where u ihhmst hticueaprmtrzn object parameterizing a include that hom-sets with but 1 h qiaec relation equivalence The nany on aaee paeadbcpoaain twsso realized soon was It e developed isomorphism category backpropagation. a and learners, update of parameter that to spaces Euclidean fplnma ucosi n aibe i h functor the via variable, one in functors polynomial of programming functional in used as lenses simple of category or ahns hs nefc steitra-o type internal-hom the is interface whose (more machines) systems Moore dynamical of terms in interpretation natural that togmnia functor monoidal strong a b backpropagation—can and descent learning—gradient deep ocueb icsigsm ietosfrftr work. future as for descent directions gradient some give discussing We by conclude language. internal its in stated ial,w eiwtefc httecategory the that fact the review we Finally, nti oe eosrethat observe we note, this In n“akrpa uco” h uhr hwta h fundamen the that show authors the functor”, as “Backprop In , ( Poly : ? ? 2 ։ 1 ∈ , ⊗ ⊗) Poly ? Para ? ! ′ smnia lsd eso htamap a that show we closed, monoidal is 1 Learn : with → ( Para om oo,adcnie h oia rpstosta ca that propositions logical the consider and topos, a forms C ( C , , 2 )( 2 5 ( and 2 Euc ⊗) ∼ = 1 eres languages Learners’ 2 , Para sgnrtdb regarding by generated is , oactgr ihtesm bet Ob objects same the with category a to Para Para → ) 2 2 # ) 2 Para 5 ( ≔ ⊗ ′ SLens ? sBuoGvaoi onsoti eetpresentation, recent a in out points Gavranović Bruno As . ( ? samndo h aeoyo ymti monoidal symmetric of category the on monad a is Learn ai .Spivak I. David Euc scniee pt neuvlnerelation equivalence an to up considered is {( ( 2 SLens C ,5 ?, → ) FST19 → ) sabctgr.Ti swa esald swell. as do shall we what is This bicategory. a as ) Abstract where , 2 rmtectgr fprmtrzdeuclidean parameterized of category the from | ) 3 safl uctgr of subcategory full a is Learn 1 ti fe rfrbet ipnewt h equiva- the with dispense to preferable often is it , in ? ,teatosso htgain descent gradient that show authors the ], Para ∈ C SLens rmtectgr fparameterized of category the from 5 , ( C ? : ) ( - ,5 ?, a aaeeiigobject parameterizing has 2 Coalg 1 stesmercmonoidal symmetric the is ⊗ ( ∼ ) → ocpueprmtrupdate parameter capture to ? 7→ fdnmclsystems dynamical of → [ rcsl,generalized precisely, ? in neape n we and example, an ′ . y piil ocapture to xplicitly ocpulzdas conceptualized e 5 , 2 y Poly Para 2 ′ , }/∼ ) htteei an is there that sn h fact the Using . ulzda strong a as tualized a lmnsof elements tal fteeeit nepi- an exists there if y h category the , ( SLens Para ] . ( C ) a a has ) be n ≔ ∼ ? . Ob 1 1 ⊗ The C ? 2 , and is given by the ordinary composite

21 ⊗ (?1 ⊗ ?2)→ 22 ⊗ ?2 → 23.

The domain of the backpropagation functor, Para(Euc), is thus the Para construction applied to the Cartesian monoidal category of Euclidean spaces R= and smooth maps. But it was soon realized that Learn is in fact also given by a Para construction, namely there is an isomorphism Learn Para(SLens), where SLens is the symmetric monoidal category of simple lenses as used in functional programming. The objects of SLens are sets Ob(SLens) = Ob(Set), but a morphism consists of a pair of functions

♯ ♯ SLens(,) ≔ {( 51, 5 ) | 51 : → , 5 : × → }. (1)

Thus a map → in Para(SLens) consists of a set % and functions 51 : × % → and 5 ♯ : × × % → × %. The authors of [FST19] developed this structure in order to conceptualize the compositional nature of deep learning as comprising a parameterizing set % (often called “the space of weights and biases”) and three functions:

: × % → implement * : × × % → % update ' : × × % → request (2)

The implement function is a %-parameterized function → , and the update and requestfunctions take a pair (0, 1) of “training data” and both updates the parameter— e.g. by gradient descent—and returns an element of the input space , which is used to train another such function in the network. This last step—the request—is not just found in deep learning as practiced, but is in fact crucial for deﬁning composition.2 But by this point, the notation (%,,*,') of Learn has become heavy and the structure seems to be getting lost. Even knowing Learn Para(SLens) seems ad-hoc since the morphisms (1) of SLens are—to this point—mathematically unmotivated. This is where Poly comes in. In this note, we observe that SLens is a full subcategory of Poly, the category of polynomial functors in one variable, via the functor 7→ y. Using the fact that (Poly, ⊗) is monoidal closed, we will reconceptualize Para(SLens) in terms of polynomial coalgebras, which can be understood as dynamical systems: machines with states that can be observed as “output” and updated based on “input”. In particular, a morphism → in Learn will be recast as a coalgebra on the internal hom polynomial [y,y], and we will explain this in terms of dynamics. This viewpoint allows us to substantially generalize the construction in Learn, a construction which also appears prominently in the theory of open games [Gha+16].

2The reader can check that given only (%,,*): → and (&, ,+): → , one can construct a composed parameter set % × &, and one can construct a composed implement function × % × & → , but one cannot construct an associative update operation × × % × & → % × &. In order to get it, one needs the request function × × & → . By endowing morphisms with the request function, as in Learn (2), composition and a monoidal structure is easily deﬁned.

2 Perhaps more interestingly, it allows us to use we the fact that the category ?-Coalg of dynamical systems on any interface ? ∈ Poly forms a topos. A topos is a setting in which one can do dependent type theory and higher-order logic. In fact, the topos of ?-coalgebras is in some ways as simple as possible: it is not only a copresheaf topos ?-Coalg SetC for a certain category C, but in fact the site C is the free category on a directed graph that we’ll call Tree?. This makes the logic of ?-coalgebras—and hence of dynamical systems, learners, and game-players—quite simple. However, the particular graph Tree? associated to ? is highly-structured, and we should ﬁnd that this structure is inherited by the internal language of ?-Coalg. The point is to consider logical propositions that can be stated in the internal language of ?-Coalg and to use these propositions in order to constrain the behavior of learners and game-players (categoriﬁed as discussed above), and of interaction patterns between dynamical systems more generally. For example, “gradient descent and backpropagation” is a property we can express in the internal language.

Plan for the paper

In Section 2 we will discuss various relevant constructions in the category Poly of polynomial functors in one variable. In particular, we will review its symmetric monoidal closed structure (Poly, y, ⊗, [−, −]), its composition monoidal structure (y, ⊳), and the notion of coalgebras. We explain how morphisms in Learn can be phrased in terms of coalgebras on internal hom objects, and we reconstruct Para(SLens) in these terms. In Section 3, we ﬁrst show that the category ?-Coalg of ?-coalgebras for the endo- functor ? : Set → Set is apresheaftopos. We then discuss theinternal logicof ?-Coalg, and we conclude by giving several directions for future work.

Notation

We denote the category of sets by Set; we generally denote sets with upper-case letters ,, etc. Given a natural number # ∈ N, we write N ≔ {1,...,#}, so 0 = , 1 = {1}, 2 = {1, 2}, etc. Given sets ,, we often write ≔ × to denote their Cartesian product. We will denote polynomials with lower-case letters, ?, @, etc.

Acknowledgments

We thank David A. Dalrymple, Dai Girardo, Paul Kreiner, David Jaz Myers, and Alex Zhu for useful conversations. We also acknowledge support from AFOSR grant FA9550-20-10348.

3 2 Constructions in Poly

In this section we review the category Poly, for which [GK12] is an excellent reference; we also discuss its symmetric monoidal closed structure. Then we discuss polynomial coalgebras and reconceptualize the category Learn and Gavranović’s bicategorical variant, in that language.

2.1 Background on Poly as a monoidal closed category

For any set , let y : Set → Set be the functor represented by ; that is, y applied to a set ( is Set(,() (. In particular, y ≔ y1 is (isomorphic to) the identity functor ( 7→ ( and 1 ≔ y0 is the constant functor ( 7→ 1. Note that y(1) 1 1 for any . The coproduct of functors and , denoted + , is taken pointwise; this means there is a natural isomorphism

( + )(() (()+ (() where the coproduct (()+ (() is taken in Set. Similarly, for any set and functors 8, one for each 8 ∈ , their coproduct is computed pointwise

8 (() 8((). ! Õ8∈ Õ8∈ Deﬁnition 2.1. A polynomial functor ? is any coproduct

? ≔ y?[8] Õ8∈ of representable functors, where ∈ Set and each ?[8] ∈ Set are sets. We denote the category of polynomial functors and natural transformations between them by Poly.

= ?[8] 1 We note that if ? 8∈ y then ?( ) ; hence we can write any ? ∈ Poly in canonical form Í ? y?[8]. (3) 1 8∈Õ?( ) We refer to each 8 ∈ ?(1) as a position in ? and to each 3 ∈ ?[8] as a direction at 8. 0 Example 2.2. We can consider any set ( as a constant polynomial B∈( y . We can consider a polynomial ? ∈ Poly as a set (or discrete category)Í ?(1) equipped with a functor ?[−]: ?(1) → Set. Then a map of polynomials ! : ? → @ can be identiﬁed with a diagram as follows

!1 ?(1) @(1) ⇐ (4) !♯ ?[−] @[−] Set

4 That is, ! can be decomposed into a function !1 : ?(1) → @(1) on positions, and for 1 ≔ ♯ every 8 ∈ ?( ) with 9 !1(8), a component function !8 : @[9] → ?[8] on directions. This follows from the Yoneda lemma and the universal property of coproducts. We ♯ will sometimes use this (!1, ! ): ? → @ notation below.

2 2 Example 2.3. A morphism 1y → 1y can be identiﬁed with a function !1 : 1 → ♯ 1 and a function ! : 1 × 2 → 2. That is,

( 2 2 ) 1 12 Poly 1y , 1y 1 2 . (5) Proposition 2.4. The composite of polynomial functors ?, @ ∈ Poly, which we denote ? ⊳ @, is again polynomial with formula

? ⊳ @ y 3∈?[8] @[93] (6) 1 1 Í 8∈Õ?( ) 9 : ?[Õ8]→@( ) The composition operation ⊳ is a (nonsymmetric) monoidal structure on Poly, with unit y.

Proof. See page 12.

Example 2.5. If ? = y2 and @ = y + 1, then ? ⊳ @ y2 + 2y + 1 whereas @ ⊳ ? y2 + 1. Example 2.6. Applying a polynomial ? to a set ( is given by composition: ?(() ? ⊳ (.

Proposition 2.7 (The symmetric monoidal category (Poly, y, ⊗)). The category Poly has a symmetric monoidal structure with unit y and monoidal product ⊗ on objects given by the following formula ? ⊗ @ ≔ y?[8]×@[9] 1 1 8∈Õ?( ) 9∈Õ@( ) Proof. See page 13.

Proposition 2.8 (Internal hom [−, −]). The ⊗ monoidal structure on Poly is closed; that is, for every ?, @ ∈ Poly there is a polynomial

[?, @] ≔ y 8∈?(1) @[!1(8)] (7) → Í ! :Õ? @ for which we have a natural isomorphism

Poly(A ⊗ ?, @) Poly(A, [?, @]). (8)

Proof. See page 13.

Example 2.9. Given sets and , we use Eqs. (5) and (7) to compute that the internal hom between y and y is

[y,y] y.

The counit of the adjunction (8) is a natural map eval: ? ⊗[?, @] → @ called eval- uation. In very much the same way, it induces two sorts of morphisms we will use later:

[?1, @1]⊗[?2, @2]→[?1 ⊗ ?2, @1 ⊗ @2] and [?, @]⊗[@, A]→[?, A]. (9)

5 2.2 Coalgebras, generalized Moore machines, and learners

Coalgebras for endofunctors : Set → Set form a major topic of study [AM82; Adá05; Jac17]. In this section we recall the deﬁnition and explain the relevance to dynamical systems (generalized Moore machines) and learners.

Deﬁnition 2.10 (Coalgebra). Given a polynomial ?, a ?-coalgebra is a pair ((, ) where ( ∈ Set and : ( → ? ⊳ (. A ?-coalgebra morphism from ((, ) to ((′, ′) consists of a function 5 : ( → (′ such that the following diagram commutes:

( ? ⊳ (

5 ?⊳ 5 (10) (′ ? ⊳ (′ ′

We denote the category of ?-coalgebras and their morphisms by ?-Coalg.

Proposition 2.11. A ?-coalgebra ((, ) can be identiﬁed with a map of polynomials

(y( → ?

Proof. One ﬁnds an isomorphism Poly((, ?⊳() Poly((y( , ?) by direct calculation.

Warning 2.12. Looking at Proposition 2.11, one might be tempted to think that a map of ?-coalgebras as in (10) can be identiﬁed with a commuting triangle

′ (y( ? (′y( ? but this is not the case; for one thing, the marked arrow does not arise from a function 5 : ( → (′.

Proposition 2.13. For any ?, @ ∈ Poly there is a functor

?-Coalg × @-Coalg → (? ⊗ @)-Coalg making •-Coalg a lax monoidal functor Poly → Cat.

Proof. See page 13.

The relevance of coalgebras to dynamics was of interest in the earliest of references we know of, namely [AM82], where they are referred to as codynamics. We will proceed with our own terminology.

Deﬁnition 2.14 (Moore machine). For sets ,, an (,)-Moore machine consists of • a set (, elements of which are called states, • a function A : ( → , called readout, and • a function D : ( × → (, called update. It is further called initialized if it is equipped with an element B0 ∈ (.

6 With an initialized (,)-Moore machine ((,A,D,B0), we can take any -stream 0 : N → and produce a -stream 1 : N → inductively using the formula

B=+1 ≔ D(B= , 0=) and 1= ≔ A(B=).

Proposition 2.15. An (,)-Moore machine with states ( can be identiﬁed with a map of polynomials (y( → y, and hence with a y-coalgebra ( → y ⊳ ( by Proposition 2.11.

♯ Proof. The identiﬁcation uses !1 ≔ A and ! ≔ D.

We can thus refer to any ?-coalgebra as a generalized Moore machine, where ? ∈ Poly is generalizing the monomial y. Mathematically, given : ( → ? ⊳ (, we also get the two-fold composite

?⊳ ( −→ ? ⊳ ( −−→ ? ⊳ ? ⊳ ( and indeed the =-fold composite ( → ?⊳= ⊳ ( for any = ∈ N. The idea is that for every state B ∈ (, we get a position A(B) ∈ ?(1), and for every direction 3 ∈ ?[A(B)] there, we get a new state D(B, 3). We thus think of ? as an interface for the dynamical system: ?(1) says what the world can see about the current state—i.e. its outward position 8 ≔ A(B)—and ?[8] says what sort of forces or inputs the state can be subjected to. A map of polynomials ! : ? → ?′ is a change of interface. We can transform a ?- dynamical system into a ?′-dynamical system that has the same set of states. Indeed, simply compose any ( → ? ⊳ ( with ! ⊳ ( : ? ⊳ ( → ?′ ⊳ (. ′ More generally, a map ! : ?1 ⊗···⊗ ?: → ? allows us to take :-many dynamical systems (1 → ?1 ⊳ (1 through (: → ?: ⊳ (: and use Proposition 2.13 to combine them into a single dynamical system

! ′ ( → (?1 ⊗···⊗ ?:) ⊳ ( −→ ? ⊳ (

′ with interface ? and states ( ≔ (1 ×···× (:. Example 2.16. Wiring diagrams are one way of combining dynamical systems as above.

Controlled_Plant

= Plant ! (11) Controller

In the wiring diagram (11) three boxes are shown: the controller, the plant, and the system; we can consider each as having a monomial interface:

Plant = y Controller = y Controlled_Plant = y. (12)

The wiring diagram itself represents a morphism

! : y ⊗ y → y

7 ♯ in Poly. Defining ! requires a function !1 : × → and a function ! : × × → ××;thefirstisprojectionandthesecondisanisomorphism. Together these simply say how the wiring diagram shuttles information within the controlled plant. Indeed, the wiring diagram lets us put together dynamics of the controller and the plant to give dynamics for the controlled plant. That is, given Moore machines (y( → Plant and )y) → Controller, we get a Moore machine ()y() → Controlled_Plant. More generally, we can think of transistors in a computer as dynamical systems, and the logic gates, adder circuits, memory circuits, a connected keyboard or monitor, etc. each as a wiring diagram comprising these simpler systems. Example 2.17. In Example 2.16 the wiring pattern is fixed, but as we show in [Spi20], Poly also supports wiring diagrams for dynamical systems that can change their interaction pattern based on their internal states. We now come to learners. As mentioned in the introduction, the category Learn from [FST19] is better understood as a bicategory. Its objects are sets, Ob(Learn) = Ob(Set), a 1-morphism (learner) from to consists of a set % and maps : × % → and (',*): × × % → × %, and a 2-morphism—a morphism between learners—is a map 5 : % → %′ making the following squares commute:

(',*) × % × × % × % × 5 ×× 5 × 5 (13) × %′ × × %′ × %′ ′ ('′,*′) We denote the category of learners from to as Learn(,). Proposition 2.18. For sets ,, there is an equivalence of categories Learn(,) [y,y]-Coalg. Proof. See page 14.

We will now give a definition that generalizes the bicategory Learn, give examples, and discuss intuition. In particular, we define a category-enriched operad Org, which includes Learn as a full subcategory. Definition 2.19 (The operad Org). We define Org to be the category-enriched operad defined as follows. The objects of Org are polynomials: Ob(Org) ≔ Ob(Poly). For ′ objects ?1,...,?: , ? , the category of maps between them is defined by ′ ′ Org(?1,...,?:; ? ) ≔ [?1 ⊗···⊗ ?: , ? ]-Coalg. For any object ?, the identity on ? is given by the [?, ?]-coalgebra 1 → [?, ?](1) Poly(?, ?) that sends 1 7→ id?.

Given objects ?1,1,...,?1,91 ,...,?:,1,...,?:,9: , the composition functor

[?1,1 ⊗···⊗ ?1,91 , ?1]-Coalg ×···×[?:,1 ⊗···⊗ ?:,9: , ?:]-Coalg ′ ′ ×[?1 ⊗···⊗ ?: , ? ]-Coalg →[?1,1 ⊗···⊗ ?:,9: , ? ]-Coalg is given by repeated application of the maps in (9) and Proposition 2.13.

8 ′ How do we think of a morphism ((, ): (?1,...,?:)→ ? in Org? It is a dynamical system which has a set ( of states. For every state B ∈ (, we can read out an associated ′ element 1(B): ?1 ⊗···⊗ ?: → ? ; we can think of this as a wiring diagram as in Exam- ple 2.16 or a generalization thereof. That is, the current state B dictates an organization pattern 1(B): how outputs of the internal systems are aggregated and output from the outer interface, and how feedback from outside is distributed internally. But so far, this is only the readout of . What’s an input? An input to this system consists of a tuple of outputs 8 ≔ (81,...,8:) ∈ ?1(1)×···× ?:(1), one output for each ′ of the internal systems, together with an input 3 ∈ ? [1(8)] to the outer system. Imagine you’re the officer in charge of an organization: you’re in charge of how your employees and other resources are arranged, how they send information to each other and the outside world, and how the feedback from the outside world is disbursed to the employees and resources. You see what they do, you see how the world responds, and you update your internal state and the organization’s arrangement however you see fit. In this image, you as the officer are playing the role of ((, ), i.e. a morphism in Org. But even a simple logic gate or adder circuit in a computer—something that doesn’t have a changing internal state or update how resources are connected—counts as a morphism in Org. Again, the only difference in that case is that the state set ( 1, the way the internal resources are connected—is unchanged by inputs. Example 2.20. For any operad, there is an algebra of 0-ary morphisms. In the case of Org, this algebra Org → Cat sends ? 7→ ?-Coalg, the category of dynamical systems on ?, since the unit of ⊗ is y and [y, ?] ?. Next we’ll give a mathematical language for describing dynamical systems as in Example 2.20 as well as the generalized learners (or officers) described above.

3 Toposes of learners

We ended the previous section by deﬁning the (category-enriched) operad Org and explaining how it generalizes the bicategory Learn. In this section we mainly discuss the internal language for each learner. That is, given ?, ?′ ∈ Ob(Poly) = Ob(Org), ′ where perhaps ? = ?1 ⊗···⊗ ?: , we discuss the category Org(?; ? ) of such learners. Our ﬁrst job is to show that every such category is a topos; this will give us access to the Mitchell-Benabou language and Kripke-Joyal semantics—the so-called internal language of the topos and its interpretation. We then explain the sorts of things— propositions—that one can express in this language, e.g. the proposition “I will follow the gradient descent algorithm” is a particular case.

3.1 The topos of ?-coalgebras

In this section, we show that for any polynomial ?, there is a category C?, called the cofree category on ?, for which we can ﬁnd an equivalence

?-Coalg C?-Set

9 between ?-coalgebras and functors C? → Set. In fact, the category C? is free on a graph, making it quite easy to understand in certain respects.3 Following [nLa21], we deﬁne a rooted tree to be a graph ) whose free category has an initial object, called the root; the idea is that for any node =, there is exactly one path from the root to =. We denote the nodes of ) by )0, the root by root) ∈ )0, and for any node = ∈ )0 we denote the set of arrows emanating from = by )[=]. Note that at the target =′ of any arrow 0 ∈ )[=], there sits another rooted tree with root =′; we denote this tree by cod)(0).

Deﬁnition 3.1 (The graph Tree? of ?-trees). For a polynomial ? ∈ Poly, deﬁne a ?-tree ♯ to be a tuple (), )1, ) ), where ) is a rooted tree, )1 : )0 → ?(1) is a function called the ♯ position function, and )= is a bĳection ♯ )= : ?[)1(=)] −→ )[=] for each node = ∈ )0, identifying the set of branches in the tree ) at node = with the set of directions in the polynomial ? at the position )1(=). We denote by Tree? the graph whose vertex set is the set of ?-trees, and for which ′ ′ an arrow 0 : ) → ) is a branch 0 ∈ )[root)] with ) = cod)(0).

Example 3.2. If ? = y for a nonempty set then there is only one ?-tree: each node has the unique label ?(1) 1 and -many branches. If ? = {go}y1 +{stop}y0 y+1 then counting the number of nodes gives a bĳection between set of ?-trees and the set N ∪ {∞}

stop go stop go go stop go go go • , • → • , • → • → • , ... , • → • → • →···

Theorem 3.3. For any polynomial ? there is an equivalence of categories

?-Coalg Tree?-Set where Tree? is the free category on the graph Tree? of ?-trees.

Proof. See page 14.

3.2 The internal language of ?-Coalg

For any category C,the category C-Set of functors C → Set forms a topos. In particular, this means that mathematicians have already developed a language and logic that faithfully represents the structures of C-Set, and we can import it wholesale; see [FS19, Chapter 7] or [MM92, Chapter VI]. Now that we know from Theorem 3.3 that ?-Coalg is a topos for any ? ∈ Poly, we are interested in corresponding language for the topos Learn(,) = [y,y]-Coalg of learners, for any sets ,; hence the title

3The name “cofree category” comes from the fact that—up to isomorphism—comonoids in Poly are categories; see [ahman2016directed]. So we’re really taking the cofree comonoid on ?.

10 • • • • • • • • • • • • • • • • • • • • • • • • • •

Figure 1: Left: a dynamical system, i.e. coalgebra, for the polynomial ? ≔ {•, •}y2+• 2y2 + 1. Right: the ?-tree corresponding to the node •.

“learners’ languages.” However since most of the relevant abstractions work more generally for ?-Coalg, we’ll mainly work there. Not assuming the reader knows topos theory, we will proceed as though we are defining the few relevant concepts from scratch, when in actuality we are merely “reading them off” from the established literature. For example Definition 3.4 simply unpacks the topos-theoretic definition of a logical proposition as a subobject of the terminal object in the topos ?-Coalg.

Definition 3.4. A logical proposition (about ?-coalgebras) is defined to be a set % ⊆ Tree? of ?-trees satisfying the condition that if ) ∈ % is a tree in %, then for any direction 3 ∈ )[root)], the tree cod)(3) ∈ % is also a tree in %. Proposition 3.5 gives us an easy way to construct logical propositions about ?- coalgebras, and hence learners. Namely, it says if we put a condition on the ?-positions that can show up as labels, and if we put a condition on the codomain map (how directions in the tree lead to new positions), we get a logical proposition. Of course, these aren’t the only ones, but they form a nice special case. Recall from Definition 3.1 that a ?-tree is a rooted tree ) equipped with a position 1 ♯ function )1 : )0 → ?( ); we elide the bĳections (earlier denoted )= : )[=] ?[)1(=)]) in what follows. Proposition 3.5. Given ? ∈ Poly, suppose given subsets & ⊆ ?(1) and ' ⊆ &. Ö8∈& 3Ö∈?[8] Then the following set of trees is a logical proposition: ' ≔ Tree %& {) ∈ ? | ∀(8 : )0).)1(8) ∈ & ∧ ∀(3 : )[8]). cod)(3) ∈ ('83)}. Proof. The result is immediate from Definition 3.4.

Example 3.6 (Gradient descent). The gradient descent, backpropagation algorithm used by each “neuron” in a deep learning architecture can be phrased as a logical proposition about learners. The whole learning architecture is then put together as in [FST19], or as we’ve explained things above, using the operad Org from Deﬁnition 2.19.

11 So suppose a neuron is tasked with learning a function R< → R=, and it has a parameter space R:, i.e. we are given a smooth function 5 : R: × R< → R=. We will deﬁne a corresponding logical proposition using Proposition 3.5. Deﬁne ? ∈ Poly by

R< R= R< R= ? ≔ [R