
Analytical Differential Calculus with Integration

Han Xu, Department of Computer Science and Technology, Peking University, China
Zhenjiang Hu, Key Laboratory of High Confidence Software Technologies (MoE), Department of Computer Science and Technology, Peking University, China

Abstract. The differential lambda-calculus was first introduced by Thomas Ehrhard and Laurent Regnier in 2003. Despite more than 15 years of history, little work has been done on a differential calculus with integration. In this paper, we shall propose a differential calculus with integration from a programming point of view. We show its good correspondence with mathematics, which is manifested by how we construct the reduction rules and how we preserve important mathematical theorems in our calculus. Moreover, we highlight applications of the calculus in incremental computation, automatic differentiation, and computation approximation.

2012 ACM Subject Classification Software and its engineering → General programming languages; Software and its engineering → Functional languages

Keywords and phrases Differential Calculus, Integration, Lambda Calculus, Incremental Computation, Adaptive Computing

Digital Object Identifier 10.4230/LIPIcs.ICALP.2021.23

1 Introduction

Differential calculus has more than 15 years of history in computer science since the pioneering work by Thomas Ehrhard and Laurent Regnier [9]. It is, however, not well studied from the perspective of programming languages; we would expect a profound connection of differential calculus with important fields such as incremental computation, automatic differentiation and self-adjusting computation, just as mathematical differential calculus connects with the rest of mathematics. We want to understand what the semantics of the derivative of a program is, and how we can use derivatives to write a program. That is, we wish to have a clear description of derivatives and introduce integration to compute from derivatives back to the program. The two main lines of related work are the differential lambda-calculus [9, 8] and the change theory [7, 4, 5]. On one hand, the differential lambda-calculus uses linear substitution to represent the derivative of a term. For example, given a term x ∗ x (i.e., x²), with the

arXiv:2105.02632v2 [cs.PL] 7 May 2021

differential lambda-calculus, we may use the term ∂(x ∗ x)/∂x · 1 to denote its derivative at 1. As there are two alternatives to substitute 1 for x in the term x ∗ x, it gives (1 ∗ x) + (x ∗ 1) (i.e., 2x) as the derivative (where + denotes "choice"). Despite that the differential lambda-calculus provides a concise way to analyze the alternatives of linear substitution on a lambda term, there is a gap between analysis on terms and computation on terms. For instance, let +′ denote our usual addition operator, and + denote the choice of linear substitution. Then we have ∂(x +′ x)/∂x · 1 = (1 +′ x) + (x +′ 1), which is far away from the expected 1 +′ 1. Moreover, it offers no method to integrate over a derivative, say ∂t/∂x · y. On the other hand, the change theory gives a systematic way to define and propagate

© Han Xu and Zhenjiang Hu; licensed under Creative Commons License CC-BY 4.0. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

(transfer) changes. The main idea is to define the change of f as Derive f, satisfying

f(x ⊕ ∆x) = f(x) ⊕ (Derive f) x ∆x,

where ⊕ denotes an updating operation. It reads that the change over the input x by ∆x results in the change over the result of f(x) by (Derive f) x ∆x. While change theory provides a general way to describe changes, the changes it describes are differences (deltas) instead of derivatives. It is worth noting that a derivative is not the same as a delta. For example, by change theory, we can deduce that f(x) will be of the form x ∗ x + C if we know (Derive f) x ∆x = 2 ∗ x ∗ ∆x + ∆x ∗ ∆x, but we cannot deduce this form if we just know that its derivative is 2 ∗ x, because change theory has no concept of integration or limits. Although a bunch of work has been done on derivatives [9, 8, 7, 4, 20, 17, 22, 10, 1], there is unfortunately, as far as we are aware, little work on integration. It may be natural to ask what a derivative really means if we cannot integrate it. If there is only a mapping from a term to its derivative without its corresponding integration, how can we operate on derivatives with a clear understanding of what we have actually done? In this paper, we aim at a new differential framework with a dual mapping between derivatives and integrations. With this framework, we can manifest the power of this dual mapping by proving, among others, three important theorems, namely the Newton-Leibniz formula, the Chain Rule, and Taylor's theorem. Our key idea can be illustrated by a simple example. Suppose we have a function f mapping from an n-dimensional space to an m-dimensional space. Then, let x be (x1, x2, ..., xn)ᵀ, and f(x) be (f1(x), f2(x), ..., fm(x))ᵀ. Mathematically, we can use a Jacobian matrix A to represent its derivative, which satisfies the equation

f(x + ∆x) − f(x) = A∆x + o(∆x), where

A = ( ∂f1/∂x1  ∂f1/∂x2  ···  ∂f1/∂xn )
    ( ∂f2/∂x1  ∂f2/∂x2  ···  ∂f2/∂xn )
    (   ···      ···     ···    ···   )
    ( ∂fm/∂x1  ∂fm/∂x2  ···  ∂fm/∂xn )

However, computer programs usually describe computation over data of some structure, rather than just scalar data or matrices. In this paper, we extend the idea and propose a new calculus that enables us to perform differentiation and integration on data structures. Our main contributions are summarized as follows.

To our knowledge, we have made the first attempt at designing a calculus that provides both derivatives and integrations. It is an extension of the lambda-calculus with five new operators including derivatives and integrations. We give clear semantics and typing rules, and prove that it is sound and strongly normalizing. (Section 2)

We prove three important theorems and highlight their practical applications for incremental computation, automatic differentiation, and computation approximation.

We prove the Newton-Leibniz formula: ∫_{t1}^{t2} ∂t/∂y |_x dx = t[t2/y] ⊖ t[t1/y], which is also known as the Second Fundamental Theorem of Calculus. It shows the duality between derivatives and integrations, and can be used for incremental computation. (Section 3)

We prove the Chain Rule: ∂(f (g x))/∂x |_{t1} ∗ t = ∂(f y)/∂y |_{g t1} ∗ (∂(g z)/∂z |_{t1} ∗ t). It says that ∀x, ∀x0, (f(g(x)))′ ∗ x0 = f′(g(x)) ∗ g′(x) ∗ x0, and can be used for incremental computation and automatic differentiation. (Section 4)

We prove Taylor's Theorem: f t = Σ_{k=0}^{∞} (1/k!) (f^{(k)} t0) ∗ (t ⊖ t0)^k. Different from the one of the differential lambda-calculus [9], this Taylor's theorem manifests results of computation instead of analysis on occurrences of terms. It can be used for approximation of a function computation. (Section 5)

Terms     t ::= c                                   constants of interpretable type
              | x                                   variable
              | λx : T. t                           lambda abstraction
              | t t                                 function application
              | (t1, t2, ..., tn) | πj t            n-tuple and projection
              | t ⊕ t                               addition
              | t ⊖ t                               subtraction
              | t ∗ t                               multiplication
              | ∂t/∂x |_t                           derivative
              | ∫_{t}^{t} t dx                      integration
              | inl t | inr t                       left/right injection
              | case t of inl x1 ⇒ t | inr x2 ⇒ t   case analysis
              | fix t                               fixed point

Types     T ::= B                                   base type
              | (T1, T2, ..., Tn)                   product type
              | T → T                               function type
              | T + T                               sum type

Contexts  Γ ::= ∅                                   empty context
              | Γ, x : T                            variable binding

Figure 1 Calculus Syntax

2 Calculus

In this section, we shall give a clear definition of our calculus with both derivatives and integration. We explain important insights in our design, and prove some useful properties and theorems that will be used later.

2.1 Syntax

Our calculus, as defined in Figure 1, is an extension of the simply-typed lambda calculus [21]. Besides the usual constants, variables, lambda abstraction, function application, and tuples, it introduces five new operations: addition ⊕, subtraction ⊖, multiplication ∗, derivative ∂t/∂x |_t, and integration ∫_{t}^{t} t dx. The three binary operations, namely ⊕, ⊖, and ∗, are generalizations of those from our mathematics. Intuitively, x ⊕ ∆ is for updating x with change ∆, ⊖ is for canceling updates, and ∗ is for distributing updates. We build up terms from terms of base types (such as R, C), and on each base type we require these operations to satisfy the following properties:

The addition and multiplication are associative and commutative, i.e., (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c), a ⊕ b = b ⊕ a, (a ∗ b) ∗ c = a ∗ (b ∗ c), a ∗ b = b ∗ a.

The addition and the subtraction are cancellable, i.e., (a ⊕ b) ⊖ b = a and (a ⊖ b) ⊕ b = a.


(TCon)   c : T ∈ Γ  ⟹  Γ ⊢ c : T
(TVar)   x : T ∈ Γ  ⟹  Γ ⊢ x : T

(TInl)   Γ ⊢ t : T1  ⟹  Γ ⊢ inl t : T1 + T2
(TInr)   Γ ⊢ t : T2  ⟹  Γ ⊢ inr t : T1 + T2

(TAdd)   Γ ⊢ t1 : T∗   Γ ⊢ t2 : T∗  ⟹  Γ ⊢ t1 ⊕ t2 : T∗
(TSub)   Γ ⊢ t1 : T∗   Γ ⊢ t2 : T∗  ⟹  Γ ⊢ t1 ⊖ t2 : T∗

(TAbs)   Γ, x : T1 ⊢ t : T2  ⟹  Γ ⊢ λx : T1. t : T1 → T2
(TApp)   Γ ⊢ t1 : T1 → T2   Γ ⊢ t2 : T1  ⟹  Γ ⊢ t1 t2 : T2

(TFix)   Γ ⊢ t : T → T  ⟹  Γ ⊢ fix t : T
(TDer)   Γ ⊢ t1 : T1   Γ, x : T1 ⊢ t2 : T2  ⟹  Γ ⊢ ∂t2/∂x |_{t1} : ∂T2/∂T1

(TPair)  ∀j ∈ [1, n], Γ ⊢ tj : Tj  ⟹  Γ ⊢ (t1, t2, ..., tn) : (T1, T2, ..., Tn)
(TProj)  Γ ⊢ t : (T1, T2, ..., Tn)  ⟹  Γ ⊢ πj t : Tj

(TMul)   Γ ⊢ t1 : ∂T/∂T∗   Γ ⊢ t2 : T∗  ⟹  Γ ⊢ t1 ∗ t2 : T
(TInt)   Γ ⊢ t1 : T   Γ ⊢ t2 : T   Γ, x : T ⊢ t : ∂T∗/∂T  ⟹  Γ ⊢ ∫_{t1}^{t2} t dx : T∗

(TCase)  Γ, x1 : T1 ⊢ t1 : T   Γ, x2 : T2 ⊢ t2 : T   Γ ⊢ t : T1 + T2  ⟹  Γ ⊢ case t of inl x1 ⇒ t1 | inr x2 ⇒ t2 : T

Figure 2 Typing Rules

The multiplication is distributive over addition, i.e., a ∗ (b ⊕ c) = a ∗ b ⊕ a ∗ c.

▶ Example 1 (Basic Operations on Real Numbers). For real numbers r1, r2 ∈ R, we have the following definitions.

r1 ⊕ r2 = r1 + r2    r1 ⊖ r2 = r1 − r2    r1 ∗ r2 = r1 × r2
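As a minimal sketch (ours, not part of the paper), Example 1's operations on real numbers can be modelled directly on Python floats, together with a check of the cancellation and distributivity laws required above:

```python
# Illustrative model of the base-type operations of Example 1 on floats.

def oplus(r1, r2):   # r1 ⊕ r2: apply an update
    return r1 + r2

def ominus(r1, r2):  # r1 ⊖ r2: cancel an update
    return r1 - r2

def star(r1, r2):    # r1 ∗ r2: distribute an update
    return r1 * r2

# Cancellation: (a ⊕ b) ⊖ b = a and (a ⊖ b) ⊕ b = a
assert ominus(oplus(2.0, 3.0), 3.0) == 2.0
assert oplus(ominus(2.0, 3.0), 3.0) == 2.0
# Distributivity: a ∗ (b ⊕ c) = (a ∗ b) ⊕ (a ∗ c)
assert star(2.0, oplus(3.0, 4.0)) == oplus(star(2.0, 3.0), star(2.0, 4.0))
```

On floats the laws hold only up to rounding; the calculus assumes them exactly on base types.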

We use ∂t1/∂x |_{t2} to denote the derivative of t1 over x at point t2, and ∫_{t1}^{t2} t dx to denote the integration of t over x from t1 to t2.

2.2 Typing As defined in Figure 1, we have base types (denoted by B), tuple types, function types, and sum types. To make our later typing rules easy to understand, we introduce the following type notations.

Type T∗ ::= B                      base type
          | (T∗, T∗, ..., T∗)      product type
          | T → T∗                 arrow type

T∗ means the types that are addable (i.e., updatable through ⊕). We view the addition between functions, tuples, and base type terms as valid, which will be shown by our reduction rules later. But here, we forbid the addition and subtraction between sum types because we view updates such as inl 0 ⊕ inr 1 as invalid. If we want to update a term of sum type anyway, we may do case analysis such as case t of inl x1 ⇒ inl (x1 ⊕ ...) | inr x2 ⇒ (x2 ⊕ ...). Next, we introduce two notations for derivatives on types:

∂T/∂B = T,

∂T/∂(T1, T2, ..., Tn) = (∂T/∂T1, ∂T/∂T2, ..., ∂T/∂Tn).

The first notation says that, with the assumption that differences (subtraction) of values of base types are of base types, the derivative over base types has no effect on the result type. The second notation resembles partial differentiation. Note that we do not consider derivatives of functions because, even for functions on real numbers, there is no good mathematical definition for them yet. Therefore, we do not have a type notation for ∂T/∂(T1 → T2). Besides, because we forbid the addition and subtraction between sum types, we will view the differentiation of sum types as invalid, so we do not have a notation for ∂T/∂(T1 + T2) either.

Figure 2 shows the typing rules for the calculus. The typing rules for constant, variable, lambda abstraction, function application, tuple, and projection are nothing special. The typing rules for addition and subtraction are natural, but the remaining three kinds of rules are more interesting. Rule TMul is the typing rule for t1 ∗ t2. If t1 is a derivative of T1 over T2, and t2 is of type T2, then the multiplication will produce a term of type T1. This may be informally understood from our familiar equation (∆Y/∆X) ∗ ∆X = ∆Y. Rule TDer shows the introduction of the derivative type through a derivative operation, while Rule TInt shows the cancellation of the derivative type through an integration operation.
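The two type notations above are essentially a recursive definition. The following sketch (our own illustrative encoding, with a base type as the string "B" and a product type as a tuple of types) computes the derivative type ∂T2/∂T1:

```python
def d_type(t_num, t_den):
    """Compute the derivative type ∂t_num/∂t_den (illustrative encoding:
    a base type is the string "B", a product type is a tuple of types)."""
    if t_den == "B":
        return t_num                                   # ∂T/∂B = T
    if isinstance(t_den, tuple):
        # ∂T/∂(T1, ..., Tn) = (∂T/∂T1, ..., ∂T/∂Tn)
        return tuple(d_type(t_num, ti) for ti in t_den)
    raise TypeError("no derivative over function or sum types")

# The Jacobian-like type of a map from a pair to a triple:
assert d_type(("B", "B", "B"), ("B", "B")) == (("B", "B", "B"), ("B", "B", "B"))
```

Note the outer tuple ranges over the input components, matching the shape produced by Rule EAppDer4 later.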

2.3 Semantics

We will give a two-stage semantics for the calculus. At the first stage, we assume that all the constants (values and functions) over the base types are interpretable in the sense that there is a default well-defined interpreter to evaluate them. At the second stage, the important part of this paper, we define a set of reduction rules and use the full reduction strategy to compute normal forms, which enjoys good properties of soundness, confluence, and strong normalization. More specifically, after the full reduction of a term in our calculus, every subterm (now in a normal form of interpretable types) outside the lambda function body will be interpretable on base types, which will be proved in the appendix. In other words, our calculus helps to reduce a term to a normal form which is interpretable on base types, and leaves the remaining evaluations to interpretation on base types. We will not give reduction rules for the operations on base types because we do not want to touch on implementations of primitive functions on base types. For simplicity, in this paper we will assume that the important properties, such as the Newton-Leibniz formula, the Chain Rule, and Taylor's theorem, are satisfied by all the primitive functions and their closures under addition, subtraction, multiplication, derivative and integration. This assumption may seem too strong, since not all primitive functions on base types meet it. However, it would make sense to start with the primitive functions meeting these requirements to build our system, and extend it later with other primitive functions.


2.4 Interpretable Types and Terms Here, that a term is interpretable means it can be directly interpreted by a base type interpreter. We use B to denote the base types, over which constants are interpretable. To make this clear, we define interpretable types as follows.

▶ Definition 2 (Interpretable Type). Let B be base types. A type iB is interpretable if it is generated by the following grammar:

iB ::= B           base type
     | iB → iB     function type

Constants of interpretable types can be either values or primitive functions of base types. For example, we can use sin(x), cos(x), square(x) as primitive functions in our calculus. Next, we consider terms that are constructed from constants and variables of interpretable types. These terms are interpretable by a default evaluator under an environment mapping variables to constants. Formally, we define the following interpretable terms.

▶ Definition 3 (Interpretable Terms). A term is interpretable if it is generated by the following grammar it.

it ::= c                      constants of iB
     | x                      variable of iB
     | λx : iB. it            lambda abstraction
     | it it                  function application
     | it ⊕ it                addition
     | it ⊖ it                subtraction
     | it ∗ it                multiplication
     | ∂it/∂x |_{it}          derivative
     | ∫_{it}^{it} it dx      integration
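As an illustration, the grammar of interpretable terms transcribes directly into a small abstract syntax tree. The encoding below is our own sketch (only a few constructors shown), not part of the calculus:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:          # constants of iB (values or primitive functions)
    name: str

@dataclass(frozen=True)
class Var:            # variable of iB
    name: str

@dataclass(frozen=True)
class Mul:            # it ∗ it
    left: object
    right: object

@dataclass(frozen=True)
class Der:            # ∂it/∂x |_{it}
    body: object
    var: str
    point: object

# The running example ∂(x ∗ x)/∂x |_1 from the introduction:
d = Der(Mul(Var("x"), Var("x")), "x", Const("1"))
assert d.var == "x" and d.point == Const("1")
```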

2.5 Reduction Rules Our calculus is an extension of simply-typed lambda-calculus. Our lambda abstraction and application are nothing different from the simply-typed lambda calculus, and we have the reduction rule:

(λx : T. t)t1 → t[t1/x].

We use an n-tuple to model structured data and projection πj to extract j-th component from a tuple, and we have the following reduction rule:

πj(t1, t2, ...tn) → tj.

Similarly, we have reduction rules for the case analysis:

case (inl t) of inl x1 ⇒ t1 | inr x2 ⇒ t2 → t1[t/x1]

case (inr t) of inl x1 ⇒ t1 | inr x2 ⇒ t2 → t2[t/x2]

Besides, we introduce the fix-point operator to deal with recursion:

fix f → f (fix f)
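To make the beta and fix rules concrete, here is a small self-contained sketch of our own (terms encoded as tagged tuples) of one-step, substitution-based reduction; it ignores variable capture by assuming distinct names:

```python
# terms: ("var", x) | ("lam", x, body) | ("app", f, a) | ("fix", f)

def subst(t, x, s):
    """t[s/x], assuming all bound names are distinct (no capture handling)."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    if tag == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    return ("fix", subst(t[1], x, s))

def step(t):
    # (λx : T. t) t1 → t[t1/x]    and    fix f → f (fix f)
    if t[0] == "app" and t[1][0] == "lam":
        return subst(t[1][2], t[1][1], t[2])
    if t[0] == "fix":
        return ("app", t[1], t)
    return t

identity_app = ("app", ("lam", "x", ("var", "x")), ("var", "y"))
assert step(identity_app) == ("var", "y")
```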

(EAppDer1)  t0 : B  ⟹  ∂(t1, t2, ..., tn)/∂x |_{t0} → (∂t1/∂x |_{t0}, ∂t2/∂x |_{t0}, ..., ∂tn/∂x |_{t0})

(EAppDer2)  t0 : B  ⟹  ∂(inl/inr t)/∂x |_{t0} → inl/inr (∂t/∂x |_{t0})

(EAppDer3)  t0 : B  ⟹  ∂(λy : T. t)/∂x |_{t0} → λy : T. ∂t/∂x |_{t0}

(EAppDer4)  ∀i ∈ [1, n], ti∗ = (t1, t2, ..., ti−1, xi, ti+1, ..., tn)  ⟹
            ∂t/∂x |_{(t1,t2,...,tn)} → (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn})

(EAppInt1)  t1, t2 : B  ⟹  ∫_{t1}^{t2} (t11, t12, ..., t1n) dx → (∫_{t1}^{t2} t11 dx, ∫_{t1}^{t2} t12 dx, ..., ∫_{t1}^{t2} t1n dx)

(EAppInt2)  t1, t2 : B  ⟹  ∫_{t1}^{t2} inl/inr t dx → inl/inr ∫_{t1}^{t2} t dx

(EAppInt3)  t1, t2 : B  ⟹  ∫_{t1}^{t2} (λy : T2. t) dx → λy : T2. ∫_{t1}^{t2} t dx

(EAppInt4)  ∀i ∈ [1, n], ti∗ = (t21, ..., t2(i−1), xi, t1(i+1), ..., t1n)  ⟹
            ∫_{(t11,t12,...,t1n)}^{(t21,t22,...,t2n)} t dx → ∫_{t11}^{t21} π1(t[t1∗/x]) dx1 ⊕ ... ⊕ ∫_{t1n}^{t2n} πn(t[tn∗/x]) dxn

Figure 3 Reduction Rules for Derivative and Integration

It is worth noting that tuples, while having a good correspondence in mathematics, should be understood as structured data instead of high-dimensional vectors, because some operations are different from those in mathematics. As will be seen later, there is a difference between our multiplication and matrix multiplication, and derivatives and integrations on tuples of tuples have no correspondence to mathematical objects. The core reduction rules in our calculus are summarized in Figure 3, which defines the basic cases for reducing both derivative terms and integration terms. For derivatives, we use ∂t/∂x |_{t0} to denote the derivative of t over x at point t0, and we have four reduction rules:

Rule EAppDer1 is to distribute point t0 : B into a tuple. This resembles the case in mathematics; if we have a function f defined by f(x) = (f1(x), f2(x), ..., fm(x))ᵀ, its derivative will be (∂f1/∂x, ∂f2/∂x, ..., ∂fm/∂x)ᵀ. For example, if we have a function f : R → (R, R) defined by f(x) = (x, x ∗ x), then its derivative will be (1, 2 ∗ x). Rule EAppDer2 is similar to Rule EAppDer1.

Rule EAppDer3 is to distribute point t0 : B into a lambda abstraction. Again, this is very natural in mathematics. For example, for the function f(x) = λy : B. x ∗ y, we would have its derivative over x as λy : B. y. Rule EAppDer4 is to deal with partial differentiation, similar to the Jacobian matrix in mathematics (as shown in the introduction). For example, if we have a function f that maps a pair (x, y) to (x ∗ x, x ∗ y ⊕ y), which may be written as λz : (B, B). (π1z ∗ π1z, π1z ∗ π2z ⊕ π2z), then we would have its derivative ∂(f z)/∂z |_{(x,y)} as ((2 ∗ x, y), (0, x ⊕ 1)).
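The EAppDer4 example can be cross-checked numerically. In the sketch below (ours; it uses the paper's layout where the outer tuple ranges over input components), finite differences over f(x, y) = (x ∗ x, x ∗ y ⊕ y) on real numbers agree with the claimed derivative ((2 ∗ x, y), (0, x ⊕ 1)):

```python
def f(x, y):
    return (x * x, x * y + y)

def claimed_derivative(x, y):
    # outer tuple over inputs: (∂f/∂x of both outputs, ∂f/∂y of both outputs)
    return ((2 * x, y), (0.0, x + 1))

x, y, h = 3.0, 5.0, 1e-7
num_dx = tuple((a - b) / h for a, b in zip(f(x + h, y), f(x, y)))
num_dy = tuple((a - b) / h for a, b in zip(f(x, y + h), f(x, y)))
exact = claimed_derivative(x, y)
for num, ex in zip(num_dx + num_dy, exact[0] + exact[1]):
    assert abs(num - ex) < 1e-4   # finite-difference error is O(h)
```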

Similarly, we can define four reduction rules for integration. Rules EAppInt1, EAppInt2 and EAppInt3 are simple. Rule EAppInt4 is worth more explanation. It is designed to


(EAppAdd1)  (t11, ..., t1n) ⊕ (t21, ..., t2n) → (t11 ⊕ t21, ..., t1n ⊕ t2n)

(EAppAdd2)  (λx : T. t1) ⊕ (λy : T. t2) → λx : T. (t1 ⊕ t2[x/y])

(EAppSub1)  (t11, ..., t1n) ⊖ (t21, ..., t2n) → (t11 ⊖ t21, ..., t1n ⊖ t2n)

(EAppSub2)  (λx : T. t1) ⊖ (λy : T. t2) → λx : T. (t1 ⊖ t2[x/y])

(EAppMul1)  t0 : B  ⟹  (t1, t2, ..., tn) ∗ t0 → (t1 ∗ t0, t2 ∗ t0, ..., tn ∗ t0)

(EAppMul2)  t0 : B  ⟹  (λx : T. t) ∗ t0 → λx : T. (t ∗ t0)

(EAppMul3)  t0 : B  ⟹  (inl/inr t) ∗ t0 → inl/inr (t ∗ t0)

(EAppMul4)  t1 = (t11, t12, ..., t1n), t2 = (t21, t22, ..., t2n)  ⟹  t1 ∗ t2 → (t11 ∗ t21) ⊕ (t12 ∗ t22) ⊕ ... ⊕ (t1n ∗ t2n)

Figure 4 Reduction Rules for Addition, Subtraction and Multiplication

establish the Newton-Leibniz formula

∫_{t1}^{t2} ∂t/∂y |_x dx = t[t2/y] ⊖ t[t1/y]

when t1 and t2 are tuples:

∫_{(t11,t12,...,t1n)}^{(t21,t22,...,t2n)} ∂t/∂y |_x dx = t[(t21, t22, ..., t2n)/y] ⊖ t[(t11, t12, ..., t1n)/y].

So we design the rule to have

∫_{t1j}^{t2j} ∂t[(t21, ..., t2(j−1), xj′, t1(j+1), ..., t1n)/y]/∂xj′ |_{xj} dxj = ∫_{(t21,...,t2(j−1),t1j,t1(j+1),...,t1n)}^{(t21,...,t2(j−1),t2j,t1(j+1),...,t1n)} ∂t/∂y |_x dx.

Notice that under our reduction rules on derivatives, πj(∂t/∂x |_{x=(x1,x2,...,xn)}) will be equal to the derivative of t with respect to its j-th parameter xj, so the integration will lead us back to the original t. Finally, we discuss the reduction rules for the three new binary operations, as summarized in Figure 4. The addition ⊕ is introduced to support the reduction rule of integration. It is also useful in proving the theorems and constructing the formulas. We can understand the two reduction rules for addition as the addition of high-dimension vectors and of functions, respectively. Similarly, we have two reduction rules for subtraction ⊖. The operator ∗ was introduced as a powerful tool for constructing the Chain Rule and Taylor's theorem. The first two reduction rules can be understood as multiplications of a scalar with a high-dimension vector and with a function, respectively, while the last one can be understood as multiplication on matrices. For example, we have ((1, 4), (2, 5), (3, 6)) ∗ (7, 8, 9) = (50, 122), which corresponds to the following matrix multiplication:

( 1 2 3 ) ( 7 )   (  50 )
( 4 5 6 ) ( 8 ) = ( 122 )
          ( 9 )

It is worth noting that, while they are similar, ∗ is different from the matrix multiplication operation. For example, we cannot write x as an m-dimensional vector (or m × 1 matrix) in Taylor's theorem, because no matrix A is well-formed under A ∗ x ∗ x, but we can write Taylor's theorem easily in our framework. In the matrix representation, the number of columns of the first matrix and the number of rows of the second matrix must be equal so that we can perform multiplication on them. This means we can only write the case m = 1 of Taylor's theorem in matrices, while our version works for arbitrary tuples.
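The tuple multiplication can be transcribed directly. The following sketch (our own) implements Rules EAppMul1 and EAppMul4 together with EAppAdd1 on nested Python tuples and reproduces the example above:

```python
from functools import reduce

def oplus(a, b):
    # EAppAdd1: componentwise addition of tuples; numbers add directly
    if isinstance(a, tuple):
        return tuple(oplus(x, y) for x, y in zip(a, b))
    return a + b

def star(t1, t2):
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # EAppMul4: contract the outer tuples and combine with ⊕
        return reduce(oplus, (star(a, b) for a, b in zip(t1, t2)))
    if isinstance(t1, tuple):
        # EAppMul1: distribute a scalar into a tuple
        return tuple(star(a, t2) for a in t1)
    return t1 * t2

assert star(((1, 4), (2, 5), (3, 6)), (7, 8, 9)) == (50, 122)
```

Note that, as in the text, the outer tuple of the first argument ranges over the contracted dimension, so no transposition is needed.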

2.6 Normal Forms

Normal Form nf ::= nb                    normal form on iB
                 | (nf, nf, ..., nf)     tuple
                 | λx : T. t             function (t cannot be further reduced)
                 | inl/inr nf            injection

Normal Forms on iB nb ::= c                    constants on iB
                        | x                    variables on iB
                        | nb nf                primitive function application
                        | nb ⊕ nf | nf ⊕ nb    addition
                        | nb ⊖ nf | nf ⊖ nb    subtraction
                        | nb ∗ nb              multiplication
                        | ∂nb/∂x |_{nb}        derivative
                        | ∫_{nb}^{nb} nb dx    integration

Figure 5 Normal Forms

In our calculus, the base types stand in a very special position, and we may involve many evaluations under the context of some free variables of an interpretable type. So for simplicity, we will use full reduction¹ but allow free variables of interpretable types (i.e., iB) in our normal form. Figure 5 defines our normal form. It basically consists of the normal forms on interpretable types, the tuple normal form, and the function normal form. We have an interesting result about the normal form of terms of interpretable types.

▶ Lemma 4 (Interpretability). All normal forms of terms of interpretable types are interpretable terms. That is, given a term t : iB, if t is in normal form, then t is an interpretable term.

Proof. We prove that a normal form t is interpretable by induction on the form of t.

Case λx : T. t. Because λx : T. t is of type iB, T must be an interpretable type. Notice that the function body t has a free variable x of type iB. By induction, we know t is an interpretable term; therefore, λx : T. t is interpretable.

Case (nf, nf, ..., nf). This case is impossible, because it is not of type iB. Using the same technique, we can prove the case for inl/inr nf.

1 By full reduction, we mean that a term can be reduced wherever any of its subterms can be reduced by a reduction rule.


Case c. It is an interpretable term itself.

Case ∂nb1/∂x |_{nb2}. By induction, both nb1 and nb2 are interpretable terms. By the definition of interpretable terms it, ∂it/∂x |_{it} is an interpretable term. Thus ∂nb1/∂x |_{nb2} is an interpretable term. Using the same technique, we can prove the cases for nb ∗ nb and ∫_{nb}^{nb} nb dx.

Case nb nf. nb is of type iB, and nb nf is of type iB. Thus we can deduce that nf is of type iB. By induction, both nb and nf are interpretable terms. Thus nb nf is an interpretable term. Using the same technique, we can prove the cases for nb ⊕ nf, nf ⊕ nb, nb ⊖ nf, and nf ⊖ nb. ◀

2.7 Properties Next, we prove some properties of our calculus. The proof is rather routine with some small variations.

▶ Lemma 5 (Properties). This calculus has the properties of progress, preservation and confluence. Moreover, if a term t does not contain subterms of the form fix t′, then t is strongly normalizing.

Proof. The full proof is in the Appendix; it is adapted from the standard proof. ◀

2.8 Term Equality We need to talk a bit more about equality because we do not consider reduction or calculation on primitive functions. This notion of equality has little to do with our evaluation but has a lot to do with the equality of primitive functions. Using this notion of equality, we can compare results coming from completely different calculations. This will be used in our later proofs of the three theorems. Since we have proved the confluence property, we know that every term has at most one normal form after reduction. Thus, we can define our equality based on normal forms; the equality between unnormalizable terms is undefined.

▶ Definition 6 (Term Equality). An open term t1 is said to be equal to a term t2 if and only if, for all free variables x1, x2, ..., xn in t1 and t2 and for all closed and weak-normalizable terms ui whose types are the same as those of the xi, we have t1[u1/x1, ..., un/xn] = t2[u1/x1, ..., un/xn]. For closed terms, t1 = t2 if their normal forms n1 and n2 have the relation n1 = n2, where a normal form n1 is said to be equal to another normal form n2 if they satisfy one of the following rules:

(1) n1 is a term of type iB; then n2 has to be of the same type, and under the base type interpretation, n1 is equal to n2;
(2) n1 is (t1, t2, ..., tn); then n2 has to be (t1′, t2′, ..., tn′), and ∀j ∈ [1, n], tj is equal to tj′;
(3) n1 is λx : T. t; then n2 has to be λy : T. t′ (y can be x), and n1 x is equal to n2 x;
(4) n1 is inl t1′; then n2 has to be inl t2′, and t1′ is equal to t2′;
(5) n1 is inr t1′; then n2 has to be inr t2′, and t1′ is equal to t2′.

▶ Lemma 7. The equality is reflexive, transitive and symmetric for weak-normalizable terms.

Proof. Based on the equality of terms of base types, we can prove it by induction. ◀

▶ Lemma 8. The equality is consistent, i.e., we cannot prove equality between two arbitrary terms.

Proof. Notice that except for the equality introduced by the base type interpreter, all other equality inferences preserve the type. So for arbitrary t1 of type (B, B) and t2 of type B, we cannot prove equality between them. ◀

Next we give some lemmas that will be used later in our proofs. They are relatively unimportant to the main line of our calculus, so we put their proofs in the Appendix.

▶ Lemma 9. If t1 ρ∗ t1′ and t2 ρ∗ t2′, then t1[t2/x] ρ∗ t1′[t2′/x].

▶ Lemma 10. If t1 = t1′ and t2 = t2′, then t1 ⊕ t2 = t1′ ⊕ t2′.

▶ Lemma 11. For a term t and any subterm s, if a term s′ = s, then t[s′/s] = t. (We only substitute the subterm s, not other subterms identical to s.)

▶ Lemma 12. If t1 ∗(t2 ⊕t3) and (t1 ∗t2)⊕(t1 ∗t3) are weak-normalizable, then t1 ∗(t2 ⊕t3) = (t1 ∗ t2) ⊕ (t1 ∗ t3).

▶ Lemma 13. If (t1⊖t2)⊕(t2⊖t3) and t1⊖t3 are weak-normalizable, then (t1⊖t2)⊕(t2⊖t3) = t1 ⊖ t3.

3 Newton-Leibniz’s Formula

The first important theorem we will give is the Newton-Leibniz’s formula, which ensures the duality between derivatives and integration. This theorem lays a solid basis for our calculus. Before giving and proving the theorem, as a warmup, let us take a look at a simple calculation example related to derivative and integration.

▶ Example 14 (Calculation with Derivatives and Integrations). Consider a function f on real numbers, usually defined in mathematics as f(x, y) = (x + y, x ∗ y, y). In our calculus, it is defined as follows.

f :: (R, R) → (R, R, R)
f = λx : (R, R). (π1(x) ⊕ π2(x), π1(x) ∗ π2(x), π2(x))

The following shows the calculation of how ∫_{(0,0)}^{(2,3)} ∂(f y)/∂y |_x dx comes to equal f y [(2, 3)/y] ⊖ f y [(0, 0)/y].

∫_{(0,0)}^{(2,3)} ∂(f y)/∂y |_x dx
= { Rule EAppInt4 }
∫_{0}^{2} π1(∂(f y)/∂y |_{(x1,0)}) dx1 ⊕ ∫_{0}^{3} π2(∂(f y)/∂y |_{(2,x2)}) dx2
= { Rule EAppDer4 and Projection }
∫_{0}^{2} ∂(f (x1′, 0))/∂x1′ |_{x1} dx1 ⊕ ∫_{0}^{3} ∂(f (2, x2′))/∂x2′ |_{x2} dx2
= { Function Application }
∫_{0}^{2} ∂(x1′ ⊕ 0, x1′ ∗ 0, 0)/∂x1′ |_{x1} dx1 ⊕ ∫_{0}^{3} ∂(2 ⊕ x2′, 2 ∗ x2′, x2′)/∂x2′ |_{x2} dx2
= { Rule EAppDer1 }
∫_{0}^{2} (∂(x1′ ⊕ 0)/∂x1′ |_{x1}, ∂(x1′ ∗ 0)/∂x1′ |_{x1}, ∂0/∂x1′ |_{x1}) dx1 ⊕ ∫_{0}^{3} (∂(2 ⊕ x2′)/∂x2′ |_{x2}, ∂(2 ∗ x2′)/∂x2′ |_{x2}, ∂x2′/∂x2′ |_{x2}) dx2
= { Lemma 11 }
∫_{0}^{2} (1, 0, 0) dx1 ⊕ ∫_{0}^{3} (1, 2, 1) dx2
= { Rule EAppInt1 }
(2, 0, 0) ⊕ (3, 6, 3)
= { Rule EAppAdd1 }
(5, 6, 3)

For readability, during the calculation we substitute the subterms (∂(x1′ ⊕ 0)/∂x1′ |_{x1}, ∂(x1′ ∗ 0)/∂x1′ |_{x1}, ∂0/∂x1′ |_{x1}) and (∂(2 ⊕ x2′)/∂x2′ |_{x2}, ∂(2 ∗ x2′)/∂x2′ |_{x2}, ∂x2′/∂x2′ |_{x2}) by the equal subterms (1, 0, 0) and (1, 2, 1), though our calculus does not actually perform computation like this. These substitutions are safe to perform (Lemma 11), and give a better demonstration of how the Newton-Leibniz theorem works. Here we have ∫_{(0,0)}^{(2,3)} ∂(f y)/∂y |_x dx = (5, 6, 3) = f(2, 3) ⊖ f(0, 0) = f y [(2, 3)/y] ⊖ f y [(0, 0)/y].
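Example 14 can also be replayed numerically: integrating the partial derivatives of f along the two legs (0,0) → (2,0) → (2,3), approximated by finite differences and a midpoint rule in a sketch of our own, recovers f(2, 3) ⊖ f(0, 0) = (5, 6, 3):

```python
def f(x, y):
    return (x + y, x * y, y)

def partial(g, i, pt, h=1e-6):
    """Finite-difference partial derivative of tuple-valued g at pt."""
    bumped = list(pt)
    bumped[i] += h
    return tuple((a - b) / h for a, b in zip(g(*bumped), g(*pt)))

def integrate(deriv, lo, hi, steps=200):
    """Midpoint-rule integral of a tuple-valued function over [lo, hi]."""
    w = (hi - lo) / steps
    total = (0.0, 0.0, 0.0)
    for k in range(steps):
        d = deriv(lo + (k + 0.5) * w)
        total = tuple(t + v * w for t, v in zip(total, d))
    return total

leg1 = integrate(lambda x1: partial(f, 0, (x1, 0.0)), 0.0, 2.0)  # ∫ ∂f/∂x1 at (x1, 0)
leg2 = integrate(lambda x2: partial(f, 1, (2.0, x2)), 0.0, 3.0)  # ∫ ∂f/∂x2 at (2, x2)
result = tuple(a + b for a, b in zip(leg1, leg2))
expected = tuple(a - b for a, b in zip(f(2.0, 3.0), f(0.0, 0.0)))  # (5, 6, 3)
assert all(abs(r - e) < 1e-3 for r, e in zip(result, expected))
```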

▶ Theorem 15 (Newton-Leibniz). Let t contain no free occurrence of x, and let both ∫_{t1}^{t2} ∂t/∂y |_x dx and t[t2/y] ⊖ t[t1/y] be well-typed and weak-normalizable. Then we have

∫_{t1}^{t2} ∂t/∂y |_x dx = t[t2/y] ⊖ t[t1/y].

Proof. If t1, t2 or t is not closed, then we need to prove ∀u1, ..., un, we have

(∫_{t1}^{t2} ∂t/∂y |_x dx)[u1/x1, ..., un/xn] = (t[t2/y] ⊖ t[t1/y])[u1/x1, ..., un/xn].

By freezing u1, ..., un, we can apply the substitution [u1/x1, ..., un/xn] to make every term closed. So, for simplicity, we will assume t, t1 and t2 to be closed. We prove this by induction on types.

Case: t1, t2 and t are of base types. By the confluence lemma, we know there exist normal forms t′, t1′ and t2′ of the terms t, t1 and t2. Also, we know ∫_{t1}^{t2} ∂t/∂y |_x dx = ∫_{t1′}^{t2′} ∂t′/∂y |_x dx and t[t2/y] ⊖ t[t1/y] = t′[t2′/y] ⊖ t′[t1′/y]. Since on base types we have ∫_{t1′}^{t2′} ∂t′/∂y |_x dx = t′[t2′/y] ⊖ t′[t1′/y], we have ∫_{t1}^{t2} ∂t/∂y |_x dx = t[t2/y] ⊖ t[t1/y].

Case: t1, t2 are of base types, t is of type (T1, T2, ..., Tn). By the confluence lemma, there exists a normal form (t11′, t12′, ..., t1n′) for t. Using Rules (EAppInt1) and (EAppDer1), we know

∫_{t1}^{t2} ∂t/∂y |_x dx = ∫_{t1}^{t2} ∂(t11′, t12′, ..., t1n′)/∂y |_x dx
= ∫_{t1}^{t2} (∂t11′/∂y |_x, ∂t12′/∂y |_x, ..., ∂t1n′/∂y |_x) dx
= (∫_{t1}^{t2} ∂t11′/∂y |_x dx, ∫_{t1}^{t2} ∂t12′/∂y |_x dx, ..., ∫_{t1}^{t2} ∂t1n′/∂y |_x dx)

On the other hand, we have

t[t2/y] ⊖ t[t1/y]
= (t11′[t2/y], t12′[t2/y], ..., t1n′[t2/y]) ⊖ (t11′[t1/y], t12′[t1/y], ..., t1n′[t1/y])
= (t11′[t2/y] ⊖ t11′[t1/y], t12′[t2/y] ⊖ t12′[t1/y], ..., t1n′[t2/y] ⊖ t1n′[t1/y])

By induction, we have ∀j ∈ [1, n], ∫_{t1}^{t2} ∂t1j′/∂y |_x dx = t1j′[t2/y] ⊖ t1j′[t1/y], so we have proven the case.

Case: t1, t2 are of base types, t is of type A → B. By Lemma 11, we can use λz : A. t z (for simplicity, we use λz : A. t′ where t′ = t z) to substitute for t, where z is a fresh variable. Now we have, for any u,

(∫_{t1}^{t2} ∂t/∂y |_x dx) u = (∫_{t1}^{t2} ∂(λz : A. t′)/∂y |_x dx) u
= (λz : A. ∫_{t1}^{t2} ∂t′/∂y |_x dx) u
= ∫_{t1}^{t2} ∂(t′[u/z])/∂y |_x dx

and on the other hand, since z does not occur free in t1 and t2, we have

(t[t2/y] ⊖ t[t1/y]) u = ((λz : A. t′)[t2/y] ⊖ (λz : A. t′)[t1/y]) u
= (λz : A. (t′[t2/y] ⊖ t′[t1/y])) u
= (t′[t2/y] ⊖ t′[t1/y])[u/z]
= (t′[u/z])[t2/y] ⊖ (t′[u/z])[t1/y]

By induction (on B), we know ∫_{t1}^{t2} ∂(t′[u/z])/∂y |_x dx = (t′[u/z])[t2/y] ⊖ (t′[u/z])[t1/y]; thus we have proven the case.

Case: t1, t2 are of base types, t is of type T1 + T2. This case is impossible because the right-hand term is not well-typed.

Case: t1, t2 are of type (T1, T2, ..., Tn), t is of any type T. By the confluence lemma, we know there exist normal forms (t11′, t12′, ..., t1n′) and (t21′, t22′, ..., t2n′) for t1 and t2 respectively. Applying Rules (EAppDer4) and (EAppInt4), we have

\[
\begin{aligned}
\int_{t_1}^{t_2} \frac{\partial t}{\partial y}\Big|_x\, dx
&= \int_{(t'_{11}, t'_{12}, \ldots, t'_{1n})}^{(t'_{21}, t'_{22}, \ldots, t'_{2n})} \frac{\partial t}{\partial y}\Big|_x\, dx\\
&= \int_{t'_{11}}^{t'_{21}} \pi_1\Big(\frac{\partial t}{\partial y}\Big|_x\, [(x_1, t'_{12}, \ldots, t'_{1n})/x]\Big)\, dx_1 \oplus \cdots \oplus
\int_{t'_{1n}}^{t'_{2n}} \pi_n\Big(\frac{\partial t}{\partial y}\Big|_x\, [(t'_{21}, t'_{22}, \ldots, x_n)/x]\Big)\, dx_n
\end{aligned}
\]
Notice that there is no occurrence of $x$ in $t$, so we have

\[
\begin{aligned}
&\int_{t'_{1j}}^{t'_{2j}} \pi_j\Big(\frac{\partial t}{\partial y}\Big|_x\, [(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, x_j, t'_{1(j+1)}, \ldots, t'_{1n})/x]\Big)\, dx_j\\
&= \int_{t'_{1j}}^{t'_{2j}} \pi_j\Big(\frac{\partial t}{\partial y}\Big|_{(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, x_j, t'_{1(j+1)}, \ldots, t'_{1n})}\Big)\, dx_j\\
&= \int_{t'_{1j}}^{t'_{2j}} \pi_j\Big(\frac{\partial t[t_{1*}/y]}{\partial x_1}\Big|_{t'_{21}},\ \frac{\partial t[t_{2*}/y]}{\partial x_2}\Big|_{t'_{22}},\ \ldots,\ \frac{\partial t[t_{(j-1)*}/y]}{\partial x_{j-1}}\Big|_{t'_{2(j-1)}},\ \frac{\partial t[t_{j*}/y]}{\partial x_j}\Big|_{x_j},\ \frac{\partial t[t_{(j+1)*}/y]}{\partial x_{j+1}}\Big|_{t'_{1(j+1)}},\ \ldots,\ \frac{\partial t[t_{n*}/y]}{\partial x_n}\Big|_{t'_{1n}}\Big)\, dx_j\\
&= \int_{t'_{1j}}^{t'_{2j}} \frac{\partial t[(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, x'_j, t'_{1(j+1)}, \ldots, t'_{1n})/y]}{\partial x'_j}\Big|_{x_j}\, dx_j
\end{aligned}
\]

By induction (on the case where t1, t2 are of type Tj, t is of type T ), we have

\[
\begin{aligned}
&\int_{t'_{1j}}^{t'_{2j}} \frac{\partial t[(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, x'_j, t'_{1(j+1)}, \ldots, t'_{1n})/y]}{\partial x'_j}\Big|_{x_j}\, dx_j\\
&= (t[(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, x'_j, t'_{1(j+1)}, \ldots, t'_{1n})/y])[t'_{2j}/x'_j]\ \ominus\ (t[(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, x'_j, t'_{1(j+1)}, \ldots, t'_{1n})/y])[t'_{1j}/x'_j]\\
&= t[(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, t'_{2j}, t'_{1(j+1)}, \ldots, t'_{1n})/y]\ \ominus\ t[(t'_{21}, t'_{22}, \ldots, t'_{2(j-1)}, t'_{1j}, t'_{1(j+1)}, \ldots, t'_{1n})/y]
\end{aligned}
\]

Note that the last equation holds because $x'_j$ is a fresh variable and $t$ has no occurrence of $x'_j$.


Now we have the following calculation.

\[
\begin{aligned}
&\int_{t_1}^{t_2} \frac{\partial t}{\partial y}\Big|_x\, dx\\
&= \{\ \text{all the above}\ \}\\
&\quad (t[(t'_{21}, t'_{12}, \ldots, t'_{1n})/y] \ominus t[(t'_{11}, t'_{12}, \ldots, t'_{1n})/y])\ \oplus\\
&\quad (t[(t'_{21}, t'_{22}, \ldots, t'_{1n})/y] \ominus t[(t'_{21}, t'_{12}, \ldots, t'_{1n})/y])\ \oplus \cdots \oplus\\
&\quad (t[(t'_{21}, t'_{22}, \ldots, t'_{2n})/y] \ominus t[(t'_{21}, t'_{22}, \ldots, t'_{1n})/y])\\
&= \{\ \text{Lemma 13}\ \}\\
&\quad t[(t'_{21}, t'_{22}, \ldots, t'_{2n})/y] \ominus t[(t'_{11}, t'_{12}, \ldots, t'_{1n})/y]\\
&= \{\ \text{Lemma 9}\ \}\\
&\quad t[t_2/y] \ominus t[t_1/y]
\end{aligned}
\]

Thus we have proven the theorem. ◀
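The base-type reading of this theorem can be sanity-checked numerically. The following sketch (our own, not part of the paper's calculus; it interprets ⊕ and ⊖ as real addition and subtraction) checks that integrating the derivative of a concrete term over an interval recovers the difference of the term at the endpoints.

```python
# Numerical check of the base-type reading of the theorem: for
# t = y*y + 3*y, the integral of dt/dy over [t1, t2] equals
# t[t2/y] - t[t1/y].  All names here are ours, chosen for illustration.

def t(y):           # the term t, read as a real function of y
    return y * y + 3 * y

def dt(y):          # its derivative with respect to y
    return 2 * y + 3

def integrate(f, a, b, n=10_000):
    # simple midpoint-rule quadrature (exact for linear integrands)
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

t1, t2 = 1.0, 4.0
lhs = integrate(dt, t1, t2)
rhs = t(t2) - t(t1)          # t[t2/y] ⊖ t[t1/y] on the base type R
assert abs(lhs - rhs) < 1e-6
```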

Application: Incremental Computation

A direct application is incrementalization [18, 7, 11]. Given a function $f(x)$, if the input $x$ is changed by $\Delta$, then we can obtain the incremental version of $f$ by
\[
f(x \oplus \Delta) = f(x) \oplus f'(x, \Delta)
\]
where $f'$ satisfies
\[
f'(x, \Delta) = \int_{x}^{x \oplus \Delta} \frac{\partial f(x)}{\partial x}\Big|_x\, dx.
\]

▶ Example 16 (Averaging a Pair of Real Numbers). As a simple example, consider the average of a pair of real numbers

average :: (R, R) → R
average = λx. (π1(x) + π2(x))/2

Suppose that we want an incremental computation of average at $x = (x_1, x_2)$ when the first element $x_1$ is changed to $x_1 + d$ while the second component $x_2$ is kept the same. The incremental computation is defined by
\[
inc(x, d) = average'(x, (d, 0)) = \int_{x}^{x \oplus (d,0)} \frac{\partial\, average(x)}{\partial x}\Big|_x\, dx = \frac{d}{2}
\]
which is efficient.
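The example above can be replayed numerically. A minimal sketch, assuming ⊕ is real addition and reading the integral's closed form d/2 directly (our own helper names):

```python
# Example 16, numerically: updating average incrementally when the
# first component changes by d.  inc is the closed form d/2 of the
# integral from the text.

def average(x):
    x1, x2 = x
    return (x1 + x2) / 2

def inc(d):
    # the integral of ∂average/∂x over [x, x ⊕ (d, 0)] evaluates to d/2
    return d / 2

x = (10.0, 4.0)
d = 6.0
assert average((x[0] + d, x[1])) == average(x) + inc(d)
```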

4 Chain Rule

The Chain Rule is another important theorem on the relation between function composition and derivatives. The Chain Rule in our calculus has many important applications in automatic differentiation and incremental computation. We first give an example to get some taste of it, before we state and prove the theorem.

▶ Example 17 (chain rule). Consider two functions f and g on real numbers, usually defined in mathematics as f(x, y) = (x + y, x ∗ y, y) and g(x, y) = (x + y, y). In our calculus, they are defined as follows.

f :: (R, R) → (R, R, R)
f = λx : (R, R). (π1(x) ⊕ π2(x), π1(x) ∗ π2(x), π2(x))

g :: (R, R) → (R, R)
g = λx : (R, R). (π1(x) ⊕ π2(x), π2(x))

We demonstrate that for any $r_1, r_2, r_3, r_4 \in \mathbb{R}$, we have
\[
\frac{\partial f(g\ x)}{\partial x}\Big|_{(r_3,r_4)} * (r_1, r_2) = \frac{\partial f\ x}{\partial x}\Big|_{g\ (r_3,r_4)} * \Big(\frac{\partial g\ x}{\partial x}\Big|_{(r_3,r_4)} * (r_1, r_2)\Big)
\]
by the following calculation. First, for the LHS, we have:

\[
\begin{aligned}
&\frac{\partial f(g\ x)}{\partial x}\Big|_{(r_3,r_4)} * (r_1, r_2)\\
&= \{\ \text{Rule EAppDer3}\ \}\\
&\quad \frac{\partial f(g\ (x_1, r_4))}{\partial x_1}\Big|_{r_3} * r_1 \oplus \frac{\partial f(g\ (r_3, x_2))}{\partial x_2}\Big|_{r_4} * r_2\\
&= \{\ \text{Application}\ \}\\
&\quad \frac{\partial (x_1 \oplus r_4 \oplus r_4,\ (x_1 \oplus r_4)*r_4,\ r_4)}{\partial x_1}\Big|_{r_3} * r_1 \oplus \frac{\partial (r_3 \oplus x_2 \oplus x_2,\ (r_3 \oplus x_2)*x_2,\ x_2)}{\partial x_2}\Big|_{r_4} * r_2\\
&= \{\ \text{Lemma 11}\ \}\\
&\quad (1, r_4, 0) * r_1 \oplus (2,\ r_3 \oplus 2*r_4,\ 1) * r_2\\
&= \{\ \text{Rule EAppMul1 and Rule EAppAdd1}\ \}\\
&\quad (r_1 \oplus (2*r_2),\ r_4*r_1 \oplus (r_3 \oplus (2*r_4))*r_2,\ r_2)
\end{aligned}
\]

Now, for the RHS, we calculate with the following two steps.

\[
\begin{aligned}
&\frac{\partial g\ x}{\partial x}\Big|_{(r_3,r_4)} * (r_1, r_2)\\
&= \{\ \text{Rule EAppDer3}\ \}\\
&\quad \frac{\partial g\ (x_1, r_4)}{\partial x_1}\Big|_{r_3} * r_1 \oplus \frac{\partial g\ (r_3, x_2)}{\partial x_2}\Big|_{r_4} * r_2\\
&= \{\ \text{Application}\ \}\\
&\quad \frac{\partial (x_1 \oplus r_4,\ r_4)}{\partial x_1}\Big|_{r_3} * r_1 \oplus \frac{\partial (r_3 \oplus x_2,\ x_2)}{\partial x_2}\Big|_{r_4} * r_2\\
&= \{\ \text{Lemma 11}\ \}\\
&\quad (1, 0) * r_1 \oplus (1, 1) * r_2\\
&= \{\ \text{Rule EAppMul1 and Rule EAppAdd1}\ \}\\
&\quad (r_1 \oplus r_2,\ r_2)
\end{aligned}
\]

\[
\begin{aligned}
&\frac{\partial f\ x}{\partial x}\Big|_{g\ (r_3,r_4)} * (r_1 \oplus r_2,\ r_2)\\
&= \{\ \text{Application and Rule EAppDer3}\ \}\\
&\quad \Big(\frac{\partial f\ (x_1, r_4)}{\partial x_1}\Big|_{r_3 \oplus r_4},\ \frac{\partial f\ (r_3 \oplus r_4, x_2)}{\partial x_2}\Big|_{r_4}\Big) * (r_1 \oplus r_2,\ r_2)\\
&= \{\ \text{Application}\ \}\\
&\quad \Big(\frac{\partial (x_1 \oplus r_4,\ x_1*r_4,\ r_4)}{\partial x_1}\Big|_{r_3 \oplus r_4},\ \frac{\partial (r_3 \oplus r_4 \oplus x_2,\ (r_3 \oplus r_4)*x_2,\ x_2)}{\partial x_2}\Big|_{r_4}\Big) * (r_1 \oplus r_2,\ r_2)\\
&= \{\ \text{Lemma 11}\ \}\\
&\quad ((1, r_4, 0),\ (1, r_3 \oplus r_4, 1)) * (r_1 \oplus r_2,\ r_2)\\
&= \{\ \text{Rule EAppMul4, Rule EAppAdd1 and Lemma 11}\ \}\\
&\quad (r_1 \oplus (2*r_2),\ r_4*r_1 \oplus (r_3 \oplus (2*r_4))*r_2,\ r_2)
\end{aligned}
\]
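The calculation above can be checked with a numeric Jacobian-vector-product sketch (the helper names `jvp_f`, `jvp_g`, `jvp_fg` are ours; ⊕ and ∗ are read as real addition and multiplication):

```python
# Numeric check of Example 17's chain-rule identity at a sample point:
# applying the Jacobian of f∘g directly agrees with composing the
# Jacobians of f and g, as in the calculation above.

def g(x1, x2):
    # g(x1, x2) = (x1 + x2, x2)
    return (x1 + x2, x2)

def jvp_g(p, v):
    # Jacobian of g applied to direction v
    v1, v2 = v
    return (v1 + v2, v2)

def jvp_f(p, v):
    # f(y1, y2) = (y1 + y2, y1*y2, y2); Jacobian at p applied to v
    y1, y2 = p
    v1, v2 = v
    return (v1 + v2, y2 * v1 + y1 * v2, v2)

def jvp_fg(p, v):
    # f∘g = (x1 + 2*x2, (x1 + x2)*x2, x2); its Jacobian applied to v
    x1, x2 = p
    v1, v2 = v
    return (v1 + 2 * v2, x2 * v1 + (x1 + 2 * x2) * v2, v2)

r1, r2, r3, r4 = 2.0, 3.0, 5.0, 7.0
lhs = jvp_fg((r3, r4), (r1, r2))
rhs = jvp_f(g(r3, r4), jvp_g((r3, r4), (r1, r2)))
assert lhs == rhs
```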

▶ Theorem 18 (Chain Rule). Let $f : T_1 \to T$ and $g : T_2 \to T_1$. If both $\frac{\partial f(g\ x)}{\partial x}\big|_{t_1} * t$ and $\frac{\partial f\ y}{\partial y}\big|_{(g\ t_1)} * \big(\frac{\partial g\ z}{\partial z}\big|_{t_1} * t\big)$ are well-typed and weak-normalizable, then for any $t, t_1 : T_2$, we have

\[
\frac{\partial f(g\ x)}{\partial x}\Big|_{t_1} * t = \frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big).
\]

Proof. Like in the proof of Theorem 15, for simplicity, we assume that f, g, t and t1 are closed. Furthermore, we assume that t and t1 are in normal form. We prove this by induction on types.


Case: T, T2 are base types, and T1 is any type. To be well-typed, T1 must contain no → or + type. So for simplicity, we suppose T1 to be an n-tuple (B, B, B, ..., B), but the technique below can be applied to any T1 (such as tuples of tuples) that makes the term well-typed. First we notice that

\[
\begin{aligned}
g\ z &= (\pi_1(g\ z),\ \pi_2(g\ z),\ \ldots,\ \pi_n(g\ z))\\
&= ((\lambda b'{:}B.\,\pi_1(g\ b'))\ z,\ (\lambda b'{:}B.\,\pi_2(g\ b'))\ z,\ \ldots,\ (\lambda b'{:}B.\,\pi_n(g\ b'))\ z)
\end{aligned}
\]

and for any $j$, we notice that $\pi_j(g\ b')$ has only one free variable, of base type, so it can be reduced to a normal form, say $E_j$, of base type. Let $g_j$ be $\lambda b'{:}B.\ E_j$; then we have $g\ z = (g_1\ z,\ g_2\ z,\ \ldots,\ g_n\ z)$. Next, we deal with the term $f$:

\[
\begin{aligned}
f &= \lambda a{:}T_1.\ (f\ a)\\
&= \lambda a{:}T_1.\ ((\lambda y_1{:}B.\ \lambda y_2{:}B.\ \ldots\ \lambda y_n{:}B.\ (f\ (y_1, y_2, \ldots, y_n)))\ \pi_1(a)\ \pi_2(a)\ \ldots\ \pi_n(a))
\end{aligned}
\]

and we know that $(f\ (y_1, y_2, \ldots, y_n))$ contains only free variables of base type, so it can be reduced to a base-type normal form, say $N$, and we have

\[
f = \lambda a{:}T_1.\ ((\lambda y_1{:}B.\ \lambda y_2{:}B.\ \ldots\ \lambda y_n{:}B.\ N)\ \pi_1(a)\ \pi_2(a)\ \ldots\ \pi_n(a)).
\]
Now, we can calculate as follows:

\[
\begin{aligned}
\frac{\partial f(g\ x)}{\partial x}\Big|_{t_1} * t
&= \frac{\partial\, (\lambda a{:}T_1.\,(\lambda y_1{:}B.\ \ldots\ \lambda y_n{:}B.\,N)\ \pi_1(a)\ \ldots\ \pi_n(a))\ (g_1\ x,\ g_2\ x,\ \ldots,\ g_n\ x)}{\partial x}\Big|_{t_1} * t\\
&= \frac{\partial\, (\lambda y_1{:}B.\ \ldots\ \lambda y_n{:}B.\,N)\ (g_1\ x)\ (g_2\ x)\ \ldots\ (g_n\ x)}{\partial x}\Big|_{t_1} * t\\
&= \frac{\partial N[(g_1\ x)/y_1,\ (g_2\ x)/y_2,\ \ldots,\ (g_n\ x)/y_n]}{\partial x}\Big|_{t_1} * t
\end{aligned}
\]

\[
\begin{aligned}
&\frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial f\ y}{\partial y}\Big|_{(g_1\ t_1, \ldots, g_n\ t_1)} * \Big(\frac{\partial (g_1\ z,\ \ldots,\ g_n\ z)}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial f\ y}{\partial y}\Big|_{(g_1\ t_1, \ldots, g_n\ t_1)} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t,\ \ldots,\ \frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial\, (\lambda y_1{:}B.\ \ldots\ \lambda y_n{:}B.\,N)\ \pi_1(y)\ \ldots\ \pi_n(y)}{\partial y}\Big|_{(g_1\ t_1, \ldots, g_n\ t_1)} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t,\ \ldots,\ \frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \Big(\frac{\partial N[y'_1/y_1,\ g_2\ t_1/y_2,\ \ldots,\ g_n\ t_1/y_n]}{\partial y'_1}\Big|_{g_1\ t_1},\ \ldots,\ \frac{\partial N[g_1\ t_1/y_1,\ \ldots,\ y'_n/y_n]}{\partial y'_n}\Big|_{g_n\ t_1}\Big) * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t,\ \ldots,\ \frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \Big(\frac{\partial N[y'_1/y_1,\ g_2\ t_1/y_2,\ \ldots,\ g_n\ t_1/y_n]}{\partial y'_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big)\Big) \oplus \cdots \oplus \Big(\frac{\partial N[g_1\ t_1/y_1,\ \ldots,\ y'_n/y_n]}{\partial y'_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)
\end{aligned}
\]

Notice that by the base-type interpretation,
\[
(f(g_1(x), g_2(x), \ldots, g_n(x)))' = f'_1(g_1(x), \ldots, g_n(x)) * g'_1(x) + f'_2(g_1(x), \ldots, g_n(x)) * g'_2(x) + \cdots + f'_n(g_1(x), \ldots, g_n(x)) * g'_n(x)
\]
where $f'_j$ denotes the derivative of $f$ with respect to its $j$-th parameter, so we get the following and prove the case.

\[
\begin{aligned}
&\frac{\partial N[(g_1\ x)/y_1,\ (g_2\ x)/y_2,\ \ldots,\ (g_n\ x)/y_n]}{\partial x}\Big|_{t_1} * t\\
&= \Big(\frac{\partial N[y'_1/y_1,\ g_2\ t_1/y_2,\ \ldots,\ g_n\ t_1/y_n]}{\partial y'_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big)\Big) \oplus \cdots \oplus \Big(\frac{\partial N[g_1\ t_1/y_1,\ \ldots,\ y'_n/y_n]}{\partial y'_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)
\end{aligned}
\]

Case: T2 is a base type, T1 is any type, T is A → B. We prove that for any $u$ of type $A$, we have $\big(\frac{\partial f(g\ x)}{\partial x}\big|_{t_1} * t\big)\ u = \big(\frac{\partial f\ y}{\partial y}\big|_{(g\ t_1)} * \big(\frac{\partial g\ z}{\partial z}\big|_{t_1} * t\big)\big)\ u$. First, let $f' = \lambda x{:}T_1.\ (f\ x)\ u$ and $g' = g$; then by induction we have

\[
\frac{\partial f'(g'\ x)}{\partial x}\Big|_{t_1} * t = \frac{\partial f'\ y}{\partial y}\Big|_{(g'\ t_1)} * \Big(\frac{\partial g'\ z}{\partial z}\Big|_{t_1} * t\Big)
\]

that is, we have

\[
\frac{\partial (f\ (g\ x)\ u)}{\partial x}\Big|_{t_1} * t = \frac{\partial (f\ y\ u)}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)
\]

Then, we prove $\frac{\partial (f\ (g\ x)\ u)}{\partial x}\big|_{t_1} * t = \big(\frac{\partial f\ (g\ x)}{\partial x}\big|_{t_1} * t\big)\ u$ by the following calculation.

\[
\begin{aligned}
\Big(\frac{\partial f\ (g\ x)}{\partial x}\Big|_{t_1} * t\Big)\ u
&= \Big(\frac{\partial\, \lambda a{:}A.\,(f\ (g\ x))\ a}{\partial x}\Big|_{t_1} * t\Big)\ u\\
&= \Big(\lambda a{:}A.\,\Big(\frac{\partial (f\ (g\ x))\ a}{\partial x}\Big|_{t_1} * t\Big)\Big)\ u\\
&= \frac{\partial (f\ (g\ x))\ u}{\partial x}\Big|_{t_1} * t
\end{aligned}
\]

Next, we prove $\big(\frac{\partial f\ y}{\partial y}\big|_{(g\ t_1)} * \big(\frac{\partial g\ z}{\partial z}\big|_{t_1} * t\big)\big)\ u = \frac{\partial (f\ y\ u)}{\partial y}\big|_{(g\ t_1)} * \big(\frac{\partial g\ z}{\partial z}\big|_{t_1} * t\big)$. For simplicity, we assume T1 to be an n-tuple (B, B, B, ..., B) (the technique below can be applied to any T1 which makes the term well-typed). On one hand, by substituting $(g_1\ z,\ g_2\ z,\ \ldots,\ g_n\ z)$ for $g\ z$, we have

\[
\begin{aligned}
&\Big(\frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)\ u\\
&= \Big(\frac{\partial f\ y}{\partial y}\Big|_{(g_1\ t_1, \ldots, g_n\ t_1)} * \Big(\frac{\partial (g_1\ z, \ldots, g_n\ z)}{\partial z}\Big|_{t_1} * t\Big)\Big)\ u\\
&= \Big(\frac{\partial f(y'_1,\ g_2\ t_1,\ \ldots,\ g_n\ t_1)}{\partial y'_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big) \oplus \cdots \oplus \frac{\partial f(g_1\ t_1,\ \ldots,\ y'_n)}{\partial y'_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)\ u
\end{aligned}
\]

Since

\[
f(g_1\ t_1,\ g_2\ t_1,\ \ldots,\ g_{j-1}\ t_1,\ y_j,\ g_{j+1}\ t_1,\ \ldots,\ g_n\ t_1) = \lambda a{:}A.\ f\ (g_1\ t_1,\ g_2\ t_1,\ \ldots,\ g_{j-1}\ t_1,\ y_j,\ g_{j+1}\ t_1,\ \ldots,\ g_n\ t_1)\ a
\]

which we denote by $\lambda a{:}A.\ t^*_j$, we continue the calculation as follows.

\[
\begin{aligned}
&\Big(\frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)\ u\\
&= \Big(\frac{\partial\, \lambda a{:}A.\,t^*_1}{\partial y_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big) \oplus \cdots \oplus \frac{\partial\, \lambda a{:}A.\,t^*_n}{\partial y_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)\ u\\
&= \Big(\lambda a{:}A.\ \Big(\frac{\partial t^*_1}{\partial y_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big)\Big) \oplus \cdots \oplus \Big(\frac{\partial t^*_n}{\partial y_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\Big)\Big)\ u\\
&= \frac{\partial t^*_1[u/a]}{\partial y_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big) \oplus \cdots \oplus \frac{\partial t^*_n[u/a]}{\partial y_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)
\end{aligned}
\]

On the other hand, we have

\[
\begin{aligned}
&\frac{\partial (f\ y\ u)}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial (f\ y\ u)}{\partial y}\Big|_{(g_1\ t_1, \ldots, g_n\ t_1)} * \Big(\frac{\partial (g_1\ z, \ldots, g_n\ z)}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial (f\ (y'_1,\ g_2\ t_1,\ \ldots,\ g_n\ t_1)\ u)}{\partial y'_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big) \oplus \cdots \oplus \frac{\partial (f\ (g_1\ t_1,\ \ldots,\ y'_n)\ u)}{\partial y'_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial ((\lambda a{:}A.\,t^*_1)\ u)}{\partial y_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big) \oplus \cdots \oplus \frac{\partial ((\lambda a{:}A.\,t^*_n)\ u)}{\partial y_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial t^*_1[u/a]}{\partial y_1}\Big|_{g_1\ t_1} * \Big(\frac{\partial g_1\ z}{\partial z}\Big|_{t_1} * t\Big) \oplus \cdots \oplus \frac{\partial t^*_n[u/a]}{\partial y_n}\Big|_{g_n\ t_1} * \Big(\frac{\partial g_n\ z}{\partial z}\Big|_{t_1} * t\Big)
\end{aligned}
\]

Therefore, we have proven the case.


Case: T2 is a base type, T1 is any type, T is (T1, T2, ..., Tn). We need to prove that for all $j$, we have $\pi_j\big(\frac{\partial f(g\ x)}{\partial x}\big|_{t_1} * t\big) = \pi_j\big(\frac{\partial f\ y}{\partial y}\big|_{(g\ t_1)} * \big(\frac{\partial g\ z}{\partial z}\big|_{t_1} * t\big)\big)$. We may follow the proof for the case when T has type A → B. Let $f' = \lambda x{:}T_1.\ \pi_j(f\ x)$ and $g' = g$; by induction, we have

\[
\frac{\partial\, \pi_j(f(g\ x))}{\partial x}\Big|_{t_1} * t = \frac{\partial\, \pi_j(f\ y)}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)
\]

The rest of the proof is similar to that for the case when T = A → B.

Case: T2 is a base type, T1 is any type, T is T1 + T2. Notice that T1 has to be a base type for the term to be well-typed. But in either case, the proof is similar to the case when T = A → B.

Case: T2, T1 and T are any types. Notice that T2 must contain no → or + to be well-typed (i.e., no derivative over function types). We have proven the case when T2 is a base type, so now we assume that T2 has type (T1, T2, ..., Tn). Suppose the normal form of $t_1$ is $(t'_{11}, t'_{12}, \ldots, t'_{1n})$ and the normal form of $t$ is $(t'_{21}, t'_{22}, \ldots, t'_{2n})$. Then

\[
\begin{aligned}
\frac{\partial f(g\ x)}{\partial x}\Big|_{t_1} * t
&= \frac{\partial f(g\ x)}{\partial x}\Big|_{(t'_{11}, t'_{12}, \ldots, t'_{1n})} * (t'_{21}, t'_{22}, \ldots, t'_{2n})\\
&= \Big(\frac{\partial f(g\ (x_1, t'_{12}, \ldots, t'_{1n}))}{\partial x_1}\Big|_{t'_{11}},\ \ldots,\ \frac{\partial f(g\ (t'_{11}, t'_{12}, \ldots, x_n))}{\partial x_n}\Big|_{t'_{1n}}\Big) * (t'_{21}, \ldots, t'_{2n})\\
&= \Big(\frac{\partial f(g\ (x_1, t'_{12}, \ldots, t'_{1n}))}{\partial x_1}\Big|_{t'_{11}} * t'_{21}\Big) \oplus \cdots \oplus \Big(\frac{\partial f(g\ (t'_{11}, t'_{12}, \ldots, x_n))}{\partial x_n}\Big|_{t'_{1n}} * t'_{2n}\Big)
\end{aligned}
\]

On the other hand, we can use Lemma 12 (i.e., t1 ∗ (t2 ⊕ t3) = (t1 ∗ t2) ⊕ (t1 ∗ t3)) to do the following calculation.

\[
\begin{aligned}
&\frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ z}{\partial z}\Big|_{t_1} * t\Big)\\
&= \frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\Big(\frac{\partial g\ (x_1, t'_{12}, \ldots, t'_{1n})}{\partial x_1}\Big|_{t'_{11}} * t'_{21}\Big) \oplus \cdots \oplus \Big(\frac{\partial g\ (t'_{11}, t'_{12}, \ldots, x_n)}{\partial x_n}\Big|_{t'_{1n}} * t'_{2n}\Big)\Big)\\
&= \frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ (x_1, t'_{12}, \ldots, t'_{1n})}{\partial x_1}\Big|_{t'_{11}} * t'_{21}\Big) \oplus \cdots \oplus \frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ (t'_{11}, t'_{12}, \ldots, x_n)}{\partial x_n}\Big|_{t'_{1n}} * t'_{2n}\Big)
\end{aligned}
\]

Now, by induction using $f' = f$ and $g' = \lambda x{:}T_j.\ g\ (t'_{11}, t'_{12}, \ldots, t'_{1(j-1)}, x, t'_{1(j+1)}, \ldots, t'_{1n})$, we have

\[
\begin{aligned}
&\frac{\partial f(g\ (t'_{11}, t'_{12}, \ldots, t'_{1(j-1)}, x_j, t'_{1(j+1)}, \ldots, t'_{1n}))}{\partial x_j}\Big|_{t'_{1j}} * t'_{2j}\\
&= \frac{\partial f\ y}{\partial y}\Big|_{(g'\ t'_{1j})} * \Big(\frac{\partial g\ (t'_{11}, t'_{12}, \ldots, t'_{1(j-1)}, x_j, t'_{1(j+1)}, \ldots, t'_{1n})}{\partial x_j}\Big|_{t'_{1j}} * t'_{2j}\Big)\\
&= \frac{\partial f\ y}{\partial y}\Big|_{(g\ t_1)} * \Big(\frac{\partial g\ (t'_{11}, t'_{12}, \ldots, t'_{1(j-1)}, x_j, t'_{1(j+1)}, \ldots, t'_{1n})}{\partial x_j}\Big|_{t'_{1j}} * t'_{2j}\Big)
\end{aligned}
\]

Therefore by Lemma 10, we have proven the case. Thus we have proven the theorem. ◀

Application: Automatic Differentiation

The Chain Rule provides another way to compute derivatives. There are many applications of the chain rule; here we give an example of how it connects to automatic differentiation [10].

▶ Example 19 (AD). This is an example from [10]. Let sqr and magSqr be defined as follows.

sqr :: R → R
sqr a = a ∗ a

magSqr :: (R, R) → R
magSqr (a, b) = sqr a ⊕ sqr b

First of all, let $t_1$ and $t_2$ be two pairs; then it is easy to prove that $\frac{\partial (t_1 \oplus t_2)}{\partial x}\big|_{t_3} = \frac{\partial t_1}{\partial x}\big|_{t_3} \oplus \frac{\partial t_2}{\partial x}\big|_{t_3}$. Next, we can perform automatic differentiation on magSqr by the following calculation.

\[
\begin{aligned}
&\frac{\partial (magSqr\ x)}{\partial x}\Big|_{(a,b)} * t\\
&= \frac{\partial (sqr(\pi_1 x) \oplus sqr(\pi_2 x))}{\partial x}\Big|_{(a,b)} * t\\
&= \frac{\partial (sqr\ y)}{\partial y}\Big|_{\pi_1(a,b)} * \Big(\frac{\partial (\pi_1 x)}{\partial x}\Big|_{(a,b)} * t\Big) \oplus \frac{\partial (sqr\ y)}{\partial y}\Big|_{\pi_2(a,b)} * \Big(\frac{\partial (\pi_2 x)}{\partial x}\Big|_{(a,b)} * t\Big)\\
&= 2*a*((1,0)*t)\ \oplus\ 2*b*((0,1)*t)
\end{aligned}
\]

Now, because the theorem holds for any $t$ of pair type, we substitute $(1, 0)$ and $(0, 1)$ for $t$ respectively, and we get $\frac{\partial (magSqr\ x)}{\partial x}\big|_{(a,b)} = (2*a,\ 2*b)$, which means its derivative with respect to $a$ is $2*a$ and its derivative with respect to $b$ is $2*b$.
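The same gradient can be obtained with a tiny forward-mode sketch in the spirit of [10]. The dual-number class below is our own illustrative helper, not the paper's construction; real addition and multiplication stand in for ⊕ and ∗.

```python
# Example 19 via forward-mode dual numbers: checking that the
# derivative of magSqr at (a, b) is (2a, 2b).

class Dual:
    def __init__(self, val, der):
        self.val, self.der = val, der
    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)
    def __mul__(self, other):
        # product rule carried on the derivative component
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def sqr(a):
    return a * a

def magSqr(a, b):
    return sqr(a) + sqr(b)

a, b = 3.0, 4.0
# direction (1,0) extracts d/da; direction (0,1) extracts d/db
da = magSqr(Dual(a, 1.0), Dual(b, 0.0)).der
db = magSqr(Dual(a, 0.0), Dual(b, 1.0)).der
assert (da, db) == (2 * a, 2 * b)
```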

5 Taylor’s Theorem

In this section, we discuss Taylor's Theorem, which is useful for approximating a function around a given point by a polynomial of degree k. In programming, it is important and has many applications in approximation and incremental computation. We first give an example and then we prove the theorem. First of all, we introduce some higher-order notations.

\[
\begin{aligned}
&\frac{\partial^0 t_1}{\partial x^0}\Big|_{t_2} = t_1 \qquad
\frac{\partial^n t_1}{\partial x^n}\Big|_{t_2} = \frac{\partial\, \frac{\partial^{n-1} t_1}{\partial x^{n-1}}\big|_x}{\partial x}\Big|_{t_2}\\
&t * t_1^0 = t \qquad t * t_1^n = (t * t_1^{n-1}) * t_1\\
&f^0 = f \qquad f^n = (f')^{n-1}\\
&(\lambda x{:}T.\ t)' = \lambda x{:}T.\ \frac{\partial t}{\partial x}\Big|_x
\end{aligned}
\]

▶ Example 20 (Taylor). Consider a function f on real numbers, usually defined in mathematics as f(x, y) = (2 ∗ x ∗ y, 3 ∗ x ∗ x + y). In our calculus, it is defined as follows.

f :: (R, R) → (R, R)
f = λx : (R, R). (2 ∗ π1(x) ∗ π2(x), 3 ∗ π1(x) ∗ π1(x) ⊕ π2(x))

The following expands Taylor's theorem up to the 2nd-order derivative.

\[
\begin{aligned}
&f\ (C_1, C_2) = (2*C_1*C_2,\ 3*C_1*C_1 \oplus C_2)\\
&f\ (0, 0) = (0, 0)\\[4pt]
&f'\ (0, 0) * (C_1, C_2)\\
&= \{\ \text{Application}\ \}\\
&\quad \frac{\partial (2*\pi_1(x)*\pi_2(x),\ 3*\pi_1(x)*\pi_1(x) \oplus \pi_2(x))}{\partial x}\Big|_{(0,0)} * (C_1, C_2)\\
&= \{\ \text{Rule EAppDer3}\ \}\\
&\quad \Big(\frac{\partial (2*x_1*0,\ 3*x_1*x_1 \oplus 0)}{\partial x_1}\Big|_0,\ \frac{\partial (2*0*x_2,\ 3*0*0 \oplus x_2)}{\partial x_2}\Big|_0\Big) * (C_1, C_2)\\
&= \{\ \text{Lemma 11}\ \}\\
&\quad ((0, 0), (0, 1)) * (C_1, C_2)\\
&= \{\ \text{Rule EAppMul4, Rule EAppAdd1}\ \}\\
&\quad (0, C_2)
\end{aligned}
\]


\[
\begin{aligned}
&f''\ (0, 0)\\
&= \{\ \text{Application}\ \}\\
&\quad \frac{\partial}{\partial x}\Big(\frac{\partial (2*\pi_1(x)*\pi_2(x),\ 3*\pi_1(x)*\pi_1(x) \oplus \pi_2(x))}{\partial x}\Big|_x\Big)\Big|_{(0,0)}\\
&= \{\ \text{Rule EAppDer3}\ \}\\
&\quad \Big(\frac{\partial}{\partial x_1}\Big(\frac{\partial (2*\pi_1(x)*\pi_2(x),\ 3*\pi_1(x)*\pi_1(x) \oplus \pi_2(x))}{\partial x}\Big|_{(x_1, 0)}\Big)\Big|_0,\ \frac{\partial}{\partial x_2}\Big(\frac{\partial (2*\pi_1(x)*\pi_2(x),\ 3*\pi_1(x)*\pi_1(x) \oplus \pi_2(x))}{\partial x}\Big|_{(0, x_2)}\Big)\Big|_0\Big)\\
&= \{\ \text{Rule EAppDer3}\ \}\\
&\quad \Big(\frac{\partial}{\partial x_1}\Big(\frac{\partial (2*x'_1*0,\ 3*x'_1*x'_1 \oplus 0)}{\partial x'_1}\Big|_{x_1},\ \frac{\partial (2*x_1*x'_2,\ 3*x_1*x_1 \oplus x'_2)}{\partial x'_2}\Big|_0\Big)\Big|_0,\\
&\qquad\ \frac{\partial}{\partial x_2}\Big(\frac{\partial (2*x'_1*x_2,\ 3*x'_1*x'_1 \oplus x_2)}{\partial x'_1}\Big|_0,\ \frac{\partial (2*0*x'_2,\ 3*0*0 \oplus x'_2)}{\partial x'_2}\Big|_{x_2}\Big)\Big|_0\Big)\\
&= \{\ \text{Lemma 11}\ \}\\
&\quad \Big(\frac{\partial ((0,\ 6*x_1),\ (2*x_1,\ 1))}{\partial x_1}\Big|_0,\ \frac{\partial ((2*x_2,\ 0),\ (0,\ 1))}{\partial x_2}\Big|_0\Big)\\
&= \{\ \text{Lemma 11}\ \}\\
&\quad (((0, 6), (2, 0)),\ ((2, 0), (0, 0)))
\end{aligned}
\]

\[
\begin{aligned}
&(f''\ (0, 0)) * (C_1, C_2)^2\\
&= \{\ \text{Rule EAppMul4, Rule EAppAdd1}\ \}\\
&\quad ((2*C_2,\ 6*C_1),\ (2*C_1,\ 0)) * (C_1, C_2)\\
&= \{\ \text{Rule EAppMul4}\ \}\\
&\quad (2*C_2*C_1,\ 6*C_1*C_1) \oplus (2*C_1*C_2,\ 0)\\
&= \{\ \text{Rule EAppAdd1}\ \}\\
&\quad (4*C_1*C_2,\ 6*C_1*C_1)
\end{aligned}
\]
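Since f is a polynomial of degree 2, the 2nd-order expansion at (0, 0) is exact, which a quick numeric sketch confirms (our own helper `taylor2`, using the first- and second-order terms computed above with real arithmetic for ⊕ and ∗):

```python
# Numeric check of Example 20: the 2nd-order Taylor expansion of
# f(x, y) = (2*x*y, 3*x*x + y) at (0, 0) reproduces f exactly.

def f(x, y):
    return (2 * x * y, 3 * x * x + y)

def taylor2(c1, c2):
    f0 = (0.0, 0.0)                      # f(0,0)
    f1 = (0.0, c2)                       # f'(0,0) * (C1, C2)
    f2 = (4 * c1 * c2, 6 * c1 * c1)      # f''(0,0) * (C1, C2)^2
    # f(0,0) ⊕ (1/1!) f1 ⊕ (1/2!) f2, componentwise
    return tuple(a + b + c / 2 for a, b, c in zip(f0, f1, f2))

c1, c2 = 1.5, -2.0
assert taylor2(c1, c2) == f(c1, c2)
```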

Thus we have
\[
f\ (C_1, C_2) = (2*C_1*C_2,\ 3*C_1*C_1 \oplus C_2) = (0, 0) \oplus (0, C_2) \oplus (2*C_1*C_2,\ 3*C_1*C_1) = f\ (0, 0) \oplus \tfrac{1}{1!}\,(f'\ (0, 0)) * (C_1, C_2) \oplus \tfrac{1}{2!}\,((f''\ (0, 0)) * (C_1, C_2)) * (C_1, C_2).
\]

▶ Theorem 21 (Taylor's Theorem). If both $f\ t$ and $\sum_{k=0}^{\infty} \frac{1}{k!}\,(f^{(k)}\ t_0) * (t \ominus t_0)^k$ are weak-normalizable, then

\[
f\ t = \sum_{k=0}^{\infty} \frac{1}{k!}\,(f^{(k)}\ t_0) * (t \ominus t_0)^k.
\]

Proof. As in the proof of Theorem 15, for simplicity, we assume that f, t and t0 are closed. Furthermore, we assume that t and t0 are in normal form. We prove it by induction on the type of f : T → T′.

Case: T′ is a base type. T must contain no → by our typing, so for simplicity, we suppose T to be (B, B, ..., B). Using the same technique as in Theorem 18, we assume f to be

\[
f = \lambda x{:}T.\ (\lambda x_1{:}B.\ \lambda x_2{:}B.\ \ldots\ \lambda x_n{:}B.\ N)\ \pi_1(x)\ \pi_2(x)\ \ldots\ \pi_n(x)
\]

(denoted by $f = \lambda x{:}T.\ t_2$ later), t to be $(t_{11}, t_{12}, \ldots, t_{1n})$, and t0 to be $(t_{21}, t_{22}, \ldots, t_{2n})$, where each $t_{ij}$ is a normal form of base type. Then we have

\[
\begin{aligned}
&(f^{(n)}\ t_0) * (t \ominus t_0)^n\\
&= \frac{\partial^n t_2}{\partial x^n}\Big|_{t_0} * (t \ominus t_0)^n\\
&= \Big(\frac{\partial\, \frac{\partial^{n-1} t_2}{\partial x^{n-1}}\big|_{(x_1, t_{22}, \ldots, t_{2n})}}{\partial x_1}\Big|_{t_{21}},\ \ldots,\ \frac{\partial\, \frac{\partial^{n-1} t_2}{\partial x^{n-1}}\big|_{(t_{21}, t_{22}, \ldots, x_n)}}{\partial x_n}\Big|_{t_{2n}}\Big) * (t \ominus t_0)^n\\
&= \Big(\frac{\partial\, \frac{\partial^{n-1} t_2}{\partial x^{n-1}}\big|_{(x_1, t_{22}, \ldots, t_{2n})}}{\partial x_1}\Big|_{t_{21}} * (t_{11} \ominus t_{21}) \oplus \cdots \oplus \frac{\partial\, \frac{\partial^{n-1} t_2}{\partial x^{n-1}}\big|_{(t_{21}, t_{22}, \ldots, x_n)}}{\partial x_n}\Big|_{t_{2n}} * (t_{1n} \ominus t_{2n})\Big) * (t \ominus t_0)^{n-1}\\
&= \cdots
\end{aligned}
\]

As seen above, each time we decompose a derivative $\frac{\partial\,\cdot}{\partial x_i}\big|_{(\ldots)}$, we apply Rule EAppDer1, and then reduce with Rule EAppMul3 to lower the exponent of $(t \ominus t_0)^n$. Finally, we decompose the last derivative and get the term $t_2$ in the form $t_2[t'_{21}/x_1,\ t'_{22}/x_2,\ \ldots,\ t'_{2n}/x_n]$ where for all $j \in [1, n]$, $t'_{2j}$ is either $t_{2j}$ or $x_j$. Note that on base types we assume that we have Taylor's Theorem:

\[
f(x_0 + h) = f(x_0) + \sum_{k=1}^{\infty} \frac{1}{k!} \Big(\sum_{i=1}^{n} h_i\, \frac{\partial}{\partial x_i}\Big)^{k} f(x_0)
\]

where $x_0$ and $h$ are n-dimensional vectors, and $x_{0i}$, $h_i$ are their projections to the $i$-th dimension. So $(f^{(k)}\ t_0) * (t \ominus t_0)^k$ corresponds to the $k$-th addend $\frac{1}{k!}\big(\sum_{i=1}^{n} h_i\, \frac{\partial}{\partial x_i}\big)^{k} f(x_0)$.
Case: T′ is a function type A → B. Similar to the proof in Theorem 18: for all $u$ of type $A$, we define $f^* = \lambda x{:}T.\ f\ x\ u$, and by using the inductive result on type B, we can prove the case similarly to Theorem 18.
Case: T′ is a tuple type (T1, T2, T3, ...). Just define $f^* = \lambda x{:}T.\ \pi_j(f\ x)$ to use the inductive result. The rest is simple.
Case: T′ is a sum type T1 + T2. This case is impossible because the right-hand side is not well-typed.
Thus we have proven the theorem. ◀

Application: Polynomial Approximation

Taylor's Theorem has many applications. Here we give an example of using Taylor's Theorem for approximation. Suppose there is a point (1, 0) in the polar coordinate system, and we want to know where the point will be if we slightly change the radius r and the angle θ. Since it is costly to compute functions such as sin() and cos(), Taylor's Theorem enables us to make a fast polynomial approximation.


▶ Example 22. Let the function polar2cartesian be defined by

polar2cartesian :: (R, R) → (R, R)
polar2cartesian (r, θ) = (r ∗ cos(θ), r ∗ sin(θ))

We show how to expand polar2cartesian(r, θ) at (1, 0) up to the 2nd-order derivative. Since

\[
\begin{aligned}
\frac{\partial (polar2cartesian(x))}{\partial x}\Big|_{(1,0)}
&= \frac{\partial (\pi_1 x * \cos(\pi_2 x),\ \pi_1 x * \sin(\pi_2 x))}{\partial x}\Big|_{(1,0)}\\
&= \Big(\frac{\partial (x_1 * \cos(0),\ x_1 * \sin(0))}{\partial x_1}\Big|_1,\ \frac{\partial (1 * \cos(x_2),\ 1 * \sin(x_2))}{\partial x_2}\Big|_0\Big)\\
&= ((1, 0),\ (0, 1))
\end{aligned}
\]

we have

\[
\frac{\partial (polar2cartesian(x))}{\partial x}\Big|_{(1,0)} * (\Delta r, \Delta\theta) = (\Delta r, \Delta\theta).
\]
Again, we have

\[
\frac{1}{2}\,\frac{\partial^2 (polar2cartesian(x))}{\partial x^2}\Big|_{(1,0)} * (\Delta r, \Delta\theta)^2
= \frac{1}{2}\,(((0, 0), (0, 1)),\ ((0, 1), (-1, 0))) * (\Delta r, \Delta\theta)^2
= \Big({-\tfrac{1}{2}}\Delta\theta^2,\ \Delta r * \Delta\theta\Big).
\]

Combining the above, we can use $(1 \oplus \Delta r \ominus \tfrac{1}{2}\Delta\theta^2,\ \Delta\theta \oplus \Delta r * \Delta\theta)$ to make an approximation to polar2cartesian(1 + ∆r, ∆θ).
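The quality of this approximation for small changes can be checked numerically. A minimal sketch (our own function names; the error should shrink like the cube of the change):

```python
# Checking Example 22: the 2nd-order expansion
# (1 + Δr - Δθ²/2, Δθ + Δr·Δθ) approximates
# polar2cartesian(1 + Δr, Δθ) closely for small Δr, Δθ.
import math

def polar2cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def approx(dr, dt):
    return (1 + dr - dt * dt / 2, dt + dr * dt)

dr, dt = 0.01, 0.02
exact = polar2cartesian(1 + dr, dt)
est = approx(dr, dt)
assert all(abs(a - b) < 1e-5 for a, b in zip(exact, est))
```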

6 Discussion

In this section, we make remarks on the generality of our approach, and on how to deal with discrete derivatives in our context.

6.1 More Theorems and Applications

We keep many mathematical structures in our calculus. As a result, we can prove more theorems under this framework. We select the most important three, but there are many other theorems that hold in our system:

\[
\begin{aligned}
&(t_1 \ominus t_2) \oplus (t_2 \ominus t_3) \oplus \cdots \oplus (t_{n-1} \ominus t_n) = t_1 \ominus t_n,\\
&\frac{\partial (t_1 \oplus t_2)}{\partial x}\Big|_{t_3} = \frac{\partial t_1}{\partial x}\Big|_{t_3} \oplus \frac{\partial t_2}{\partial x}\Big|_{t_3},\\
&x \in B:\quad \frac{\partial (t_1 * t_2)}{\partial x}\Big|_{t_3} = \frac{\partial t_1}{\partial x}\Big|_{t_3} * t_2[t_3/x] \oplus t_1[t_3/x] * \frac{\partial t_2}{\partial x}\Big|_{t_3},\\
&\text{for all } t_1 \text{ containing no free } x:\quad \frac{\partial (t_1 * x)}{\partial x}\Big|_{t_2} = t_1.
\end{aligned}
\]
Associated with each of these theorems is a collection of applications. For lack of space, we only discuss three theorems in detail. Now it is natural to ask whether all the theorems on base types have a correspondence in our system. The answer is that it depends on the mathematical structure of the base types. In our proofs, we assume the commutative and associative laws of addition and multiplication, and the distributive law of multiplication. We can construct a counterexample outside these assumptions. Suppose there is a strange law on a base type that $\forall x, y.\ x * y = y$, which is interpreted by our system as $t_1 * t_2 = t_2$. Now let $t_1$ be $((r_1, r_2), (r_3, r_4))$ and $t_2$ be $(r_5, r_6)$. Then

\[
t_1 * t_2 = (r_1 * r_5 \oplus r_3 * r_6,\ r_2 * r_5 \oplus r_4 * r_6) = (r_5 \oplus r_6,\ r_5 \oplus r_6)
\]

which does not equal $t_2$. This means that our system does not preserve this strange law.
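The counterexample is easy to replay numerically. The sketch below (our own helper; ⊕ read as real addition) interprets the strange law on the base type and computes the tuple product as the calculus does:

```python
# The strange law x*y = y, lifted componentwise to the tuple product
# ((r1,r2),(r3,r4)) * (r5,r6) = (r1*r5 ⊕ r3*r6, r2*r5 ⊕ r4*r6),
# fails to yield t2 = (r5, r6).

def strange_mul(x, y):
    return y            # the postulated base-type law  x * y = y

r1, r2, r3, r4 = 1, 2, 3, 4
r5, r6 = 10, 20
t2 = (r5, r6)
prod = (strange_mul(r1, r5) + strange_mul(r3, r6),
        strange_mul(r2, r5) + strange_mul(r4, r6))
assert prod == (r5 + r6, r5 + r6)   # both components are r5 ⊕ r6
assert prod != t2                   # so the law does not lift to tuples
```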

Normal Form
  nf ::= nb            interpretable forms on base type
       | λx : T. t     function; t cannot be further reduced

Interpretable Normal Forms
  nb ::= c                     constants of interpretable type
       | nb nf                 primitive function application
       | nb ⊕ nf | nf ⊕ nb     addition on interpretable type
       | nb ⊖ nf | nf ⊖ nb     subtraction on interpretable type
       | x                     variables of interpretable type
       | ∆nb/∆x |_{nf,nf}      derivative on base type

Figure 6 Discrete normal form

In the design of our calculus, we touch little on the details of base types. So for some strange base types, we may not be able to preserve their mathematical structure. But for the widely used R and C, our system preserves most of their important theorems. It is interesting to note that it is impossible to prove these theorems using the theory of changes [7], because the theory of changes does not distinguish between smooth functions and non-continuous functions and uses the same calculation for both. In our calculus, we distribute these calculations to base types step by step, and use these calculations (such as, on base types, $\int_{a_1}^{a_2} \frac{\partial f\ y}{\partial y}\big|_x\, dx = f\ a_2 \ominus f\ a_1$) to prove our theorems.

6.2 Discrete Derivatives

We can define a discrete version of our calculus, where we represent changes as discrete deltas instead of through derivatives and integrations. We will show the equivalence between our discrete version and change theory [7] by implementing the function Derive in our calculus. The normal form this time is defined in Figure 6. We use the term $\frac{\Delta t}{\Delta x}\big|_{t,t}$ to represent the discrete derivative. This time we can easily manipulate values of base types because we only require the operators ⊕ and ⊖ to be well-defined. Also notice that this time we can implement derivatives on function types. To show that our discrete version can be used to implement the change theory [7], it is sufficient to consider terms of base types or function types, without the need to consider tuples and the operators $*$ and $\int$. We want to use our calculus to implement the function Derive, which satisfies the equation $(Derive\ f)\ x\ \Delta x = f(x \oplus \Delta x) \ominus f(x)$. For the interpretation of derivatives on base types, we just require that they satisfy $\frac{\Delta t}{\Delta y}\big|_{t_1,t_2} = t[t_1 \oplus t_2/y] \ominus t[t_1/y]$. Then, similarly to the Newton-Leibniz Theorem, we can prove $\frac{\Delta (f\ y)}{\Delta y}\big|_{x,\Delta x} = f\ (x \oplus \Delta x) \ominus f\ x$ (where $f$ does not contain a free $y$), which is our version of the function Derive. To see this clearly, in change theory, we write the function Derive and the system will automatically calculate it by using the rules:

Derive c = 0
Derive x = ∆x
Derive (λx : T. t) = λx : T. λdx : ∆T. Derive(t)
Derive (s t) = Derive(s) t Derive(t)

In our calculus, one writes $\frac{\Delta (f\ y)}{\Delta y}\big|_{x,\Delta x}$, and the system will automatically calculate the


following rules:

\[
\begin{aligned}
&\frac{\Delta c}{\Delta y}\Big|_{x,\Delta x} = 0\\
&\frac{\Delta y}{\Delta y}\Big|_{x,\Delta x} = \Delta x\\
&\frac{\Delta (\lambda y{:}T.\ t)}{\Delta x}\Big|_{t_0,t_1} = \lambda y{:}T.\ \Big(\frac{\Delta t}{\Delta x}\Big|_{t_0,t_1}\Big)\\
&\Big(\lambda x.\ \lambda \Delta x.\ \frac{\Delta t}{\Delta y}\Big|_{x,\Delta x}\Big)\ t_1\ t_2 = (\lambda y{:}T.\ t)\ (t_1 \oplus t_2) \ominus (\lambda y{:}T.\ t)\ t_1
\end{aligned}
\]

Notice that the first three rules correspond well, while the last one is a bit different. This is because in change theory's definition, we have ∆(A → B) = A → ∆A → ∆B, while in our calculus, we have ∆(A → B) = A → ∆B. Fortunately, we can achieve the same effect through the Newton-Leibniz Formula.
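On base types the Derive equation can be realized directly. A minimal sketch, assuming ⊕ = + and ⊖ = − on numbers (the helper `derive` is ours, mirroring the discrete derivative $\frac{\Delta (f\ y)}{\Delta y}\big|_{x,\Delta x}$ above):

```python
# Discrete reading of Derive: (Derive f) x Δx = f(x ⊕ Δx) ⊖ f(x),
# which supports the incremental update f(x ⊕ Δx) = f(x) ⊕ (Derive f) x Δx.

def derive(f):
    return lambda x, dx: f(x + dx) - f(x)

f = lambda x: x * x + 1
x, dx = 5, 3
assert derive(f)(x, dx) == f(x + dx) - f(x)    # by definition
assert f(x) + derive(f)(x, dx) == f(x + dx)    # incremental update
```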

7 Related Work

Differential Calculus and The Change Theory

The differential lambda-calculus [9, 8] has been studied for computing derivatives of arbitrary higher-order programs. In the differential lambda-calculus, derivatives are guaranteed to be linear in their argument, whereas the incremental lambda-calculus does not have this restriction; instead, it requires that the function be differentiable. The big difference between our calculus and the differential lambda-calculus is that we perform computation on terms instead of analysis on terms. The idea of performing incremental computation using derivatives has been studied by Cai et al. [7], who give an account using change structures. They use this to provide a framework for incrementally evaluating lambda calculus programs. It has been shown that the work can be enriched with recursion and fix-point computation [4]. The main difference between our work and change theory is that we describe changes as mathematical derivatives while the change theory describes changes as (discrete) deltas.

Incremental/Self-Adaptive Computation

Paige and Koenig [20] present derivatives for a first-order language with a fixed set of primitives for incremental computation. Blakeley et al. [17] apply these ideas to a class of relational queries. Koch [15] guarantees asymptotic speedups with a compositional query transformation and delivers huge speedups in realistic benchmarks, though still for a first-order database language. We have proved Taylor's theorem in our framework, which provides us with another way to perform approximation on the computation. Self-adjusting computation [2] or adaptive functional programming [3] provides a dynamic approach to incrementalization. In this approach, programs execute on the original input in an enhanced runtime environment that tracks the dependencies between values in a dynamic dependence graph; intermediate results are memoized.
Later, changes to the input propagate through dependency graphs from changed inputs to results, updating both intermediate and final results; this processing is often more efficient than recomputation. Mathematically, self-adjusting computation corresponds to differential equations (the derivative of a function can be represented by the computational result of a function), which may be a future direction of our calculus.

Automatic Differentiation

Automatic differentiation [13] is a technique that allows for efficiently computing the derivative of arbitrary programs, and can be applied to probabilistic modeling [16] and machine learning [6]. This technique has been successfully applied to some higher-order languages [22, 10]. As pointed out in [4], while some approaches have been suggested [19, 14], a general theoretical framework for this technique is still a matter of open research. We prove the chain rule inside our framework, which lays a foundation for our calculus to perform automatic differentiation. And with more theorems in our calculus, we expect more profound applications in differential calculus.

8 Conclusion

In this paper, we propose an analytical differential calculus equipped with integration. This calculus, as far as we are aware, is the first one that has well-defined integration, which appears in neither the differential lambda calculus nor the change theory. Our calculus enjoys many nice properties such as soundness and strong normalization (when fix is excluded), and has three important theorems, which have profound applications in computer science. We believe the following directions will be important in our future work.

Adding more theorems. We may wish to write programs on many specialized base types besides R and C. As we have demonstrated in this paper, our calculus preserves many important computational structures on base types. Therefore, it is possible to extend our system with theorems about some unique mathematical structures and use these theorems to optimize computation.
Working on derivatives on functions. We did not discuss derivatives on continuous functions because we have not had a good mathematical definition for them from the perspective of computation. But derivatives on functions would be useful; it would be nice if we could use $\int_{\oplus}^{*} \frac{\partial\, x(a_1, a_2)}{\partial x}\big|_x\, dx$ to compute $a_1 * a_2 \ominus (a_1 \oplus a_2)$.
Manipulating differential equations. Differential equations would be very useful for users to program dynamic systems directly; one may write differential equations on data structures without writing the primitive forms of functions. This could be applied in many fields such as self-adjusting computation or self-adaptive system construction.

References

1 Martín Abadi and Gordon D. Plotkin. A simple differentiable programming language. Proceedings of the ACM on Programming Languages, 4(POPL):1–28, Jan 2020. doi:10.1145/3371106.
2 Umut A. Acar, Amal Ahmed, and Matthias Blume. Imperative self-adjusting computation. In George C. Necula and Philip Wadler, editors, Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, San Francisco, California, USA, January 7-12, 2008, pages 309–322. ACM, 2008.
3 Umut A. Acar, Guy E. Blelloch, and Robert Harper. Adaptive functional programming. In John Launchbury and John C. Mitchell, editors, Conference Record of POPL 2002: The 29th SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, OR, USA, January 16-18, 2002, pages 247–259. ACM, 2002.
4 Mario Alvarez-Picallo, Alex Eyers-Taylor, Michael Peyton Jones, and C.-H. Luke Ong. Fixing incremental computation - derivatives of fixpoints, and the recursive semantics of datalog. In Luís Caires, editor, Programming Languages and Systems - 28th European Symposium on Programming, ESOP 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, volume 11423 of Lecture Notes in Computer Science, pages 525–552. Springer, 2019.
5 Mario Alvarez-Picallo and C. H. Luke Ong. The difference lambda-calculus: A language for difference categories, 2020. arXiv:2011.14476.
6 Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res., 18:153:1–153:43, 2017.


7 Yufei Cai, Paolo G. Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages: incrementalizing λ-calculi by static differentiation. In Michael F. P. O'Boyle and Keshav Pingali, editors, ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, Edinburgh, United Kingdom - June 09 - 11, 2014, pages 145–155. ACM, 2014.
8 Thomas Ehrhard. An introduction to differential linear logic: proof-nets, models and antiderivatives. Math. Struct. Comput. Sci., 28(7):995–1060, 2018.
9 Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theor. Comput. Sci., 309(1-3):1–41, 2003.
10 Conal Elliott. The simple essence of automatic differentiation. Proc. ACM Program. Lang., 2(ICFP):70:1–70:29, 2018.
11 Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster. Incremental λ-calculus in cache-transfer style - static memoization by program transformation. In Luís Caires, editor, Programming Languages and Systems - 28th European Symposium on Programming, ESOP 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, volume 11423 of Lecture Notes in Computer Science, pages 553–580. Springer, 2019.
12 Jean-Yves Girard. Proofs and Types. Cambridge University Press, 1989.
13 Andreas Griewank and Andrea Walther. Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition. SIAM, 2008.
14 Robert Kelly, Barak A. Pearlmutter, and Jeffrey Mark Siskind. Evolving the incremental λ calculus into a model of forward automatic differentiation (AD). CoRR, abs/1611.03429, 2016. URL: http://arxiv.org/abs/1611.03429.
15 Christoph Koch. Incremental query evaluation in a ring of databases.
In Jan Paredaens and Dirk Van Gucht, editors, Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA, pages 87–98. ACM, 2010. 16 Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. Automatic differentiation variational inference. J. Mach. Learn. Res., 18:14:1–14:45, 2017. 17 Per-Åke Larson and Jingren Zhou. Efficient maintenance of materialized outer-join views. In Rada Chirkova, Asuman Dogac, M. Tamer Özsu, and Timos K. Sellis, editors, Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, pages 56–65. IEEE Computer Society, 2007. 18 Yanhong A. Liu. Efficiency by incrementalization: An introduction. High. Order Symb. Comput., 13(4):289–313, 2000. doi:10.1023/A:1026547031739. 19 Oleksandr Manzyuk. A simply typed λ-calculus of forward automatic differentiation. Electronic Notes in Theoretical Computer Science, 286:257 – 272, 2012. Proceedings of the 28th Conference on the Mathematical Foundations of Programming Semantics (MFPS XXVIII). 20 Robert Paige and Shaye Koenig. Finite differencing of computable expressions. ACM Trans. Program. Lang. Syst., 4(3):402–454, 1982. 21 Benjamin C. Pierce. Types and Programming Languages. The MIT Press, 2002. 22 Jeffrey Mark Siskind and Barak A. Pearlmutter. Nesting forward-mode AD in a functional framework. High. Order Symb. Comput., 21(4):361–376, 2008. H. Xu and Z. Hu 23:27

A Appendix: Calculus Properties

A.1 Progress

▶ Lemma 23 (Progress). Suppose t is a well-typed term (free variables of interpretable base type iB are allowed). Then t is either a normal form or there is some t′ such that t → t′.

Proof. We prove this by induction on the form of t.
Case c. It is a normal form.

Case t1 ⊕ t2. t is well-typed if and only if t1 has the same type T as t2. If t1 or t2 is not a normal form, we make reductions on t1 or t2. If either t1 or t2 is an nb, then t1 ⊕ t2 is an nb. For the other cases of normal forms, we have

(t11, t12, ..., t1n) ⊕ (t21, t22, ..., t2n) → (t11 ⊕ t21, t12 ⊕ t22, ..., t1n ⊕ t2n)
(λx : T.t1) ⊕ (λy : T.t2) → λx : T. t1 ⊕ (t2[x/y])

Case t1 ⊖ t2. It is the same as the case t1 ⊕ t2.
Case x. Then x is a free variable of interpretable type, otherwise it is not well-typed. A free variable of interpretable type is a normal form.
Case inl/inr t. If t is not a normal form, then we can make a reduction in t; otherwise this term itself is a normal form.

Case case t of inl x1 ⇒ t1 | inr x2 ⇒ t2. To be well-typed, t has to be of type T1 + T2. If t is not a normal form, then we can make a reduction in t; otherwise t has to be inl/inr t′, so we can reduce to t1[t′/x1] or t2[t′/x2].
Case λx : T.t. It is a normal form if t cannot be further reduced.

Case t1 t2. If t1 is not a normal form, then we make reductions on t1. If t1 is a normal form, then t1 has to be λx : T.t or an nb. For the former case we have

(λx : T.t) t2 → t[t2/x]

For the latter case, t2 must be a normal form, or it can make further reductions. So t1 t2 is an nb.
Case ∫_{t1}^{t2} t3 dx. If t1 or t2 is not a normal form, then we can make reductions on t1 or t2. If both t1 and t2 are normal forms, then t1 and t2 have to be tuples of normal forms (nf, nf, ..., nf) or of base type to be well-typed. In the former case we have

∫_{(t11,t12,...,t1n)}^{(t21,t22,...,t2n)} t dx → ∫_{t11}^{t21} π1(t[(x1, t12, ..., t1n)/x]) dx1 ⊕ ∫_{t12}^{t22} π2(t[(t21, x2, ..., t1n)/x]) dx2 ⊕ ... ⊕ ∫_{t1n}^{t2n} πn(t[(t21, t22, ..., xn)/x]) dxn


If it is the latter case, let us inspect t3. If t3 is not a normal form, then we can make reductions on t3 (notice that we only introduce a base-type free variable into t3). If t3 is a normal form, and t1, t2 and t3 are nb, then ∫_{t1}^{t2} t3 dx is a normal form. For the other cases of normal forms:

t1, t2 : B
∫_{t1}^{t2} (t11, t12, ..., t1n) dx → (∫_{t1}^{t2} t11 dx, ∫_{t1}^{t2} t12 dx, ..., ∫_{t1}^{t2} t1n dx)

t1, t2 : B
∫_{t1}^{t2} λy : T2.t dx → λy : T2. ∫_{t1}^{t2} t dx

t1, t2 : B
∫_{t1}^{t2} inl/inr t dx → inl/inr ∫_{t1}^{t2} t dx
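The tuple-decomposition rule for integrals above mirrors the path independence of line integrals of gradient fields in ordinary calculus: the integral from a pair of lower bounds to a pair of upper bounds splits into one-dimensional integrals, each varying one component while the earlier components sit at their upper bounds and the later components at their lower bounds. A small numerical sketch of this reading (our own illustration, not part of the calculus: f, the grid step, and the concrete bounds are all assumptions), taking the integrand to be the derivative of f(x, y) = x·y:

```python
# Hypothetical numerical check of the tuple rule:
# ∫ from (a1,a2) to (b1,b2) of ∂f dx
#   = ∫_{a1}^{b1} π1(∂f)(x1, a2) dx1 ⊕ ∫_{a2}^{b2} π2(∂f)(b1, x2) dx2
# With f(x, y) = x*y, the derivative is the pair (y, x).

def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation of a one-dimensional integral."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h

def f(x, y):
    return x * y

d1 = lambda x, y: y   # π1 of the derivative of f
d2 = lambda x, y: x   # π2 of the derivative of f

a1, a2, b1, b2 = 1.0, 2.0, 2.0, 3.0
# first component varies, the second stays at the lower bound a2
part1 = integrate(lambda x1: d1(x1, a2), a1, b1)
# second component varies, the first stays at the upper bound b1
part2 = integrate(lambda x2: d2(b1, x2), a2, b2)

total = part1 + part2
print(abs(total - (f(b1, b2) - f(a1, a2))) < 1e-6)  # True
```

The sum of the two one-dimensional integrals agrees with f(b1, b2) − f(a1, a2), which is what the correspondence with mathematical integration predicts for a derivative integrand.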

Case (t1, t2, ..., tn). If ti is not a normal form, then we make reductions on ti. If all the ti are normal forms, then t is a normal form.

Case πj(t1). If t1 is not a normal form, then we make reductions on t1. If t1 is a normal form, then it has to be (nf, nf, ..., nf) to be well-typed, then we have

πj(t1′, t2′, ..., tn′) → tj′

Case ∂t1/∂x |_{t2}. If t2 is not a normal form, then we make reductions on t2. If t2 is a normal form, then it has to be (t1, ..., tn) or an nb. If it is the former case:

∀i, (t1, t2, ..., ti−1, xi, ti+1, ..., tn) is written as ti∗

∂t/∂x |_{(t1,t2,...,tn)} → (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn})

If it is the latter case and t1 is not a normal form, then we can make reductions on t1 (notice that we only introduce a base-type free variable into t1). If t1 is an nb, then t is an nb; otherwise we have

t0 : B
∂(t1,t2,...,tn)/∂x |_{t0} → (∂t1/∂x |_{t0}, ∂t2/∂x |_{t0}, ..., ∂tn/∂x |_{t0})

t0 : B
∂(λy:T.t)/∂x |_{t0} → λy : T. ∂t/∂x |_{t0}

t0 : B
∂(inl/inr t)/∂x |_{t0} → inl/inr ∂t/∂x |_{t0}
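The rule that decomposes a derivative at a tuple point into one partial derivative per component has a familiar numerical analogue. A minimal sketch (our own illustration, approximating each ∂/∂xi by a central difference rather than by the symbolic rules of the calculus; the function and point are assumptions):

```python
def partials(f, point, h=1e-6):
    """Approximate (∂f/∂x1 |_t1, ..., ∂f/∂xn |_tn) at point (t1, ..., tn):
    one central difference per component, mirroring the tuple rule above."""
    grads = []
    for i in range(len(point)):
        up = list(point)
        dn = list(point)
        up[i] += h
        dn[i] -= h
        grads.append((f(*up) - f(*dn)) / (2 * h))
    return tuple(grads)

# For f(x, y) = x*y, the tuple rule predicts the pair (y, x) at any point.
g = partials(lambda x, y: x * y, (2.0, 3.0))
print(g)  # approximately (3.0, 2.0)
```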

Case t1 ∗ t2. If t1 or t2 is not a normal form, then we can make reductions on t1 or t2. If both t1 and t2 are normal forms, t2 has to be a tuple (t21, ..., t2n) or an nb. In the former case, t1 also has to be a tuple (t11, ..., t1n), and we have

t1 : (t11, t12, ..., t1n), t2 : (t21, t22, ..., t2n)

t1 ∗ t2 → (t11 ∗ t21) ⊕ (t12 ∗ t22) ⊕ ... ⊕ (t1n ∗ t2n)

If t2 is an nb: if t1 is also an nb, then t1 ∗ t2 is an nb; otherwise we have

t2 : B
(λx : T.t) ∗ t2 → λx : T.(t ∗ t2)

t0 : B
(t1, t2, ..., tn) ∗ t0 → (t1 ∗ t0, t2 ∗ t0, ..., tn ∗ t0)

t0 : B
(inl/inr t) ∗ t0 → inl/inr (t ∗ t0)
Case fix f. Then we have fix f → f (fix f). ◀
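As a sanity check on the reduction rules for ⊕ used in the progress proof, here is a small executable sketch (our own, not the paper's implementation: terms are modelled by Python values, with numbers for base-type constants, tuples for tuples, and functions for λ-abstractions):

```python
def oplus(t1, t2):
    """One semantic reading of t1 ⊕ t2, following the case analysis above."""
    # base-type constants: interpreted addition
    if isinstance(t1, (int, float)) and isinstance(t2, (int, float)):
        return t1 + t2
    # (t11,...,t1n) ⊕ (t21,...,t2n) → (t11 ⊕ t21, ..., t1n ⊕ t2n)
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        return tuple(oplus(a, b) for a, b in zip(t1, t2))
    # (λx:T.t1) ⊕ (λy:T.t2) → λx:T. t1 ⊕ (t2[x/y])
    if callable(t1) and callable(t2):
        return lambda x: oplus(t1(x), t2(x))
    raise TypeError("ill-typed ⊕")

print(oplus((1, 2), (3, 4)))                  # (4, 6)
f = oplus(lambda x: x * 2, lambda x: x + 1)
print(f(10))                                  # 31
```

Both the tuple rule and the λ rule are structural: ⊕ is pushed inward until it reaches base-type values, exactly as in the case analysis of the proof.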

A.2 Preservation

▶ Lemma 24 (Preservation under substitution). If Γ, x : S ⊢ t : T and Γ ⊢ s : S, then we have Γ ⊢ t[s/x]: T .

Proof. First we prove preservation under substitution.

Case c. Then c[s/x] is c, therefore Γ ⊢ c[s/x] : B.
Case t1 ⊕ t2. Suppose Γ, x : S ⊢ t1 ⊕ t2 : T; then we have Γ, x : S ⊢ t1 : T and Γ, x : S ⊢ t2 : T. By induction we have Γ ⊢ t1[s/x] ⊕ t2[s/x] : T, therefore Γ ⊢ (t1 ⊕ t2)[s/x] : T.
Using the same techniques, we can prove the cases of t1 ⊖ t2, t1 ∗ t2, t1 t2, λx : T.t, ∂t1/∂x |_{t2}, ∫_{t1}^{t2} t3 dx, (t1, t2, ..., tn), πj(t), fix f, inl/inr t, and case t of inl x1 ⇒ t1 | inr x2 ⇒ t2.
Case y. If y = x then y[s/x] = s, so Γ ⊢ y[s/x] : T. If y is other than x, then y[s/x] = y, so Γ ⊢ y[s/x] : T.
Then we prove the preservation.

▶ Lemma 25 (Preservation). If t : T and t → t′, then t′ : T. (Free variables of iB are allowed.)
Proof.

Case (λx : T.t) t1 → t[t1/x]: it is straightforward using Lemma 24.

Case fix f → f (fix f). Suppose Γ ⊢ f : A → A; then Γ ⊢ fix f : A and Γ ⊢ f (fix f) : A, so they have the same type.

Case πj(t1, t2, ..., tn) → tj. Suppose Γ ⊢ (t1, t2, ..., tn) : (T1, T2, ..., Tn); then Γ ⊢ πj(t1, t2, ..., tn) : Tj and Γ ⊢ tj : Tj, so they have the same type.
Case
t0 : B
∂(t1,t2,...,tn)/∂x |_{t0} → (∂t1/∂x |_{t0}, ∂t2/∂x |_{t0}, ..., ∂tn/∂x |_{t0})
Suppose Γ ⊢ ∂(t1,t2,...,tn)/∂x |_{t0} : (T1, T2, ..., Tn). Then Γ, x : B ⊢ tj : Tj, so Γ ⊢ (∂t1/∂x |_{t0}, ∂t2/∂x |_{t0}, ..., ∂tn/∂x |_{t0}) : (T1, T2, ..., Tn), and they have the same type. Using the same technique, we can prove the case


t0 : B
(t1, t2, ..., tn) ∗ t0 → (t1 ∗ t0, t2 ∗ t0, ..., tn ∗ t0)

t1, t2 : B
∫_{t1}^{t2} (t11, t12, ..., t1n) dx → (∫_{t1}^{t2} t11 dx, ∫_{t1}^{t2} t12 dx, ..., ∫_{t1}^{t2} t1n dx)
Case
t0 : B
∂(λy:T.t)/∂x |_{t0} → λy : T. ∂t/∂x |_{t0}
Suppose Γ ⊢ ∂(λy:T.t)/∂x |_{t0} : A → B; then Γ, y : A ⊢ ∂t/∂x |_{t0} : B, therefore Γ ⊢ λy : T. ∂t/∂x |_{t0} : A → B, so they have the same type. Using the same techniques, we can prove the cases

t1, t2 : B
∫_{t1}^{t2} λy : T2.t dx → λy : T2. ∫_{t1}^{t2} t dx

t2 : B
(λx : T.t) ∗ t2 → λx : T.(t ∗ t2)

Case (t11, t12, ..., t1n) ⊕ (t21, t22, ..., t2n) → (t11 ⊕ t21, t12 ⊕ t22, ..., t1n ⊕ t2n)

From Γ ⊢ (t11, t12, ..., t1n) ⊕ (t21, t22, ..., t2n) : T, we have Γ ⊢ (t11, t12, ..., t1n) : T and Γ ⊢ (t21, t22, ..., t2n) : T. Suppose T is (T1, T2, ..., Tn); then we have Γ ⊢ t1i : Ti and Γ ⊢ t2i : Ti. Therefore Γ ⊢ t1i ⊕ t2i : Ti. So we have Γ ⊢ (t11 ⊕ t21, t12 ⊕ t22, ..., t1n ⊕ t2n) : (T1, T2, ..., Tn) = T.

Using the same techniques, we can prove the preservation of the following rules:
(λx : T.t1) ⊕ (λy : T.t2) → λx : T. t1 ⊕ (t2[x/y])
(t11, t12, ..., t1n) ⊖ (t21, t22, ..., t2n) → (t11 ⊖ t21, t12 ⊖ t22, ..., t1n ⊖ t2n)
(λx : T.t1) ⊖ (λy : T.t2) → λx : T. t1 ⊖ (t2[x/y])
and the reduction for case t of inl x1 ⇒ t1 | inr x2 ⇒ t2.
Case
∀i, (t1, t2, ..., ti−1, xi, ti+1, ..., tn) is written as ti∗

∂t/∂x |_{(t1,t2,...,tn)} → (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn})
Let us suppose ∂t/∂x |_{(t1,t2,...,tn)} has the type ∂T/∂T0; then we have Γ, x : T0 ⊢ t : T. Suppose that T0 = (T1, T2, ..., Tn); then we have Γ, xi : Ti ⊢ t[ti∗/x] : T. Thus Γ ⊢ ∂t[ti∗/x]/∂xi |_{ti} : ∂T/∂Ti. Therefore (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn}) : (∂T/∂T1, ∂T/∂T2, ..., ∂T/∂Tn) = ∂T/∂T0.
Case ∫_{(t11,...,t1n)}^{(t21,...,t2n)} t dx → ∫_{t11}^{t21} π1(t[(x1, t12, ..., t1n)/x]) dx1 ⊕ ... ⊕ ∫_{t1n}^{t2n} πn(t[(t21, t22, ..., xn)/x]) dxn
Let us suppose Γ, x : T0 ⊢ t : ∂T/∂T0 and Γ ⊢ ∫_{(t11,...,t1n)}^{(t21,...,t2n)} t dx : T. Assume that T0 = (T1, T2, ..., Tn). Then for all j, we have Γ, xj : Tj ⊢ t[(t21, ..., t2(j−1), xj, t1(j+1), ..., t1n)/x] : ∂T/∂T0. Thus Γ, xj : Tj ⊢ ∫_{t1j}^{t2j} πj(t[(t21, ..., t2(j−1), xj, t1(j+1), ..., t1n)/x]) dxj : Tj. Therefore Γ ⊢ ∫_{t11}^{t21} π1(t[(x1, t12, ..., t1n)/x]) dx1 ⊕ ∫_{t12}^{t22} π2(t[(t21, x2, ..., t1n)/x]) dx2 ⊕ ... ⊕ ∫_{t1n}^{t2n} πn(t[(t21, t22, ..., xn)/x]) dxn : T.
Using the same technique, we can prove the case

t1 : (t11, t12, ..., t1n), t2 : (t21, t22, ..., t2n)

t1 ∗ t2 → (t11 ∗ t21) ⊕ (t12 ∗ t22) ⊕ ... ⊕ (t1n ∗ t2n)
Therefore, we prove the preservation of the system. ◀

A.3 Confluence

Define a binary relation ↠ by induction on terms.
M ↠ M

M ↠ M′, N ↠ N′
MN ↠ M′N′,  M ⊕ N ↠ M′ ⊕ N′,  M ⊖ N ↠ M′ ⊖ N′,  M ∗ N ↠ M′ ∗ N′,  ∂M/∂x |_N ↠ ∂M′/∂x |_{N′}

∀j ∈ [1, n], Mj ↠ Mj′
(M1, M2, ..., Mn) ↠ (M1′, M2′, ..., Mn′)

M1 ↠ M1′, M2 ↠ M2′, M3 ↠ M3′
∫_{M1}^{M2} M3 dx ↠ ∫_{M1′}^{M2′} M3′ dx,  case M1 of inl x1 ⇒ M2 | inr x2 ⇒ M3 ↠ case M1′ of inl x1 ⇒ M2′ | inr x2 ⇒ M3′

M1 ↠ M1′, M2 ↠ M2′, M3 ↠ M3′
case inr M1 of inl x1 ⇒ M2 | inr x2 ⇒ M3 ↠ M3′[M1′/x2]

M1 ↠ M1′, M2 ↠ M2′, M3 ↠ M3′
case inl M1 of inl x1 ⇒ M2 | inr x2 ⇒ M3 ↠ M2′[M1′/x1]

M ↠ M′
λx : T.M ↠ λx : T.M′,  πj(M) ↠ πj(M′),  inl/inr M ↠ inl/inr M′

∀j ∈ [1, n], Mj ↠ Mj′
πj(M1, M2, ..., Mn) ↠ Mj′

M ↠ M′, N ↠ N′
(λx : T.M)N ↠ M′[N′/x]

∀j ∈ [1, n], M1j ↠ M1j′, M2j ↠ M2j′
(M11, M12, ..., M1n) ⊕ (M21, M22, ..., M2n) ↠ (M11′ ⊕ M21′, M12′ ⊕ M22′, ..., M1n′ ⊕ M2n′)

∀j ∈ [1, n], M1j ↠ M1j′, M2j ↠ M2j′
(M11, M12, ..., M1n) ⊖ (M21, M22, ..., M2n) ↠ (M11′ ⊖ M21′, M12′ ⊖ M22′, ..., M1n′ ⊖ M2n′)

M ↠ M′, N ↠ N′
(λx : T.M) ⊕ (λy : T.N) ↠ λx : T. M′ ⊕ N′[x/y]

M ↠ M′, N ↠ N′
(λx : T.M) ⊖ (λy : T.N) ↠ λx : T. M′ ⊖ N′[x/y]

∀j ∈ [1, n], Mj ↠ Mj′, N ↠ N′, N : B
∂(M1,M2,...,Mn)/∂x |_N ↠ (∂M1′/∂x |_{N′}, ∂M2′/∂x |_{N′}, ..., ∂Mn′/∂x |_{N′})

M ↠ M′, N ↠ N′, N : B
∂(λy:T.M)/∂x |_N ↠ λy : T. ∂M′/∂x |_{N′}


∀j ∈ [1, n], Mj ↠ Mj′, (M1′, M2′, ..., Mj−1′, xj, Mj+1′, ..., Mn′) is written as Mj∗′, M0 ↠ M0′
∂M0/∂x |_{(M1,M2,...,Mn)} ↠ (∂M0′[M1∗′/x]/∂x1 |_{M1′}, ∂M0′[M2∗′/x]/∂x2 |_{M2′}, ..., ∂M0′[Mn∗′/x]/∂xn |_{Mn′})

M0 ↠ M0′, ∀j ∈ [1, n], M1j ↠ M1j′, M2j ↠ M2j′, (M21′, ..., M2(j−1)′, xj, M1(j+1)′, ..., M1n′) is written as Mj∗′
∫_{(M11,M12,...,M1n)}^{(M21,M22,...,M2n)} M0 dx ↠ ∫_{M11′}^{M21′} π1(M0′[M1∗′/x]) dx1 ⊕ ... ⊕ ∫_{M1n′}^{M2n′} πn(M0′[Mn∗′/x]) dxn

M0 ↠ M0′, M1 ↠ M1′, M2 ↠ M2′, M1, M2 : B
∫_{M1}^{M2} λy : T2.M0 dx ↠ λy : T2. ∫_{M1′}^{M2′} M0′ dx

N ↠ N′, M ↠ M′, M, N : B, ∀j ∈ [1, n], Mj ↠ Mj′
∫_{M}^{N} (M1, M2, ..., Mn) dx ↠ (∫_{M′}^{N′} M1′ dx, ∫_{M′}^{N′} M2′ dx, ..., ∫_{M′}^{N′} Mn′ dx)

N ↠ N′, M ↠ M′, N : B
(λx : T.M) ∗ N ↠ λx : T.(M′ ∗ N′)

∀j ∈ [1, n], Mj ↠ Mj′, N ↠ N′, N : B
(M1, M2, ..., Mn) ∗ N ↠ (M1′ ∗ N′, M2′ ∗ N′, ..., Mn′ ∗ N′)

∀j ∈ [1, n], M1j ↠ M1j′, M2j ↠ M2j′
(M11, M12, ..., M1n) ∗ (M21, M22, ..., M2n) ↠ M11′ ∗ M21′ ⊕ M12′ ∗ M22′ ⊕ ... ⊕ M1n′ ∗ M2n′

M ↠ M′
fix M ↠ M′ (fix M′),  fix M ↠ fix M′
▶ Lemma 26 (Preservation of ↠). If N : B and N ↠ N′, then N′ : B.
Proof. If we name the one-step relation of our calculus ρ, and its transitive closure ρ∗, then we have ↠ ⊆ ρ∗. So we have N ρ∗ N′. Since our calculus has the preservation property, we have N′ : B. ◀
▶ Lemma 27 (↠ under substitution). If M ↠ M′ and N ↠ N′, then M[N/x] ↠ M′[N′/x].
Proof. Induction on M ↠ M′.

Case M ↠ M: we make induction on the form of M.

Subcase c: c[N/x] = c = c[N′/x]; using M ↠ M we have M[N/x] ↠ M[N′/x].
Subcase (t1, t2, ..., tn): using induction we have ti[N/x] ↠ ti[N′/x]; then using ∀i, Mi ↠ Mi′ ⇒ (M1, M2, ..., Mn) ↠ (M1′, M2′, ..., Mn′) we have M[N/x] ↠ M[N′/x].
Using the same technique, we can prove the subcases t1 ⊕ t2, t1 ⊖ t2, t1 ∗ t2, λy : T.t, t1 t2, ∂t1/∂x |_{t2}, ∫_{t1}^{t2} t3 dx, πj(t), inl/inr t, and case t of inl x1 ⇒ t1 | inr x2 ⇒ t2.
Subcase variable y: if y = x then y[N/x] = N and y[N′/x] = N′, so y[N/x] ↠ y[N′/x]; if y is not x, it is the same as the subcase c.

The rest of the cases can be divided into three categories.

Case: the relation is based on the relations of the subterms.

Subcase MN ↠ M′N′: using induction we have M[K/x] ↠ M′[K′/x] and N[K/x] ↠ N′[K′/x]; using M ↠ M′, N ↠ N′ ⇒ MN ↠ M′N′ we have (MN)[K/x] ↠ (M′N′)[K′/x].
Subcases M ⊕ N ↠ M′ ⊕ N′, M ⊖ N ↠ M′ ⊖ N′, M ∗ N ↠ M′ ∗ N′, ∂M/∂x |_N ↠ ∂M′/∂x |_{N′}, (M1, M2, ..., Mn) ↠ (M1′, M2′, ..., Mn′), λx : T.M ↠ λx : T.M′, πj(M) ↠ πj(M′), fix M ↠ fix M′, inl/inr M ↠ inl/inr M′: same as MN ↠ M′N′.

Case: the reduction changes the structure.
Subcase (M11, M12, ..., M1n) ⊕ (M21, M22, ..., M2n) ↠ (M11′ ⊕ M21′, M12′ ⊕ M22′, ..., M1n′ ⊕ M2n′): using induction we have ∀i ∈ [1, 2], ∀j ∈ [1, n], Mij[K/x] ↠ Mij′[K′/x], so we have ((M11, M12, ..., M1n) ⊕ (M21, M22, ..., M2n))[K/x] ↠ (M11′ ⊕ M21′, M12′ ⊕ M22′, ..., M1n′ ⊕ M2n′)[K′/x].
Subcases (M11, ..., M1n) ⊖ (M21, ..., M2n) ↠ (M11′ ⊖ M21′, ..., M1n′ ⊖ M2n′), ∂(M1,...,Mn)/∂x |_N ↠ (∂M1′/∂x |_{N′}, ..., ∂Mn′/∂x |_{N′}), ∂(λy:T.M)/∂x |_N ↠ λy : T. ∂M′/∂x |_{N′}, ∫_{M1}^{M2} λy : T2.M0 dx ↠ λy : T2. ∫_{M1′}^{M2′} M0′ dx, ∫_{M}^{N} (M1, ..., Mn) dx ↠ (∫_{M′}^{N′} M1′ dx, ..., ∫_{M′}^{N′} Mn′ dx), (λx : T.M) ∗ N ↠ λx : T.(M′ ∗ N′), (M1, ..., Mn) ∗ N ↠ (M1′ ∗ N′, ..., Mn′ ∗ N′), (M11, ..., M1n) ∗ (M21, ..., M2n) ↠ M11′ ∗ M21′ ⊕ ... ⊕ M1n′ ∗ M2n′, and fix M ↠ M′ (fix M′): same as the ⊕ subcase above.
Case: the reduction involves substitution.

Subcase (λx : T.M)N ↠ M′[N′/x]: by the induction hypothesis, we have M[K/y] ↠ M′[K′/y] and N[K/y] ↠ N′[K′/y]. Thus ((λx : T.M)N)[K/y] = (λx : T.M[K/y]) N[K/y] ↠ M′[K′/y][(N′[K′/y])/x], and we have M′[N′/x][K′/y] = M′[K′/y][(N′[K′/y])/x]. Therefore we prove the case.

′ ′ ′ ′ ′ ′ ∂M0 Subcase∀i, (M1,M2..., Mi−1, xi,Mi+1..., Mn)is written as Mi∗, ∂x |(M1,M2,...,Mn) ↠ ′ ′ ′ ′ ∂M0[M1∗/x] ∂M0[M2∗/x] ∂t[Mn∗/x] ( | ′ , | ′ , ..., |M ′ ) ∂x1 M1 ∂x2 M2 ∂xn n Notice that

′ ′ ′ M0[Mi∗/x][K /y] ′ ′ ′ ′ = M0[K /y][(Mi∗[K /y])/x] ′ ′ ′ ′ ′ ′ = (M0[K /y])[(M1[K /y],M2[K /y], ..., ′ ′ ′ ′ ′ ′ ′ Mi−1[K /y], xi[K /y],Mi+1[K /y], ..., Mn[K /y])/x] ′ ′ ′ ′ ′ = (M0[K /y])[((M [K /y])i∗)/x]

′ ′ Using induction, we know that Mi[K/y] ↠ Mi [K /y], so we prove the case. ′ ′ Subcase (λx : T.M) ⊕ (λy : T.N) ↠ λx : T.M ⊕ N [y/x], (λx : T.M) ⊖ (λy : T.N) ↠ ′ ′ ′ ′ λx : T.M ⊖ N [y/x]: same as (λx : T.M)N ↠ M [N /x]. ′ ′ R (M21,M22,...,M2n) R M21 ′ ′ R M2n ′ ′ Subcases M0dx ′ π1(M0[M1∗/x])dx1⊕...⊕ ′ πn(M0[Mn∗/x])dxn, (M11,M12,...M1n) ↠ M11 M1n ′ ′ case inl M1 of inl x1 ⇒ M2| inr x2 ⇒ M3 ↠ M2[M1/x1] ,case inr M1 of inl x1 ⇒ ′ ′ ′ ′ ′ ′ ′ M2| inr x2 ⇒ M3 ↠ M3[M1/x2] : same as ∀i, (M1,M2,Mi−1, xi,Mi+1..., Mn)is writ- ′ ′ ′ ′ ′ ∂M0 ∂M0[M1∗/x] ∂M0[M2∗/x] ∂t[Mn∗/x] ten as M , | ( | ′ , | ′ , ..., |M ′ ). i∗ ∂x (M1,M2,...,Mn) ↠ ∂x1 M1 ∂x2 M2 ∂xn n


Thus we complete the proof. ◀

▶ Lemma 28 (Diamond property). If M ↠ M1 and M ↠ M2, there exists an M3 such that M1 ↠ M3 and M2 ↠ M3.

Proof. We do induction on the derivation of M ↠ M1.

Case M ↠ M. Then we choose M3 as M2.
Case MN ↠ M′N′.
Subcase M ↠ M2 as M ↠ M: then we choose M3 as M1.
Subcase M ↠ M2 as MN ↠ M″N″ ①: using the induction hypothesis, we have M∗ with M′ ↠ M∗ and M″ ↠ M∗, and we have N∗ with N′ ↠ N∗ and N″ ↠ N∗, so we choose M3 as M∗N∗.
Subcase M ↠ M2 as M = λx : T.P, (λx : T.P)N ↠ P″[N″/x] ②: then we first have that M′ = λx : T.P′, so we choose M3 = P∗[N∗/x].

Case M ⊕ N ↠ M′ ⊕ N′.
Subcase M ↠ M2 as M ↠ M: then we choose M3 as M1.
Subcase M ↠ M2 as M ↠ M″, N ↠ N″ ⇒ M ⊕ N ↠ M″ ⊕ N″: same as ①.
Subcase M ↠ M2 as (λx : T.M) ⊕ (λy : T.N) ↠ λx : T. M′ ⊕ N′[x/y]: then M ↠ M1 must be (λx : T.M) ⊕ (λy : T.N) ↠ (λx : T.M″) ⊕ (λy : T.N″), so we choose M3 to be λx : T. M∗ ⊕ N∗[x/y].
Subcase M ↠ M2 as (M11, M12, ..., M1n) ⊕ (M21, M22, ..., M2n) ↠ (M11′ ⊕ M21′, M12′ ⊕ M22′, ..., M1n′ ⊕ M2n′): same as the λ-abstraction subcase above.

All the other cases are similar to the cases of application and ⊕, except that we may have more subcases, and the extra subcases are all similar to ②. ◀

▶ Lemma 29 (Confluence). One term has at most one normal form.

Proof. The reduction relation ρ satisfies ρ ⊆ ↠ ⊆ ρ∗. Since ↠ has the diamond property and ↠∗ = ρ∗, the relation ρ∗ has the diamond property. This is what results in the confluence. ◀
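Confluence can be observed concretely on first-order terms: whichever redex is contracted first, the normal form agrees. A toy executable sketch (our own, covering only binary addition terms rather than the full calculus):

```python
def step(t, pick):
    """Contract one redex in a term t = ("plus", a, b), choosing the
    left or right subterm's redex first according to `pick`."""
    _, a, b = t
    if isinstance(a, tuple) and isinstance(b, tuple):
        if pick == "left":
            return ("plus", step(a, pick), b)
        return ("plus", a, step(b, pick))
    if isinstance(a, tuple):
        return ("plus", step(a, pick), b)
    if isinstance(b, tuple):
        return ("plus", a, step(b, pick))
    return a + b  # both arguments are constants: reduce this redex

def normalize(t, pick):
    """Iterate one-step reduction to a normal form (here, a number)."""
    while isinstance(t, tuple):
        t = step(t, pick)
    return t

term = ("plus", ("plus", 1, 2), ("plus", 3, 4))
print(normalize(term, "left"), normalize(term, "right"))  # 10 10
```

Both reduction strategies reach the same normal form, as Lemma 29 guarantees for the calculus itself.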

A.4 Strong normalization

In this section we write ⇝ for ρ∗.

▶ Lemma 30 (existence of ν). t is strongly normalisable iff there is a number ν(t) which bounds the length of every normalisation sequence beginning with t.

Proof. See p. 27 of Proofs and Types [12]. ◀

▶ Definition 31. We define a set REDT by induction on the type T . 1. For t of base type, t is reducible iff it is strongly normalisable.

2. For t of type (T1,T2, ..., Tn) , t is reducible iff ∀j, πj(t) is reducible. 3. For t of type U→V, t is reducible iff, for all reducible u of type U, t u is reducible of type V .

4. For t of type T1 + T2, t is reducible iff case t of inl x1 ⇒ 0 | inr x2 ⇒ 0 is a reducible term of base type.

▶ Definition 32. t is neutral if t is not of the form (t1, t2, ..., tn), λx : T.t, or inl/inr t.
We will verify the following three properties by induction on types.

(CR 1) If t ∈ REDT, then t is strongly normalisable.
(CR 2) If t ∈ REDT and t ⇝ t′, then t′ ∈ REDT.
(CR 3) If t is neutral, and whenever we convert a redex of t we obtain a term t′ ∈ REDT, then t ∈ REDT.

Case base type.
(CR 1) is a tautology.
(CR 2) If t is strongly normalisable, then so is every term t′ to which t reduces.
(CR 3) A reduction path leaving t must pass through one of the terms t′, which are strongly normalisable, and so is finite. In fact, it is immediate that ν(t) is equal to the greatest of the numbers ν(t′) + 1, as t′ varies over the (one-step) conversions of t.
Case tuple type.

(CR 1) Suppose that t, of type (T1, T2, ..., Tn), is reducible; then π1(t) is reducible, and by the induction hypothesis (CR 1) for T1, π1(t) is strongly normalisable. Moreover, ν(t) ≤ ν(π1(t)), since to any reduction sequence t, t1, t2, ... one can apply π1(·) to construct a reduction sequence π1(t), π1(t1), π1(t2), ... (in which the π1(·) is not reduced). So ν(t) is finite, and t is strongly normalisable.
(CR 2) If t ⇝ t′, then ∀j, πj(t) ⇝ πj(t′). By the induction hypothesis (CR 2) for type Tj, we have ∀j, πj(t′) is reducible, so t′ is reducible.
(CR 3) Let t be neutral and suppose all the t′ one step from t are reducible. Applying a conversion inside πj(t), the result is a πj(t′), since πj(t) cannot itself be a redex (t is not a tuple), and πj(t′) is reducible, since t′ is. But as πj(t) is neutral, and all the terms one step from πj(t) are reducible, the induction hypothesis (CR 3) for Tj ensures that πj(t) is reducible; so t is reducible.
Case arrow type.
(CR 1) If t is reducible of type U → V, let x be a variable of type U; x is reducible by (CR 3), so t x is reducible, and we have ν(t) ≤ ν(t x).
(CR 2) If t ⇝ t′ and t is reducible, take u reducible of type U; then t u is reducible and t u ⇝ t′ u. The induction hypothesis (CR 2) for V gives that t′ u is reducible. So t′ is reducible.

I C A L P 2 0 2 1 23:36 Analytical Differential Calculus with Integration

(CR 3) Let t be neutral and suppose all the t′ one step from t are reducible. Let u be a reducible term of type U; we want to show that t u is reducible. By the induction hypothesis (CR 1) for U, we know that u is strongly normalisable, so we can reason by induction on ν(u). In one step, t u converts to
1. t′ u with t′ one step from t; but t′ is reducible, so t′ u is.
2. t u′, with u′ one step from u. u′ is reducible by the induction hypothesis (CR 2) for U, and ν(u′) < ν(u); so the induction hypothesis for u′ tells us that t u′ is reducible.
3. There is no other possibility, for t u cannot itself be a redex (t is not of the form λx : T.t).
Case sum type.

(CR 1) If t is reducible of type T1 + T2, then we have ν(t) ≤ ν(case t of inl x1 ⇒ 0 | inr x2 ⇒ 0).
(CR 2) Same as the tuple type.
(CR 3) Same as the arrow type.

▶ Lemma 33. If t1, t2, ..., tn are reducible terms, then so is (t1, t2, ..., tn).

Proof. Because of (CR 1), we can reason by induction on ν(t1) + ν(t2) + ... + ν(tn) to show that πj(t1, t2, ..., tn) is reducible. This term converts to
1. tj, which is reducible.
2. πj(t1, ..., tk−1, tk′, tk+1, ..., tn), which is reducible by induction. ◀

▶ Lemma 34. If for all reducible u of type U, t[u/x] is reducible, then so is λx : T.t.
Proof. To show that (λx : T.t) u is reducible, we reason by induction on ν(u) + ν(t). The term (λx : T.t) u can be reduced to
1. t[u/x], which is reducible.
2. (λx : T.t′) u or (λx : T.t) u′, which is reducible by induction. ◀

▶ Lemma 35. If t is reducible, then so is inl/inr t.

Proof. Same as the case λx : T.t. ◀

▶ Lemma 36. If for all reducible t1 and t2 of type T1 and T2, we have t3[t1/x1] and t4[t2/x2] are reducible, and t is reducible term of type T1 + T2, then so is case t of inl x1 ⇒ t3 | inr x2 ⇒ t4.

Proof. Same as the case λx : T.t. ◀

▶ Lemma 37. If t1 and t2 are reducible terms of T, then so is t1 ⊕ t2.
Proof. We prove this by induction on the type.
Case base type: then it can only be reduced to t1′ ⊕ t2 or t1 ⊕ t2′, so ν(t1 ⊕ t2) = ν(t1) + ν(t2). Therefore it is strongly normalisable, and thus reducible.

Case (T1, T2, ..., Tn): we make induction on ν(t1) + ν(t2); t1 ⊕ t2 can be reduced to

1. Subcase (t11, t12, ..., t1n) ⊕ (t21, t22, ..., t2n) → (t11 ⊕ t21, t12 ⊕ t22, ..., t1n ⊕ t2n). Because t1 and t2 are reducible, ∀i∀j, tij is reducible; based on induction on types, ∀j, t1j ⊕ t2j is reducible. Thus (t11 ⊕ t21, t12 ⊕ t22, ..., t1n ⊕ t2n) is reducible.
2. Subcase t1′ ⊕ t2 or t1 ⊕ t2′. Based on induction, we know it is reducible.

Case A → B: for all reducible terms u of type A, we make induction on ν(t1) + ν(t2) + ν(u).
1. Subcase (λx : T.t1) ⊕ (λy : T.t2) → λx : T. t1 ⊕ (t2[x/y]). Notice that for all reducible u of type A, the term (t1 ⊕ t2[x/y])[u/x] is equal to t1[u/x] ⊕ t2[u/y]. Because λx : T.t1 is a reducible term, so is t1[u/x]; similarly, t2[u/y] is reducible. Because t1[u/x] and t2[u/y] are reducible, t1[u/x] ⊕ t2[u/y] is reducible based on induction. So λx : T. t1 ⊕ (t2[x/y]) is reducible.
2. Subcase (t1′ ⊕ t2) u, (t1 ⊕ t2′) u, or (t1 ⊕ t2) u′: based on induction we can prove the case. ◀

▶ Lemma 38. If t1 and t2 are reducible terms of T, then so is t1 ⊖ t2.

Proof. Same as ⊕. ◀

▶ Lemma 39. If t1 and t2 are reducible terms of ∂T1/∂T2 and T2, then so is t1 ∗ t2.
Proof. We prove this by induction on types.

Case T1: base type, T2: base type: same as the case of ⊕.

Case T1: (T1,T2, ..., Tn), T2: base type: same as the case of ⊕.

Case T1: A → B, T2: base type: same as the case of ⊕.
Case T1: (T1, T2, ..., Tn), T2: (T1′, T2′, ..., Tn′). Suppose t1 : (t11, t12, ..., t1n) and t2 : (t21, t22, ..., t2n); we do induction on ν(t1) + ν(t2).
Subcase t1 ∗ t2 → (t11 ∗ t21) ⊕ (t12 ∗ t22) ⊕ ... ⊕ (t1n ∗ t2n): because t1 and t2 are reducible, so is every tij; based on induction on types, ∀j, t1j ∗ t2j is reducible, and then so is (t11 ∗ t21) ⊕ (t12 ∗ t22) ⊕ ... ⊕ (t1n ∗ t2n).
Subcase t1′ ∗ t2 or t1 ∗ t2′: based on induction we know it is reducible. ◀

▶ Lemma 40. If t1 and t2 are reducible terms of T1 and T2, and for all reducible u of type T2, we have that t1[u/x] is reducible, then so is ∂t1/∂x |_{t2}.

Proof. We prove this by induction on types.

Case T1: (T1, T2, ..., Tn) or A → B or base type, T2: B. Same as the case t1 ∗ t2.
Case T1: (T1, T2, ..., Tn), T2: (T1′, T2′, ..., Tn′): we make induction on ν(t1) + ν(t2).


Subcase ∀i, (t1, t2, ..., ti−1, xi, ti+1, ..., tn) is written as ti∗, with
∂t/∂x |_{(t1,t2,...,tn)} → (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn}).
Note that ti∗ is reducible, so based on induction we have that ∂t[tj∗/x]/∂xj |_{tj} is reducible. This induction is based on the hypothesis that (t[tj∗/x])[u/xj] is reducible for all reducible u of type Tj′: we have (t[tj∗/x])[u/xj] = t[(tj∗[u/xj])/x] because t has no occurrence of xj, and it is easy to show that tj∗[u/xj] is a reducible term of type T2. So we finish the induction, and then (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn}) is reducible.
Subcase ∂t1′/∂x |_{t2} or ∂t1/∂x |_{t2′}: based on induction we have the proof. ◀

▶ Lemma 41. If t1, t2 and t3 are reducible terms of T1, T1 and T2, and for all reducible u of type T1, we have that t3[u/x] is reducible, then so is ∫_{t1}^{t2} t3 dx.

Proof. Same as the case of ∂/∂x |. ◀

▶ Lemma 42. If t1,t2 and t3 are reducible terms of T1 + T2, T and T , and for all reducible u1 of type T1, u2 of type T2, we have that t2[u1/x1] and t3[u2/x2] are reducible then so is case t1 of inl x1 ⇒ t2 | inr x2 ⇒ t3.

Proof. Same as the case of ∂/∂x |. ◀

▶ Lemma 43. Let t be any term (not assumed to be reducible), and suppose all the free variables of t are among x1, ..., xn of types U1, ..., Un. If u1, ..., un are reducible terms of types U1, ..., Un then t[u1/x1, ..., un/xn] is reducible.

Proof. By induction on t. We write t[u/x] for t[u1/x1, ..., un/xn].

1. t is xi, then t[u/xi] is reducible.

2. t is c: then t has no free variables, and c itself is reducible, so it is reducible.

3. t is (t1, t2, ..., tn): based on induction, each tj[u/x] is reducible; by Lemma 33 it is reducible.

4. t is t1 ⊕ t2, t1 ⊖ t2, t1 ∗ t2, inl/inr t0, or πj(t0): same as the case (t1, t2, ..., tn).
5. t is λy : T.t0: by induction, t0[u/x, v/y] is reducible for all reducible v; then by Lemma 34, λy : T.(t0[u/x]) is reducible, so (λy : T.t0)[u/x] is reducible.
6. t is ∂t1/∂x |_{t2}, case t0 of inl x1 ⇒ t1 | inr x2 ⇒ t2, or ∫_{t1}^{t2} t3 dx: same as the case λy : T.t0. ◀

▶ Theorem 44. All terms are reducible.

Proof. For an arbitrary term t, apply Lemma 43 to t[x1/x1, ..., xn/xn] and we get the result. ◀

▶ Corollary 45. All terms are strongly normalisable.

B Appendix: Lemmas

▶ Lemma 46. If t1 ρ∗ t1′ and t2 ρ∗ t2′, then t1[t2/x] ρ∗ t1′[t2′/x].
Proof. Using the confluence property, it is easy to see. ◀
▶ Lemma 47. If t1 = t1′ and t2 = t2′, then t1 ⊕ t2 = t1′ ⊕ t2′.

Proof. If t1 or t2 is not closed, then we use the substitution [u1/x1, ..., un/xn] to make it closed. For simplicity of notation, we just write t1 and t2 for the closed terms themselves. Based on the definition of equality, we can assume that t1, t2, t1′, t2′ are all normal forms, and we prove this by induction on types.

Case t1 is of base type. Then t2, t1′ and t2′ have to be of base type to be well-typed. And for base-type normal forms, we have t1 ⊕ t2 = t1′ ⊕ t2′.
Case t1 is of type A → B. Let us suppose t1 : λx : T.t3, t1′ : λx′ : T.t3′, t2 : λy : T.t4, t2′ : λy′ : T.t4′.
If the normal form of t1, t2, t1′ or t2′ is not a λ-abstraction, then we know their normal forms are all interpretable in base type, and thus we have t1 ⊕ t2 = t1′ ⊕ t2′. Else, for all u,

(t1 ⊕ t2) u = (λx : T.t3 ⊕ λy : T.t4) u = (λx : T. t3 ⊕ t4[x/y]) u = t3[u/x] ⊕ t4[u/y]
Similarly, we have (t1′ ⊕ t2′) u = t3′[u/x′] ⊕ t4′[u/y′].
And notice that because t1 = t1′, we have t1 u = t1′ u, so t3[u/x] = t3′[u/x′]; similarly t4[u/y] = t4′[u/y′]. Based on induction on the type B, we have t3[u/x] ⊕ t4[u/y] = t3′[u/x′] ⊕ t4′[u/y′], so we have proven the case.
Case t1 is of type (T1, T2, ..., Tn). Then we suppose t1 : (t11, t12, ..., t1n), t1′ : (t11′, t12′, ..., t1n′), t2 : (t21, t22, ..., t2n), t2′ : (t21′, t22′, ..., t2n′). Then
t1 ⊕ t2 = (t11, t12, ..., t1n) ⊕ (t21, t22, ..., t2n) = (t11 ⊕ t21, t12 ⊕ t22, ..., t1n ⊕ t2n)
Similarly we have t1′ ⊕ t2′ = (t11′ ⊕ t21′, t12′ ⊕ t22′, ..., t1n′ ⊕ t2n′), and based on induction we have ∀j, t1j ⊕ t2j = t1j′ ⊕ t2j′, so we have t1 ⊕ t2 = t1′ ⊕ t2′.
Case t1 is of type T1 + T2. This case is impossible because it is not well-typed. ◀

▶ Lemma 48. For a term t and any subterm s, if the term s′ = s, then t[s′/s] = t. (We only substitute this occurrence of the subterm s, not other subterms equal to s.)
Proof. We prove this by induction. We first substitute for all the free variables in t. Then
t[s′/s][u1/x1, ..., un/xn] = t[u1/x1, ..., un/xn][s′[u1/x1, ..., un/xn]/s[u1/x1, ..., un/xn]]
Notice that s′[u1/x1, ..., un/xn] = s[u1/x1, ..., un/xn] because s′ = s, and we just substitute for some of the free variables in s′ and s. So we only need to prove that for a closed term t and any subterm s, if s′ = s, then t[s′/s] = t. And notice that if we choose the subterm s to be t itself, then we have t[s′/s] = s′ = s = t, and we prove that case. So we next make induction on the form of t.


Case t is (t1, t2, ..., tn). Using induction, we know ti[s′/s] = ti, and we want to prove
(t1, t2, ..., tn) = (t1[s′/s], t2[s′/s], ..., tn[s′/s])
Because we have the transitive property of equality, we just reduce both of them to normal forms; then by definition we know they equal each other, and thus we prove the case. Using the same technique, we can prove the cases λx : T.t and inl/inr t.
Case t is c. Then it has no subterm except c itself; if s′ = c, then c[s′/c] = s′, and thus we prove the case. Using the same technique, we can prove the case x.

Case t is t1 ⊕ t2. Using induction we have t1[s′/s] = t1 and t2[s′/s] = t2, and we have proven in Lemma 47 that if t1 = t1′ and t2 = t2′, then t1 ⊕ t2 = t1′ ⊕ t2′; thus we have proven the case. Using the same technique, we can prove the cases t1 ⊖ t2, t1 ∗ t2, and πj(t).

Case t is t1 t2. We want to prove that if t1 = t1′ and t2 = t2′, then t1 t2 = t1′ t2′. By definition we know that if t1 = t1′, then t1 t2 = t1′ t2. So it remains to prove t1 t2 = t1 t2′. Using the confluence property, we can reduce t1 to λx : A.t or an nb. If it is the former case, then using induction we have t[t2/x] = t[t2′/x], thus t1 t2 = t1 t2′. If it is the latter case, then t2's normal form is interpretable, and on the base-type interpretation we have that if x = x′, then f(x) = f(x′). Thus we prove the case.

Case t is ∂t1/∂x |_{t2}. If t2 is of base type, then we can use the techniques from the case t1 ∗ t2. If t2 is of type (T1, T2, ..., Tn), we can reduce t2 and t2′ to the normal forms (t1, t2, ..., tn) and (t1′, t2′, ..., tn′), and then, with ∀i, (t1, t2, ..., ti−1, xi, ti+1, ..., tn) written as ti∗, we have
∂t/∂x |_{(t1,t2,...,tn)} → (∂t[t1∗/x]/∂x1 |_{t1}, ∂t[t2∗/x]/∂x2 |_{t2}, ..., ∂t[tn∗/x]/∂xn |_{tn})
Using induction we have t[tj∗/x] = t[tj∗′/x]; then, based on induction, we have ∂t[tj∗/x]/∂xj |_{tj} = ∂t[tj∗′/x]/∂xj |_{tj′}. Thus we have proven the case.
Using the same technique, we can prove the cases ∫_{t1}^{t2} t3 dx and case t of inl x1 ⇒ t1 | inr x2 ⇒ t2. Thus we have proven the lemma. ◀

▶ Lemma 49. If t1 ∗ (t2 ⊕ t3) and (t1 ∗ t2) ⊕ (t1 ∗ t3) are weak-normalizable, then t1 ∗ (t2 ⊕ t3) = (t1 ∗ t2) ⊕ (t1 ∗ t3).

Proof. If t1, t2 or t3 is not closed, then we use a substitution [u1/x1, ..., un/xn] to close them. For simplicity of notation, we still write t1, t2 and t3 for their closed versions. Because of the confluence and normalization properties of the system, we can assume that t1, t2 and t3 are all normal forms, and we prove the lemma by induction on type.

Case: t1, t2 and t3 are of base type. Then based on the base type interpretation, we have t1 ∗ (t2 ⊕ t3) = (t1 ∗ t2) ⊕ (t1 ∗ t3).

Case: t1 is of type A → B, t2 and t3 are of base type. Suppose t1 is λx : A.t. If t1's normal form is not λx : A.t, then we notice that t1 = λx : A.t1 x; using Lemma 11, we can substitute λx : A.t1 x for t1, so we may suppose t1 is λx : A.t. Then we have, for all u of type A,

(t1 ∗ (t2 ⊕ t3)) u
= ((λx : A.t) ∗ (t2 ⊕ t3)) u
= (λx : A.(t ∗ (t2 ⊕ t3))) u
= t[u/x] ∗ (t2 ⊕ t3)

And

((t1 ∗ t2) ⊕ (t1 ∗ t3)) u
= (((λx : A.t) ∗ t2) ⊕ ((λx : A.t) ∗ t3)) u
= ((λx : A.(t ∗ t2)) ⊕ (λx : A.(t ∗ t3))) u
= (λx : A.((t ∗ t2) ⊕ (t ∗ t3))) u
= (t[u/x] ∗ t2) ⊕ (t[u/x] ∗ t3)

Based on induction on type B, we have t[u/x] ∗ (t2 ⊕ t3) = (t[u/x] ∗ t2) ⊕ (t[u/x] ∗ t3); therefore we prove the case.
Case: t1 is of type (T1, T2, ..., Tn), t2 and t3 are of base type. Suppose t1 is (t1′, t2′, ..., tn′). Then

t1 ∗ (t2 ⊕ t3)
= (t1′, t2′, ..., tn′) ∗ (t2 ⊕ t3)
= (t1′ ∗ (t2 ⊕ t3), t2′ ∗ (t2 ⊕ t3), ..., tn′ ∗ (t2 ⊕ t3))

(t1 ∗ t2) ⊕ (t1 ∗ t3)
= ((t1′, t2′, ..., tn′) ∗ t2) ⊕ ((t1′, t2′, ..., tn′) ∗ t3)
= (t1′ ∗ t2, t2′ ∗ t2, ..., tn′ ∗ t2) ⊕ (t1′ ∗ t3, t2′ ∗ t3, ..., tn′ ∗ t3)
= (t1′ ∗ t2 ⊕ t1′ ∗ t3, t2′ ∗ t2 ⊕ t2′ ∗ t3, ..., tn′ ∗ t2 ⊕ tn′ ∗ t3)

And based on induction we have tj′ ∗ (t2 ⊕ t3) = (tj′ ∗ t2) ⊕ (tj′ ∗ t3), so we have t1 ∗ (t2 ⊕ t3) = (t1 ∗ t2) ⊕ (t1 ∗ t3).

Case: t1 is of type T1 + T2, t2 and t3 are of base type. This case is not possible because the right-hand term is not well-typed.
Case: t1, t2 and t3 are of type (T1, T2, ..., Tn). Suppose t1 is (t11′, t12′, ..., t1n′), t2 is (t21′, t22′, ..., t2n′), and t3 is (t31′, t32′, ..., t3n′). Then

t1 ∗ (t2 ⊕ t3)
= (t11′, t12′, ..., t1n′) ∗ ((t21′, t22′, ..., t2n′) ⊕ (t31′, t32′, ..., t3n′))
= t11′ ∗ (t21′ ⊕ t31′) ⊕ t12′ ∗ (t22′ ⊕ t32′) ⊕ ... ⊕ t1n′ ∗ (t2n′ ⊕ t3n′)

And we have


(t1 ∗ t2) ⊕ (t1 ∗ t3)
= ((t11′, t12′, ..., t1n′) ∗ (t21′, t22′, ..., t2n′)) ⊕ ((t11′, t12′, ..., t1n′) ∗ (t31′, t32′, ..., t3n′))
= ((t11′ ∗ t21′) ⊕ (t11′ ∗ t31′)) ⊕ ((t12′ ∗ t22′) ⊕ (t12′ ∗ t32′)) ⊕ ... ⊕ ((t1n′ ∗ t2n′) ⊕ (t1n′ ∗ t3n′))

Based on induction we have, for all j, t1j′ ∗ (t2j′ ⊕ t3j′) = (t1j′ ∗ t2j′) ⊕ (t1j′ ∗ t3j′), and using Lemma 10 (if t1 = t1′ and t2 = t2′, then t1 ⊕ t2 = t1′ ⊕ t2′) we prove the case. ◀
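To make the law concrete, here is a hedged numeric sketch under an interpretation we assume only for illustration (the paper's base type interpretation need not be this one): ⊕ is read as + and ∗ as × on the base type, lifted pointwise to functions and with a dot-product reading of ∗ on tuples, mirroring the function and tuple cases above.

```python
# Illustrative check of Lemma 49 under an assumed numeric reading:
# ⊕ = +, ∗ = × on the base type (floats).

# Function case: (λx.t) ∗ s reduces to λx.(t ∗ s), so ∗ lifts
# pointwise to the type A → B.
def mul_fun(f, s):
    return lambda u: f(u) * s

f = lambda u: u + 1.0          # a closed term of type A → B
b, c, u = 2.0, 3.0, 5.0
lhs = mul_fun(f, b + c)(u)                  # (t1 ∗ (t2 ⊕ t3)) u
rhs = mul_fun(f, b)(u) + mul_fun(f, c)(u)   # ((t1 ∗ t2) ⊕ (t1 ∗ t3)) u
print(lhs == rhs)              # → True

# Tuple case: ∗ on tuples behaves like a dot product, ⊕ is componentwise.
dot = lambda a, b2: sum(x * y for x, y in zip(a, b2))
oplus = lambda a, b2: tuple(x + y for x, y in zip(a, b2))
t1, t2, t3 = (1.0, 2.0), (3.0, 4.0), (5.0, 6.0)
print(dot(t1, oplus(t2, t3)) == dot(t1, t2) + dot(t1, t3))  # → True
```

The two `print` lines correspond to the function-type and tuple-type cases of the induction, respectively.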

▶ Lemma 50. If (t1 ⊖ t2) ⊕ (t2 ⊖ t3) and t1 ⊖ t3 are weak-normalizable, then (t1 ⊖ t2) ⊕ (t2 ⊖ t3) = t1 ⊖ t3.

Proof. If t1, t2 or t3 is not closed, then we substitute to close them. Because of the confluence and normalization properties, we can assume that t1, t2 and t3 are all normal forms. We then proceed by induction on the type of t1.
Case base type: On base types we require that (t1 ⊖ t2) ⊕ (t2 ⊖ t3) = t1 ⊖ t3; thus we have proven the case.
Case A → B: We need to prove that ∀u, ((t1 ⊖ t2) ⊕ (t2 ⊖ t3)) u = (t1 ⊖ t3) u. We can suppose that each ti is of the form λa : A.ti′; if it is not, we substitute λa : A.ti a for ti. Then we have

((t1 ⊖ t2) ⊕ (t2 ⊖ t3)) u
= ((λa : A.t1′ ⊖ λa : A.t2′) ⊕ (λa : A.t2′ ⊖ λa : A.t3′)) u
= (λa : A.((t1′ ⊖ t2′) ⊕ (t2′ ⊖ t3′))) u
= (t1′[u/a] ⊖ t2′[u/a]) ⊕ (t2′[u/a] ⊖ t3′[u/a])
= ((λa : A.t1′) u ⊖ (λa : A.t2′) u) ⊕ ((λa : A.t2′) u ⊖ (λa : A.t3′) u)
= (t1 u ⊖ t2 u) ⊕ (t2 u ⊖ t3 u)

And similarly we have (t1 ⊖ t3) u = t1 u ⊖ t3 u. Based on induction on type B, we have t1 u ⊖ t3 u = (t1 u ⊖ t2 u) ⊕ (t2 u ⊖ t3 u). Thus we have proven the case.

Case (T1, T2, ..., Tn): Suppose t1 is (t11, t12, ..., t1n), t2 is (t21, t22, ..., t2n) and t3 is (t31, t32, ..., t3n). Then we have

(t1 ⊖ t2) ⊕ (t2 ⊖ t3) = (((t11 ⊖ t21) ⊕ (t21 ⊖ t31)), ..., ((t1n ⊖ t2n) ⊕ (t2n ⊖ t3n)))

And

(t1 ⊖ t3) = ((t11 ⊖ t31), ..., (t1n ⊖ t3n))

Based on induction on type Ti, we have (t1i ⊖ t2i) ⊕ (t2i ⊖ t3i) = t1i ⊖ t3i; thus we prove the case.
Case T1 + T2: This case is not possible because it is not well-typed.
Thus we have proven the lemma. ◀
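As a sanity check, the telescoping law of Lemma 50 can be illustrated numerically. This is a hedged sketch under an interpretation we assume for illustration only (⊖ read as −, ⊕ as +); the base type interpretation in the calculus merely requires the equation to hold, not this particular reading.

```python
# Base-type case: (t1 ⊖ t2) ⊕ (t2 ⊖ t3) = t1 ⊖ t3 with ⊖ = -, ⊕ = +.
a, b, c = 7.0, 4.0, 1.0
print((a - b) + (b - c) == a - c)   # → True

# Tuple case (T1, ..., Tn): ⊖ and ⊕ act componentwise, so the law
# holds componentwise, as in the proof above.
osub = lambda x, y: tuple(p - q for p, q in zip(x, y))
oadd = lambda x, y: tuple(p + q for p, q in zip(x, y))
t1, t2, t3 = (1.0, 2.0), (3.0, 5.0), (0.5, 4.0)
print(oadd(osub(t1, t2), osub(t2, t3)) == osub(t1, t3))  # → True
```

The function-type case would lift the same check pointwise, exactly as the λa : A reasoning in the proof.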