Higher Order Automatic Differentiation with Dual Numbers
Total Page:16
File Type:pdf, Size:1020Kb
https://doi.org/10.3311/PPee.16341 Creative Commons Attribution b |1 Periodica Polytechnica Electrical Engineering and Computer Science, 65(1), pp. 1–10, 2021 Higher Order Automatic Differentiation with Dual Numbers László Szirmay-Kalos1* 1 Department of Control Engineering and Information Technology, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, H-1521 Budapest, P. O. B. 91, Hungary * Corresponding author, e-mail: [email protected] Received: 28 April 2020, Accepted: 27 September 2020, Published online: 26 October 2020 Abstract In engineering applications, we often need the derivatives of functions defined by a program. The approach chosen for derivative computation must be algebraic to allow computer implementation. A particular solution to obtain first derivatives is the application of dual numbers. This paper proposes simple and compact generalizations of this idea to obtain derivatives of arbitrary order for single or multi-variate functions and the automatic handling of 0/0 ambiguities in the calculations. We also provide the C++ code that takes advantage of operator overloading and recursion. The method is demonstrated by path animation, Gaussian curvature computation, and curve fairing. Keywords dual numbers, higher order automatic differentiation 1 Introduction ∆∆ Many engineering tasks require the computation of ft + −−ft 22 ft′()≈ , derivatives of a function specified by a program seg- ∆ ment. Although, the most essential approaches need only ft()+ ∆∆− 2 ft()+−ft() the first derivative, there are many problems requiring ft′′ ()≈ , ∆2 second or higher order derivation as well. For example, 3∆∆ dynamics simulation uses the Newton’s laws stating that ft + −+3 ft 2 2 the force is proportional to the second derivative of the ft′′′ ()≈ ∆3 path. The Frenet frame is based not only on the first but ∆∆ 3 also on the second derivatives. Curvature calculation also 3 ft − −−ft 2 2 needs the derivatives up to order two. Computational aes- + , ∆3 thetics, curve or surface fairing [1] use the assumption that a fair curve or surface uniformly distributes the curva- ..., ture, which means that the third derivative of the paramet- ric function should also be evaluated. Newton-Raphson where ∆ is a small offset. This is very simple, but methods attacking inverse problems use the Hessian i.e. suffers from severe problems. If ∆ is too large, then the second derivative of the target function, and particu- the approximation is rather poor. However, when it lar applications include reverse engineering [2, 3], regres- is too small, the result is numerically unstable and is sion methods [4], deep learning [5, 6] or medical imag- valid for very few digits (Fig. 1). ing [7], etc. Second derivatives of parametric surfaces • Symbolic differentiation [10] mimics the manual play an important role in non-photorealistic rendering differentiation process and generates the mathemat- as well to identify the principal curvature directions [8]. ical definition of the derived function. Symbolic dif- There are several options to implement derivative compu- ferentiation handles the function as a whole, thus, tation in a computer program: it cannot solve conditionals and special cases. • Numerical differentiation [9] approximates the • Automatic differentiation [5, 11, 12] determines derivatives of f at t by the value and the derivative of a function defined by a Cite this article as: Szirmay-Kalos, L. "Higher Order Automatic Differentiation with Dual Numbers", Periodica Polytechnica Electrical Engineering and Computer Science, 65(1), pp. 1–10, 2021. https://doi.org/10.3311/PPee.16341 Szirmay-Kalos 2|Period. Polytech. Elec. Eng. Comp. Sci., 65(1), pp. 1–10, 2021 80 f(x) automatic first derivative to the derivation rules. Concerning higher order differen- automatic second derivative 60 numeric first derivative tiation, Fike and Alonso [13] proposed hyper-dual num- numeric second derivative 40 bers that have the following form: 20 Zx=+yz12++w12, 0 2 2 2 0 -20 where the imaginary units satisfy 1 ==2 ()12 = . -40 Hyper-dual numbers have three imaginary parts to mimic second order derivatives [14]. Following this construction, -60 order N derivatives would need 2N − 1 imaginary units, mak- -80 0 1 2 3 4 5 6 ing this approach prohibitively expensive for higher orders. (a) In this paper, we provide a simple and compact approach 80 f(x) automatic first derivative for higher order derivation. When dealing with deriva- automatic second derivative 60 numeric first derivative tives up to order N, we store the function and the deriv- numeric second derivative 40 atives in an N + 1 dimensional vector, i.e. we use only 20 the minimally required N imaginary units. To demonstrate 0 the simplicity, we also present the C++ classes implement- -20 ing the different solutions in this paper. -40 The structure of this paper is as follows. In Section 2 we first review the theory of dual numbers and their -60 application in derivative computation. Section 3 presents -80 0 1 2 3 4 5 6 our first result that generalizes dual numbers for multiple (b) imaginary units and establishes the arithmetic rules for the 80 f(x) automatic first derivative computation of higher order derivatives. We also address automatic second derivative 60 numeric first derivative the case of multi-variate functions and provide a solution numeric second derivative 40 for the computation of arbitrary derivatives with the appli- 20 cation of recursive functions in Section 4. 0 Summarizing, the main contributions of this paper are: -20 • Generalization of dual numbers for multiple but min- -40 imal number of imaginary units to compute higher order derivatives. -60 • Generalization of dual numbers to handle higher -80 0 1 2 3 4 5 6 order cross differentiation of multi-variate functions. (c) • Simple solution for the automatic higher order deri- Fig. 1 Numerical differentiation of a function with different Δ values. vation with recursive functions. The function is X(t) = sin(t)(sin(t) + 3)4 / (tan(cos(t)) + 2) (a) when • Automatic handling of 0/0 type ambiguities. ∆ = 0.1 the numerical derivative is not precise; (b) when ∆ = 0.001 the derivative becomes noisy; (c) when ∆ = 0.0001 the noise becomes 2 Dual numbers intolerable. We used single precision floating point calculation. Dual numbers are similar to ordinary complex numbers. Both of them are par ticular Clifford algebras or hy per-num- program exactly up to the precision allowed by the bers of form z = x + yi where x and y are real numbers finite digit representation of the computer. As this and i is the imaginary unit. We expect the existence of approach takes the function and the value where the addition, multiplication and division with the properties of function and the derivative should be determined, ordinary complex operations, like commutativity, distrib- it is able to handle conditionals and to check and utivity, associativity. This means that when we compute solve special cases. the product of two such numbers, the result should have the same form, which necessitates the definition of i2 also Forward mode first order automatic differentiation is in the form of x + yi. The particular definition of i2 distin- accomplished by replacing the algebra of real numbers guishes different hyper-number types. For differentiation, by the algebra of dual numbers with arithmetic similar we take dual numbers defined with thei 2 = 0 fundamental Szirmay-Kalos Period. Polytech. Elec. Eng. Comp. Sci., 65(1), pp. 1–10, 2021| 3 property, stating that i is an imaginary number, which Our goal is to define the arithmetic rules for such num- does not belong to the real numbers, but its square can be bers, and find producti jik in particular, to make these rules replaced by zero. The arithmetic rules in this algebra are similar to those of higher order derivation. as follows. The addition or subtraction is Derivatives of the sum or difference of two functions are just the sum or difference of the derivatives, respec- ZZ12±=()xy11+ ii±+()xy22=±()xx12+±()yy12i. (1) tively, so the generalized dual number rules of addition The multiplication is and subtraction are equivalent to the rules of derivation: ZZ12=+()xy11ii()xy22+ N ()jj() 2 ()ff12± =±()ff12i j = ()ff12± (). (4) (2) ∑ =+xx12 ()yx12+ xy12ii+ yy12 j=0 =+xx12 ()yx12+ x112y i. Let us consider multiplication. The product of two gen- To establish the rule of division, both the numerator eralized dual numbers is and the denominator are multiplied with the conjugate of N N N N =×()j ()k = ()j ()k . the denominator to get rid of the imaginary part in the ()ff12()∑∑ff1 iij 2 k ∑∑ ff12iijk (5) j=0 k =0 j=0 k =0 denominator: Z xy+ i ()xy11+ ii()xy22− On the other hand, the dual number representation of 1 = 11= product f f is Z2 xy22+ i ()xy22+ ii()xy22− 1 2 2 (3) N ()n xx12+ ()y1xxx21− yy21ii− y2 x1 yx12− xy12 = =+ . ()ff12 = ()ff12 in . 2 22 2 i ∑ n=0 ()xy2 − 2 i x2 x2 Examining Eqs. (1), (2), and (3), we can realize that Substituting the formula of the nth derivative of a product, () n n the real part undergoes the same elementary operation n ()mn()−m , ()ff12 = ∑ ff12 (6) as the dual number, while the imaginary part reflects m=0 m the arithmetic rules of derivation for addition, multiplica- we get N n n tion and division. This means that considering function f ()mn()−m . ()ff12 = ∑∑ ff12 in (7) and its derivative f' as the real and imaginary parts of a dual n=0 m=0 m number ()ff=+f ′i , an arbitrary sequence of the four The derivatives can be computed with generalized dual elementary operations results in the function value in the number algebra if the two rules lead to identical results: real part and the derivative in the imaginary part.