ÓÄÊ 519.61 Indenite integration as term rewriting: containing tangent ⃝c 2012 ã. J. Hu, Y. Hou, A.D. Rich and D.J. Jerey, Department of Applied Mathematics, The University of Western Ontario 1151 Richmond St. London, Ontario, Canada N6A 5B7 E-mail: [email protected], [email protected], [email protected], djemail protected] Ïîñòóïèëà â ðåäàêöèþ

We describe the development of a term-rewriting system for indenite integration; it is also called a rule-based evaluation system. The development is separated into modules, and we describe the module for a wide class of integrands containing the tangent function.

1. Introduction correctness is not the only aim of the development. The quality of expressions is judged by a number of Programming styles in computer-algebra systems are criteria, which are used to decide whether an integration frequently described as either term-rewriting based, or rule should be accepted. Assume that an integrand f(x) computationally based. For example, Mathematica is has a proposed primitive F (x). Selection is based on widely recognized as a rewrite language [1], whereas the following criteria, which are discussed further in the Maple is rarely described this way. The distinction next section. is mostly one of emphasis, since all available systems include elements of both styles of programming. The • Correctness: we require F ′(x) = f(x). dichotomy can be seen more specically in programming • Simplicity: we seek the simplest form for an to evaluate indenite integrals, also called primitives integral. We adopt a pragmatic approach and aim or anti-. For , some of for the shortest expression length. the best-known approaches are computationally based. • Continuity: we aim to ensure that all of the Thus the Risch [8] and the Rothstein- expressions for integrals are continuous on domains Trager-Lazard-Rioboo algorithm [9, 10, 5] are both of maximum extent [4]. computational . These algorithms and others • Æsthetics: we employ a number of principles to like them are not universally applicable, however, and select for mathematical beauty where possible. for many integrals rule-based rewriting is needed and • Utility: the rules should facilitate the afore- has advantages, some of which we discuss below. mentioned `show-step' application. See Sec. 2.5. Doubts have been expressed about the viability of • Eciency: the path to a result should be as direct large-scale term rewriting [2]; the present scheme is as possible, and the set of rules should be compact. more algorithmic and deterministic than earlier term rewriting schemes, and we prefer the description rule- A more detailed description of the repository is given based scheme for the integration scheme presented below. here. A particular success of rule-based schemes is the popularity of software that can display the steps of a 2. Discussion of selection criteria calculation. This is variously called `display step' or `single step'. Examples of software oering this include The following example allows us to discuss several WolframAlpha and Derive, and many aspects of our overall aim. We compare 5 possible tutorial programs. expressions for an integral. The rule-based integration scheme that is considered ∫ √ ( √ ) here [7, 6] consists of a public-domain repository of − − 1 − − − 2 tan x dx = 2 ln 2 2 tan x 2 2 tan x transformation rules for indenite integration, together (√ ) with utility les that allow it to be utilized by various − arctan −2 tan x − 1 computer-algebra systems. The repository is not a table ( √ ) + 1 ln 2 − 2 tan x + 2 −2 tan x of integrals; it is a compact set of rules, much smaller 2 (√ ) than a table covering the same domain. Also simple − arctan −2 tan x + 1 , (1)

1 ∫ √ therefore algebraic manipulation alone cannot succeed −2 tan x dx = (2) in nding it. Secondly, branch cuts could make the given √ ( [ ] −Tan[x] √ √ expression dier algebraically from the shortest form; √ −2ArcTan 1 − 2 Tan[x] another way to say this is that the two expressions dier 2 Tan[x] [ √ √ ] by a piecewise constant. Therefore an integration system +2ArcTan 1 + 2 Tan[x] should aim to obtain the simplest form directly, since [ √ √ ] algebraic simplication is not likely to be successful. + Log 1 − 2 Tan[x] + Tan[x] This can be seen particularly in the results of Risch [ √ √ ]) integration, which are obtained without regard for −Log 1 + 2 Tan[x] + Tan[x] , (3) branch cuts. √ = ln cos x + ln(1 + −2 tan x − tan x) 2.3. Continuity 1 + tan x + arctan √ , (4) √ − The integrand − is singular at , √ 2 tan x √ 2 tan x x = nπ + π/2 −2 tan x −2 tan x and is otherwise continuous. At x = nπ the function = − arctan + arctanh , (5) 1 + tan x 1 − tan x changes from real to imaginary, but is continuous 1 + tan x 1 − tan x at those points. Therefore expressions for its integral = arctan √ + arctanh √ . (6) should also be continuous except possibly at −2 tan x −2 tan x x = (n + 1/2)π. We see that expression (5) has discontinuities In the above equations, expression (1) comes from at tan x = ±1. Therefore the other expressions are Maple 16; (3) comes from Mathematica 8; (4) to (6) preferred. Combining continuity with simplicity, we can come from the present project. We now consider these therefore reduce our choice of preferred expressions to expressions under the headings listed above. (6) and (4). An additional point to note is that the integrand is integrable at its singular points, and one 2.1. Correctness could ask for the integral expression to be continuous there. None of the expressions has a dened value at With and denoting an integrand and primitive f F and therefore the evaluation of denite as above, we note that the test ′ requires x = nπ + π/2 F = f integrals with these endpoints must be evaluated as a a denition of dierentiation. Here, we choose it to one-sided limit. be complex dierentiation. This is reected in the It should be further noted that requirements for absence of absolute values around the arguments of the continuity and simplicity might conict. Consider a functions in the example. The debate between dierent example: the integral computer-algebra users over ∫ ∫ { x2 + 2 x dx ln x , Complex dx = arctan , (8) = (7) x4 − 3x2 + 4 2 − x2 x ln |x| , Real ( √ ) ( √ ) = arctan 2x + 7 + arctan 2x − 7 . (9) is of long standing. In the current repository, all transformation rules are valid for complex quantities. The discontinuous expression is shorter, and would be In the example the integrand itself becomes imaginary judged simpler by most users. This is a simple example for intervals (nπ, nπ + π/2) with n ∈ Z. using the Lazard-Rioboo algorithm [5]; as the degrees of Note that the verication of the test F ′ = f by the polynomials grow, the dierence in length between a computer-algebra system is not itself a trivial step. the two expressions increases. Maple cannot complete a verication of (4) in any straightforward way, for example. 2.4. Æsthetics

2.2. Simplicity Beauty is the rst test. There is no permanent place in the world for ugly mathematics. The last 3 expressions are clearly shorter and simpler  G. H. Hardy [3] than the rst two. A constant problem in computer- algebra systems is expression swell, and every part of If we compare (4) and (6), we see that (6) uses two such systems should be striving to keep results succinct. related functions, namely arctan and arctanh. Although This increases the utility of a system. we know that arctanh can be expressed in , Integral expressions cannot necessarily be simplied the look of (6) with its many symmetries has greater automatically into their shortest forms. There are mathematical beauty than the other expressions. two reasons for this. First, the shortest form may A similar principle concerns the relation between the dier by a constant from the given expression, and form of the integrand and the form of its integral.

2 Consider the following example, which comes from a transformation from left to right. Therefore the Maple 16. condition m + n + 1 ≠ 0 must be listed in the validity ∫ √ eld of the rule. In addition, the transformation reduces √ tan x cos x arccos (cos x − sin x) 2 tan x dx = √ the exponent of T1 = tan(c + dx) from m to m − 1, cos x sin x and clearly this requires ≥ in order to qualify ( √ √ ) m 1 − ln cos x + 2 tan x cos x + sin x . (10) for a simplication. Thus this becomes a condition in the simplication eld. We can note in passing that the In contrast to (1), also from Maple, this form introduces case m + n + 1 = 0 does not invalidate the equation, a jumble of functions not seen in the integrand. Also the which now reduces to a simple exact integral, but simply prevents the equation being used as a transformation. obvious symmetry of x → −x which relates problems (1) and (10) is not preserved in the answers. We prefer, then, A similar eect can be seen in transformation (18). to echo as much as possible the functional forms seen in Here, the case C = 0 reduces the transformation to a the integrand. trivial identity.

2.5. Utility 4. Functions containing tangent

The way in which rules are encoded will be reected The new module for integration addresses functions of when a user single-steps through a derivation. Some the form systems are happy to use many substitutions on the way to the nal expression. Although this reects the way tanm(c + dx)(a + b tan(c + dx))n a human might solve the problem, it makes for dicult ∗ (A + B tan(c + dx) + C tan2(c + dx)) , reading, since the user is probably not taking notes along ∫the way. A simple example is an integral of the form where a, b, c, d, A, B, C ∈ C and m, n ∈ R are arbitrary. f(ax + b) dx. Although a human might immediately This expression will not always be integrable, and the write y = ax + b and then work in y, we prefer to live aim is to evaluate all cases in which it is integrable. with the longer form, and save the user from having to This can even include cases in which m and n remain keep track of the substitutions. symbolic. It is worth commenting on this choice for the 3. Format for rules integrand, particularly the presence of the last factor. The appendix contains a set of recurrence relations Each entry in the repository has three functional parts, which allow us to simplify the integrand to a point where which together dene a rule. The repository entries may it can be evaluated explicitly. It is found that the nal contain other information, such as rule derivations and factor is necessary to the recurrence relations. Even if we literature citations, which do not have a functional role. set B = C = 0 in the above expression, one step of the 1. The transformation. This maps an integral to reduction of the integral will introduce the additional an expression which contains terms that are free terms. Remarkably, the case a2 +b2 = 0 allows a simpler of integrals and terms containing new (simpler) form of integrand to be reduced, and the appropriate integrals. relations are also included in the appendix. One standard approach to such integrals, used by 2. Validity conditions. Since the integrals in part 1 other systems, is the substitution u = tan(c + dx). This usually contain parameters, these conditions ensure removes all trigonometric functions from the integral, the correctness of the transformation. and converts the problem to a quasi-rational function 3. Simplication conditions. These conditions ensure in u. Another similar substitution is the Weierstrass that the transformation is desirable, meaning that substitution u = tan((c + dx)/2). We do not follow this any new integral or integrals will lead, after option. From a mathematical point of view, it brings further transformations, to a solution of the original with it problems of inverting the transformation, since problem. arctan is a branched function. From a programming point of view, the tactic reduces the independence of the An important design objective is that the conditions module. The integral problem is moved to other systems, dening the rules are mutually exclusive, meaning that or other modules in the same system, that handle once parameters are specied for any integrand, only non-trigonometric functions. Also, we stated earlier the one set of conditions will evaluate to true, and therefore aesthetic principle that we try to stay with the functions only one rule can be applied to any particular case. appearing in the integrand; this applies both to the As an example, consider transformation (21) in the derivations and the nal expressions. appendix. The factor (m + n + 1) on the left side The rule-based approach for the above integrands uses must clearly be non-zero for the equation to act as 12 transformations listed in the appendix to iteratively

3 reduce integrands to forms for which the evaluation is provided A2 + B2 ≠ 0, a2 + b2 ≠ 0 known. The nal rule is often called a termination rule. ∫ We have not listed all the rules here, because with the A + B tan(c + dx) √ dx −→ application conditions and termination rules, there are a + b tan(c + dx) ∫ too many to print. We now give an example of the rules 1 1 + i tan(c + dx) in action. (A − Bi) √ dx+ 2 a + b tan(c + dx) ∫ 1 1 − i tan(c + dx) 5. Example reduction (A + Bi) √ dx (15) 2 a + b tan(c + dx)

We consider a specic problem. In order to focus on the followed by if A2 + B2 = 0 and bA + aB ≠ 0 then principal idea, we have chosen the numerical constants ∫ so that some terms in the recurrence will have zero A + B tan(c + dx) √ dx −→ coecients and keep the expressions short. a + b tan(c + dx) [√ ] ∫ a+b tan(c+dx) 2B arctanh √ ∗ a+ bA tan(1 + i + x) − √ B (16) ( ) bA 4 − 12 tan(1 + i + x) + 9 tan(1 + i + x)2 d a + B dx . − 3/2 (2 3 tan(1 + i + x)) This rule is used twice to obtain the nal expression (11) √ + 2 2 − 3 tan(1 + i + x) To select one of the recurrences below, the program [√ ] compare the integrand with the standard form above √ 2 − 3 tan(1 + i + x) − 2 − 3i arctanh √ and identies m = 1 and n = −3/2. Since n ≤ −1, we 2 − 3i select recurrence (20). Because in this case 2 − [√ ] Ab abB + √ 2 2 − 3 tan(1 + i + x) a C = 0, the recurrence simplies even further. Thus − 2 + 3i arctanh √ we obtain 2 + 3i ∫ 1 tan(1 + i + x)(−26 + 39 tan(1 + i + x)) − √ dx . 13 2 − 3 tan(1 + i + x) 6. Concluding remarks (12) Although one aim of this project is to develop a Our choice of constants now allows us to simplify the system-independent repository of integration rules, at integrand algebraically. In practice this step is not present the host inuences the performed by appealing to the simplication routines form of some rules. Since the rst step in identifying of the host system. Rather, it is coded within the which rule to apply is to identify the pattern of the rule-based system as a separate rule. This is necessary integrand, the underlying pattern recognition functions because the general host simplication function will of the host system will inuence the selection of rules. ignore our aesthetic principle of working with the For example, in Mathematica, (sin(c + dx))−1 is original functions, in this case tangent. We obtain represented internally as csc(c + dx), and tan(c + dx)−1 as . Hence, the current module must contain ∫ cot(c + dx) √ entries for both tangent and cotangent (other modules − (13) tan(1 + i + x) 2 3 tan(1 + i + x) dx . must contain entries for sine and cosecant). A system which stored reciprocals of tangents dierently might This is next transformed using need a modication of the database. ∫ ∫ A second way in which the host system inuences the repository is through simplication. We have already n −→ n − n−1 − dn T1T2 dx T2 dn T2 T3(b, a, 0) dx noted that some algebraic simplications are coded as rule-based transformations, because otherwise the to obtain system simplier would destroy the patterns we prefer. ∫ Our inability to specify our requirements to system 3 + 2 tan(1 + i + x) √ √ dx + 2 2 − 3 tan(1 + i + x) . simpliers forces us to include algebraic simplication 2 − 3 tan(1 + i + x) within the repository, which is a signicant increase in (14) size.

Next an algebraic manipulation rule is used, valid

4 7. Appendix ∫ m n (21) We list the recurrence relations used in the integration bd(m + n + 1) T1 T2 T3(A, B, C) dx = scheme [11]. To save space and to show the structure ∫ more clearly, we use the abbreviations m n+1 − m−1 n ˆ CT1 T2 d T1 T2 T3 dx , ˆ ˆ − T1 = tan(c + dx) ,T2 = a + b tan(c + dx) , A = aCm , B = b(C A)(m + n + 1) , 2 ˆ T3(A, B, C) = A + B tan(c + dx) + C tan (c + dx) . C = aCm − bB(m + n + 1) .

Then the recurrences, valid for all A, B, C ∈ C, are ∫ ∫ m n (17) ad(m + 1) T mT nT (A, B, C) dx = (22) d(m + 1) T1 T2 T3(A, B, C) dx = 1 2 3 ∫ ∫ m+1 n m+1 n−1 ˆ ˆ ˆ AT m+1T n+1 + d T m+1T nTˆ dx , AT1 T2 + d T1 T2 T3(A, B, C) dx , 1 2 1 2 3

Aˆ = aB(m + 1) − Abn , Aˆ = aB(m + 1) − Ab(m + n + 2) , Bˆ = (bB − aA + aC)(m + 1) , Bˆ = −a(A − C)(m + 1) , Cˆ = −Ab(m + n + 2) . Cˆ = bC(m + 1) − Ab(m + n + 1) . The following 6 transformations, from (23) to (28), require the condition a2 + b2 = 0. Note that the third ∫ argument of T3 is now always 0. ∫ d(m + n + 1) T mT nT (A, B, C) dx = (18) 1 2 3 m n (23) ∫ d(m + 1) T1 T2 T3(A, B, 0) dx = ∫ CT m+1T n + d T mT n−1T (A,ˆ B,ˆ Cˆ) dx , 1 2 1 2 3 m+1 n−1 − m+1 n−1 ˆ ˆ AaT1 T2 d T1 T2 T3(A, B, 0) dx , Aˆ = Aa(m + n + 1) − C(m + 1)a , Aˆ = Ab(n − 1) − (Ab + Ba)(m + 1) , Bˆ = (aB + bA − bC)(m + n + 1) , Bˆ = Aa(m + n) − Bb(m + 1) . Cˆ = aCn + bB(m + n + 1) . ∫ m n (24) d(m + n) T1 T2 T3(A, B, 0) dx = ∫ ∫ ( ) m+1 n−1 m n−1 ˆ ˆ 2 2 m n (19) BbT T + d T T T3(A, B, 0) dx , bd(n + 1) a + b T1 T2 T3 dx = 1 2 1 2 ∫ ( ) Aˆ = Aa(n + m) − Bb(m + 1) , Ab2 − abB + a2C T mT n+1 + d T m−1T n+1Tˆ dx , 1 2 1 2 3 ˆ ( ) B = Ba(n − 1) + (Ab + Ba)(m + n) . Aˆ = − Ab2 − abB + a2C m , ∫ ˆ − B = b(bB + aA aC)(n + 1) , 2 m n (25) 2a nd T1 T2 T3(A, B, 0) dx = Cˆ = (m + n + 1)(aB − Ab)b − ma2C + (n + 1)b2C. ∫ m n m−1 n+1 ˆ ˆ BbT1 T2 + d T1 T2 T3(A, B, 0) dx , ˆ − ˆ − ∫ A = (Ab Ba)m , B = Bb(m n) + Aa(m + n) . ( ) 2 2 m n (20) ad(n + 1) a + b T1 T2 T3 dx = ∫ ∫ 2 m n ( ) 2a nd T T T3(A, B, 0) dx = (26) − 2 − 2 m+1 n+1 m n+1 ˆ 1 2 Ab abB + a C T1 T2 + d T1 T2 T3 dx ∫ ( ) − a(aA + bB)T m+1T n + d T mT n+1Tˆ (A,ˆ B,ˆ 0) dx , Aˆ = A a2(n + 1) + b2(m + n + 2) 1 2 1 2 3 − a(bB − aC)(m + 1) , Bˆ = a(aB − bA + bC)(n + 1), Aˆ = bB(m + 1) + aA(m + 2n + 1) , ( ) Cˆ = Ab2 − abB + a2C (m + n + 2) . Bˆ = (aB − Ab)(m + n + 1) .

5 ∫ m n (27) 10. Barry M. Trager. Algebraic factoring and rational ad(m + n) T1 T2 T3(A, B, 0) dx = ∫ function integration. In R. D. Jenks, editor, Proceedings of the 1976 ACM Symposium on m n m−1 n ˆ ˆ ˆ aBT1 T2 + d T1 T2 T3(A, B, 0) dx , symbolic and algebraic computation SYMSAC '76, pages 219226. ACM Press, 1976. Aˆ = −aBm , Bˆ = Aam + (Aa − Bb)n . 11. Detmar Martin Welz. Recurrence relations ∫ for integrals containing tangent. Personal m n (28) ad(m + 1) T1 T2 T3(A, B, 0) dx = Communication, 2011. ∫ m+1 n m+1 n ˆ ˆ ˆ aAT1 T2 + d T1 T2 T3(A, B, 0) dx , Aˆ = Abn − Ba(m + 1) , Bˆ = Aa(m + n + 1) .

ÑÏÈÑÎÊ ËÈÒÅÐÀÒÓÐÛ

1. B. Buchberger. Mathematica as a Rewrite Language. In T. Ida, A. Ohori, and M. Takeichi, editors, Functional and Logic Programming (Proceedings of the 2nd Fuji International Workshop on Functional and Logic Programming, November 1-4, 1996, Shonan Village Center), pages 113. Copyright: World Scientic, Singapore - New Jersey - London - Hong Kong, 1996.

2. Richard J. Fateman. A review of Mathematica. J. Symb. Computation, 13(5):545579, 1992.

3. G. H. Hardy. A mathematician's apology. Canto. Cambridge University Press, 2012.

4. D. J. Jerey. Integration to obtain expressions valid on domains of maximum extent. In Manuel Bronstein, editor, ISSAC '93: Proceedings of the 1993 International Symposium on Symbolic and Algebraic Computation, pages 3441. ACM Press, 1993.

5. Daniel Lazard and Renaud Rioboo. Integration of rational functions: Rational computation of the logarithmic part. J. Symb. Computation, 9:113115, 1990.

6. A. D. Rich. Rule based mathematics. Website: www.apmaths.uwo.ca/~arich

7. A. D. Rich and D. J. Jerey. A knowledge repository for indenite integration based on transformation rules. In Intelligent Computer Mathematics, volume 5625 of LNCS, pages 480485. Springer, 2009.

8. Robert H. Risch. The problem of integration in nite terms. Trans. Amer. Math. Soc., 139:167189, 1969.

9. Michael Rothstein. A new algorithm for the integration of exponential and logarithmic functions. In Proceedings of the 1977 users conference, pages 263274, 1977.

6