
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2016

Constructing a Computer Algebra System Capable of Generating Pedagogical Step-by-Step Solutions

DMITRIJ LIOUBARTSEV

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Degree project report at NADA
Supervisor: Mårten Björkman
Examiner: Olof Bälter

Abstract

For the problem of producing pedagogical step-by-step solutions to mathematical problems in education, the standard methods and algorithms used in the construction of computer algebra systems are often not suitable. A method that uses rules to manipulate mathematical expressions in small steps is suggested and implemented. The problem of creating a step-by-step solution by choosing which rule to apply, and when, is redefined as a graph search problem, and variations of the A* algorithm are used to solve it. Everything is put together into one prototype solver that was evaluated in a study. The study was a questionnaire distributed among high school students. The results showed that while the solutions were not as good as human-made ones, they were competent. Further improvements of the method are suggested that would probably lead to better solutions.

Referat

Construction of a Computer Algebra System Capable of Generating Pedagogical Step-by-Step Solutions

For the problem of producing pedagogical step-by-step solutions to mathematical problems in education, common methods and algorithms used in the construction of computer algebra systems are often not suitable. A method that uses rules to manipulate mathematical expressions in small steps is proposed and implemented. The problem of choosing which rules to apply, and when, in order to create a step-by-step solution is redefined as a graph search problem, and variants of the A* algorithm are used to solve it. Everything is assembled into a prototype solver, which is evaluated in a study. The study was a questionnaire handed out to high school students. The results showed that although the solutions created by the program were not as good as solutions created by humans, they were decent. Further improvements of the method are proposed, which would probably lead to better solutions.

Contents

1 Introduction
  1.1 Problem statement
  1.2 Problem definition, scope and evaluation
  1.3 Layout of this document

2 Related Work
  2.1 Applications of computer algebra
  2.2 Computer algebra systems with step-by-step capabilities

3 Computer Algebra Systems
  3.1 A brief overview
  3.2 Construction of computer algebra systems
    3.2.1 Representation of primitive types
    3.2.2 Representation of mathematical formulas
    3.2.3 Algorithms
    3.2.4 Interface

4 SymPy
  4.1 Representation of mathematical formulas
  4.2 Creation of SymPy objects
  4.3 Algorithms
  4.4 Output

5 Method
  5.1 Considerations for a step-by-step CAS
    5.1.1 Pedagogical considerations
    5.1.2 Differences to normal CASs
  5.2 Overview of chosen method
  5.3 Modifications to SymPy
    5.3.1 Classes
    5.3.2 Termlists and Factorlists
    5.3.3 Object creation and ...
    5.3.4 Input
    5.3.5 Output
  5.4 Rules
    5.4.1 sortBoth
    5.4.2 simplifyTerms
    5.4.3 simplifyProduct
    5.4.4 groupPow
    5.4.5 eqRules
  5.5 Search
    5.5.1 Redefining the problem as a graph search
    5.5.2 Implicitly generating the graph
    5.5.3 Simplify
    5.5.4 ShowEquality
    5.5.5 Solve
  5.6 Interface

6 Results
  6.1 Evaluation
  6.2 Presentation of the results
    6.2.1 Result totals for questions 1 to 5
    6.2.2 Result totals for questions 1 to 5 by different math courses
    6.2.3 Result totals for questions 1 to 5 by different high school years
    6.2.4 Results for Turing-type question 6

7 Discussion
  7.1 Discussion of results
  7.2 Solution aesthetics and fairness
  7.3 Discussion of the questionnaire answers
  7.4 Further work
  7.5 Conclusion

Bibliography

A Questionnaire

B Questionnaire Raw Results

C Search Algorithm Pseudocode

D Rules
  D.1 Rules
    D.1.1 negAddToSub
    D.1.2 negNeg
    D.1.3 negDivNumer
    D.1.4 negDivDenom
    D.1.5 negTerms
    D.1.6 negPlusminusRewrite
    D.1.7 negPlusminus
    D.1.8 negMul
    D.1.9 negNegMul
    D.1.10 termsToCommonDenom
    D.1.11 termsOnSameDenom
    D.1.12 mulRemove0
    D.1.13 mulRemove1
    D.1.14 divRemove0
    D.1.15 divRemove1
    D.1.16 rewriteMulOnDiv
    D.1.17 divMulDiv
    D.1.18 divDivDiv
    D.1.19 divDivMul
    D.1.20 mulDivDiv
    D.1.21 shortenFactorsSamediv
    D.1.22 mulInto
    D.1.23 mulTermsFull
    D.1.24 distrQuadPosNeg
    D.1.25 distrQuadPosPos
    D.1.26 distrQuadNegNeg
    D.1.27 powExpand
    D.1.28 breakOut
    D.1.29 factorizePoly
    D.1.30 factQuadPosNeg
    D.1.31 factQuadPosPos
    D.1.32 factQuadNegNeg
    D.1.33 simplifyPow
    D.1.34 simplifyPowFrac
    D.1.35 simplifyPowFracNeg
    D.1.36 powFracRewrite
    D.1.37 powPow
    D.1.38 powRoot
    D.1.39 powRemove1
    D.1.40 divPowToPowneg
    D.1.41 sqrt0
    D.1.42 simplifyRoot
    D.1.43 rootNegImg
    D.1.44 simplifyRootFrac
    D.1.45 rootFracTopNeg
    D.1.46 rootFracBotNeg
    D.1.47 rewriteRootFrac
    D.1.48 rootPow
    D.1.49 plusminusSplitRename
    D.1.50 pqFormula

Chapter 1

Introduction

1.1 Problem statement

Math education in schools (primary school and high school) teaches a new concept by first providing a theoretical explanation, then showing a few examples, and finally letting the students practice by solving related problems. For some students that may not be enough, as they may get stuck on the problem-solving part. The textbook only has the answer, and the student may wonder how that answer was reached. The answer might give a clue to how to solve the problem, but it is not very pedagogical. Furthermore, a student who did solve the problem can see that the answer is correct, but gets no feedback on the solution unless the teacher checks it manually.

The root of the problem is that manually producing pedagogical solutions to every math problem in a textbook is time-consuming, and would most likely require the solutions to be printed in a separate book, making the whole package significantly more expensive.

One way of solving this problem is automation: a program that could solve mathematical problems and return the solutions. This would allow the creators of math textbooks, or independent companies, to create solutions to many problems quickly. Such a program could also be useful in the hands of students, as it would allow on-demand solution generation and eliminate the need to publish the solutions. However, a balance has to be struck between study help and protection against cheating, as some students would inevitably use it simply to get their homework done.

The goal of this thesis is to design and construct a program that can solve this problem, i.e. automatically generate pedagogical solutions to math problems, and to test whether these can be of value in a pedagogical setting. A more precise problem definition is given in the next section.

The principal of this thesis project is Mathleaks, a company whose service provides students with solutions to problems from math textbooks. Their solutions are currently created by hand, and they want to automate part of that process.


1.2 Problem definition, scope and evaluation

Programs can already solve many types of problems both numerically and symbolically. For example, finding the roots of an equation can be done numerically with the famous Newton-Raphson method, or symbolically by manipulating the equation through standard algebraic laws to acquire an exact solution, much like a human would do. The field of symbolic computation is known as computer algebra, and a program that can manipulate mathematical formulas symbolically is called a computer algebra system (CAS). CASs are typically used to solve hard problems where an exact rather than an approximate answer is needed. Naturally, the solver program in this thesis should be a CAS, as numerical methods are not appropriate.

Since there are many different problem types and many different levels, from the simplest in primary school to university-level theory, the scope of this thesis has to be narrowed. This thesis will focus on high school maths, similarly to Mathleaks. Specifically, it will focus on algebra in the first two years of the Gymnasium (the Swedish equivalent to high school). Algebra is well suited for a pedagogical CAS because it is rather straightforward and there are many problems that do not need any figures or graphs.

We define a problem type as a particular goal. The three problem types looked at in this thesis are Simplify, ShowEquality and Solve. Simplify takes a mathematical expression as input and simplifies it as much as possible. ShowEquality takes two mathematical expressions as input and shows how the first can be rewritten to the second. Solve takes a mathematical equation and a variable, and solves the equation with regard to that variable. A problem instance is a problem type with input values specified.

The characteristics of a solution have to be considered. A pedagogical solution to a problem can have many different aesthetics and many different features, such as explanation text, figures, graphs and math calculations. A typical feature is the so-called step-by-step calculation: a list of mathematical expressions with explanations in between that show how to solve a problem step by step. See figure 1.1 for an example. The final expression in a step-by-step solution is the answer. The explanations can also be called justifications.

Because they were not designed to work as pedagogical tools, the established algorithms in CASs provide only final answers and are not well suited for producing solutions, so new ones need to be invented. The CAS that will be developed as part of this thesis will be limited to producing just step-by-step solutions, and will not include figures, graphs or text outside of the justifications.

Since the application of the CAS is educational, the generated solutions have to be pedagogical. This should be the most important property of the CAS. No matter the speed or scope of the CAS, if the solutions are disliked by students the CAS is ineffectual. It will therefore be evaluated only on the pedagogy of its solutions. Pedagogy is a subjective matter to some extent, so the evaluation will be done through a study on the target demographic of the CAS.


Figure 1.1. An example step-by-step solution.
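The structure of a step-by-step solution described in section 1.2 (a list of expressions with justifications in between, the last expression being the answer) can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Step` and `final_answer` are hypothetical, and the problem instance is made up for this sketch; it is not the content of Figure 1.1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One line of a step-by-step solution (hypothetical structure)."""
    expression: str      # the expression or equation after this step
    justification: str   # how it was obtained; empty for the starting line


def final_answer(steps: List[Step]) -> str:
    """The final expression in a step-by-step solution is the answer."""
    return steps[-1].expression


# Made-up instance of the Solve problem type: solve 2x + 2 = 8 for x.
solution = [
    Step("2x + 2 = 8", ""),
    Step("2x = 6", "Subtract 2 from both sides"),
    Step("x = 3", "Divide both sides by 2"),
]

for step in solution:
    if step.justification:
        print(f"    [{step.justification}]")
    print(step.expression)
```

Keeping the justification next to the expression it produces makes it straightforward to print the two interleaved, which is the layout the text describes.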

1.3 Layout of this document

Chapter 2 provides a background to the topics of computer algebra systems and pedagogy while summarizing previous relevant work. Chapter 3 follows with a more in-depth explanation of how computer algebra systems are constructed and the main problems associated with them. Chapter 4 takes the knowledge from the previous chapter to explain how those concepts are used within a specific CAS, namely SymPy. The CAS with step-by-step capabilities described in this thesis is largely built upon SymPy, and chapter 5 explains the work in detail. Chapter 6 describes the evaluation study and presents the results. Chapter 7 discusses the results and closes with a summary and conclusion.


Chapter 2

Related Work

The scientific field of the work in this thesis is a crossover between pedagogy and computer algebra. This chapter gives an overview of related work, starting broadly in section 2.1 with pedagogy and CASs, and then narrowing the focus in section 2.2 to research and work on CASs with at least some step-by-step capabilities.

2.1 Applications of computer algebra

CASs are used for different purposes. There are the more specialized CASs, for example Macaulay [1], which specializes in algebraic geometry and commutative algebra, and the general-purpose CASs, like SymPy [2]¹. These systems do not, however, provide step-by-step solutions for students. Since this thesis is about pedagogical CASs, the focus of this section is on uses of CASs in education and pedagogy.

Anthony and Walshaw's work [3] gives a good general view of what mathematics education should be like. Among other things it mentions that an effective teacher should use tools and technology to promote students' thinking. Similarly, Kutzler [4] writes about the use of CASs (specifically, graphing calculators) in education. The article states that technology, if used properly, leads to significant benefits for both the students and the teacher. A good way of utilizing a CAS in education is the scaffolding method [5], in which students use a CAS for already known concepts and focus their thinking on new concepts, in order to learn more efficiently. With technology constantly improving, it is clear that it should be utilized in mathematics education.

Learning with the assistance of technology is known as computer-assisted instruction (CAI). A CAI tool can be simple or complex. A patent [6] describes an interactive step-by-step method for graphical calculators to help students solve problems. In each step of the solving process (the steps are predefined), the student is given a choice of which operation to perform next. This is one example of the application of the scaffolding method, where the concept to learn is equation solving. For example, the student would choose an operation such as "subtract 2 from both sides". The system automatically performs elementary simplifications such as subtracting two numbers, because this skill should already have been mastered by the student. This way the student only focuses on learning the new skill.

This could be generalized to an Intelligent CAI (ICAI) tool by removing the predefined steps and instead letting an intelligent system generate the choices. The simplification would then be done by a CAS. An example of an ICAI tool is Aplusix [7], an interactive tool for mathematical education that has been used successfully in the classroom [8]. It does not provide choices, but instead lets the student type his or her own formulas. Aplusix uses built-in rules to check whether the student's steps are correct.

While CAI tools have proved to be helpful in education [9], their weakness is that they can only provide feedback and check the student's solution for correctness, but cannot solve the problem. The system cannot provide hints and solutions in the general case unless it is actually able to solve arbitrary problems. For that, the system would have to be a full CAS, capable of generating step-by-step solutions.

¹ There are many more examples of CASs in both categories, too many to list here. Substantial lists can be found on the web page of SIGSAM at http://www.sigsam.org/Resources/Software.html (last retrieved 2015-05-31) and on Wikipedia at http://en.wikipedia.org/wiki/List_of_computer_algebra_systems (last retrieved 2015-05-31).

2.2 Computer algebra systems with step-by-step capabilities

In [10], Tõnisson gives a brief overview of aspects of step-by-step solution generation in CASs and compares a few CASs and their step-by-step solution capabilities. MATHPERT [11] is an old system that can produce step-by-step solutions with a focus on algebra, trigonometry and first-semester calculus. Another system is Edusym [12, 13], which is capable of solving high school math problems step by step. It has since been developed further and is now known under a different name [14]. Mathematical Assistant on Web [15] is capable of solving several types of mathematical problems with explanations and step-by-step solutions, but is limited in the problem types it supports. Microsoft has released the CAS Microsoft Mathematics [16], which has some step-by-step solving capabilities but is otherwise rather limited in step-by-step calculations. Another web resource [17] implements the derivative rules in JavaScript to produce the intermediate steps, and then uses Maxima² for the differentiation itself. Wolfram Alpha [18] is a general-purpose knowledge engine with a feature for solving math problems with step-by-step solutions.

² Maxima is a free CAS that was originally released in 1982. It does not have step-by-step capabilities.


Most of the previously mentioned systems are proprietary, without open source code. There are also few research papers on the subject. This means that a person cannot easily study previous work in order to construct his or her own CAS with step-by-step capabilities. This becomes an issue when the existing systems do not cover all use cases. An example use case, which is also applicable to this thesis, is education in countries where English is not the primary language; all the previously mentioned systems are in English. Other issues that could arise are solutions on the wrong level (for example primary school versus university level mathematics), the wrong aesthetics, or solution logic that is too different from how a human would do it. It is easy to find imperfections for a particular use case in existing systems. The best way around these issues is typically to construct your own system. This thesis is an attempt at adding research on methods of constructing a CAS with step-by-step capabilities, while also constructing such a CAS to cover the use case of Mathleaks.

There has also been no real evaluation of the existing systems. Considering that the systems are meant to be used in education, the pedagogical aspect of the solutions is very important. An important part of this thesis is a pedagogical evaluation of the developed CAS. While the study is too small to compare all existing systems, it gives an indication of how computer-generated solutions compare with human-made ones.


Chapter 3

Computer Algebra Systems

This chapter gives a general overview of computer algebra systems, focusing on the problems associated with how they are constructed. It is largely built on material from classical books [19][20], the project [21] and the CAS SymPy [2]. The information and format of this chapter serve as context and as a basis for the next chapter, which uses the same section layout to describe the implementation methods of a particular CAS and how it solves the problems discussed in this chapter.

3.1 A brief overview

The field of computer algebra matured around 1961-1971, when some of the earliest CASs were created [19, p. 4]. The field can be divided into three sub-domains. The lowest level is almost pure computer science: the study and development of algorithms. The next level is systems, which encompasses the development of computer algebra systems. A CAS uses the algorithms to perform symbolic operations on mathematical formulas and objects. The highest level is the application of CASs, which is what drives the development of algorithms and systems. The work of this thesis encompasses all three domains. Since algorithms are an integral part of CASs, they are described within the section on CAS construction. A background on applications of CASs was given in section 2.1.

3.2 Construction of computer algebra systems

A CAS consists of three main parts: a base structure to represent mathematical formulas and objects, the algorithms that operate on those formulas, and an interface.

3.2.1 Representation of primitive types

No matter how mathematical formulas are represented, a way of storing the primitive types is needed. The primitive types are the integers, rationals, reals and complex numbers, corresponding to the algebraic domains Z, Q, R and C.


Integer types in computers are typically single-precision, which means that their range is fixed by the internal representation in the computer, i.e. 16-bit, 32-bit, 64-bit etc. The narrow range is a limitation for a CAS. It is more desirable to use multi-precision integers, which in theory can have any value (the practical limitation is computer memory, but an integer would have to be absurdly large for that to actually be an issue). [19, p. 93] defines a multi-precision integer as a linear list $(d_0, d_1, \ldots, d_{l-1})$ of single-precision integers together with a sign $s$ that is either $-1$ or $1$. This gives the value

$$s \sum_{i=0}^{l-1} d_i \beta^i$$

where $\beta$ is a pre-specified base. This is comparable to the standard decimal system with base 10: each of the digits 0-9 is a single-precision integer, and any other number is a multi-precision integer, as it consists of a number of digits and an (optional) sign.

Multi-precision integers can be stored, for example, in a list or an array. Alternatively, many modern programming languages have multi-precision integers in their standard library that could be used.

Real numbers, or floating-point numbers, are more complicated, but in most programming languages the IEEE 754 standard for floating-point arithmetic [22] with double precision is available and could be used. It should be enough in most situations, but for more exact calculations quadruple precision could be used.

A rational number can simply be stored as a pair of multi-precision integers, where one is the numerator and the other is the denominator. Similarly, a complex number can be stored as a pair of multi-precision integers (or rational numbers or floating-point numbers), where one represents the real part and the other the imaginary part.
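The value formula for multi-precision integers, and the pair-based storage of rational and complex numbers, can be illustrated with a short Python sketch. The function name `multiprecision_value` is an assumption made for this illustration; `fractions.Fraction` is Python's standard-library rational type, used here only as one possible realization of the numerator/denominator pair.

```python
from fractions import Fraction


def multiprecision_value(sign, digits, base):
    """Return sign * sum(d_i * base**i), with digits given as d_0 first."""
    if sign not in (-1, 1):
        raise ValueError("sign must be -1 or 1")
    if any(not 0 <= d < base for d in digits):
        raise ValueError("every digit must satisfy 0 <= d < base")
    return sign * sum(d * base**i for i, d in enumerate(digits))


# Base 10: the digit list (5, 4, 3) with sign -1 is the number -345.
print(multiprecision_value(-1, [5, 4, 3], 10))

# Base 2**32: two 32-bit "digits" give a value outside the 32-bit range.
print(multiprecision_value(1, [0, 1], 2**32))

# Python's built-in int is itself multi-precision.
print(2**100)

# A rational number as a numerator/denominator pair of integers.
print(Fraction(1, 3) + Fraction(1, 6))

# A complex number as a (real part, imaginary part) pair, here with rational parts.
z = (Fraction(1, 2), Fraction(3, 4))
print(z)
```

The digits are stored with the least significant digit first, matching the index order $d_0, d_1, \ldots, d_{l-1}$ in the definition.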

3.2.2 Representation of mathematical formulas

A polynomial $a(x)$ of degree $n$ is written as

$$a(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n = \sum_{i=0}^{n} a_i x^i$$

To represent a polynomial over one variable, only the coefficients and the variable itself have to be stored. For the coefficients, there is the dense representation, which stores all coefficients including the ones that are zero. For a dense representation only a list is needed, and the degree of the polynomial is given by the length of the list minus one. The sparse representation instead stores a pair (degree, coefficient) for each non-zero coefficient. For example, the polynomial

$$a(x) = 2 + x + 4x^3$$

is represented as (x,[2,1,0,4]) in the dense representation and as (x,[(0,2),(1,1),(3,4)]) in the sparse representation¹. Depending on the number of non-zero coefficients, one may be more space-efficient than the other, and depending on which algorithm is used, one may be more time-efficient than the other.

This seems like a very limited data structure. However, it can easily be extended to multivariate polynomials if we allow the coefficients to be other polynomials. For example, the bivariate polynomial

$$a(x, y) = 5x^3y^2 - x^2y^4 - 3x^2y^2 + 7xy^2 + 2xy - 2x + 4y^4 + 5$$

could easily be rewritten as

$$a(x, y) = (5y^2)x^3 + (-y^4 - 3y^2)x^2 + (7y^2 + 2y - 2)x + (4y^4 + 5)$$

and be stored as a polynomial with polynomial coefficients. With this method arbitrary multivariate polynomials can be represented.

Representing a math formula as a polynomial is convenient because it opens up for the application of ring theory. If R is a ring, then R[x] is the polynomial ring corresponding to the polynomial a(x), with the coefficients $a_0, \ldots, a_n$ being elements of R. For many applications and algorithms in algebra, representing math formulas as polynomials is beneficial. However, a general-purpose CAS should be able to store more types of formulas than just multivariate polynomials. The representation can be extended to rational functions [19, p. 60-63] and power series [19, p. 63-70], but there are still more constructs: roots, sums, products and matrices, just to name a few.

A more flexible way to represent a mathematical formula is the tree representation. For example, it is used in the CAS built in [21] and in SymPy. This representation converts the math formula to a tree structure, similar to a syntax tree. The tree representation of the polynomial

$$a(x) = 2 + x + 4x^3$$

is shown in Figure 3.1. The leaf nodes are atoms. An atom can be a primitive type, a variable, the imaginary unit i, or any other entity that does not have any further operands. The other nodes can be operators or specific functions. The main advantage of the tree representation is that it is conceptually simple and extremely flexible in the types of formulas it can represent. Its main disadvantage is that it is slower with regard to certain algorithms.
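The representations discussed in this section can be compared side by side in a small Python sketch. It stores the same example polynomial as a list of all coefficients, as (degree, coefficient) pairs, and as a tree of nested tuples, and evaluates each form at a point. The function names and the tuple encoding of the tree are assumptions made for this illustration; the tree shown is one possible tree for the formula and is not claimed to match Figure 3.1 node for node.

```python
def eval_all_coeffs(coeffs, x):
    """Evaluate a polynomial given as a list of all coefficients a_0..a_n."""
    result = 0
    for a in reversed(coeffs):          # Horner's rule, highest degree first
        result = result * x + a
    return result


def eval_pairs(pairs, x):
    """Evaluate a polynomial given as (degree, coefficient) pairs."""
    return sum(c * x**d for d, c in pairs)


def eval_tree(node, env):
    """Evaluate a tree whose inner nodes are ('add' | 'mul' | 'pow', child, ...)."""
    if isinstance(node, str):           # leaf: a variable
        return env[node]
    if isinstance(node, (int, float)):  # leaf: a number
        return node
    op, *children = node
    values = [eval_tree(child, env) for child in children]
    if op == "add":
        return sum(values)
    if op == "mul":
        product = 1
        for v in values:
            product *= v
        return product
    if op == "pow":
        base, exponent = values
        return base ** exponent
    raise ValueError(f"unknown operator: {op}")


# a(x) = 2 + x + 4x^3 in the three forms
all_coeffs = [2, 1, 0, 4]
nonzero_pairs = [(0, 2), (1, 1), (3, 4)]
tree = ("add", 2, "x", ("mul", 4, ("pow", "x", 3)))

print(eval_all_coeffs(all_coeffs, 2))
print(eval_pairs(nonzero_pairs, 2))
print(eval_tree(tree, {"x": 2}))
print("degree:", len(all_coeffs) - 1)
```

The two list-based forms only work for polynomials, whereas the tree evaluator would accept a new operator or function simply by adding one more branch, which is the flexibility the text attributes to the tree representation.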

¹ The (a, b) notation means a pair of a and b. The $[a_0, a_1, \ldots, a_{n-1}]$ notation means a list with elements $a_0$ to $a_{n-1}$. The length of the list is then n.


Figure 3.1. Tree representation of the polynomial $a(x) = 2 + x + 4x^3$.

3.2.3 Algorithms

The algorithms part of a CAS is arguably the most important part, as it decides the final capabilities of the CAS. This part is very broad and there are too many algorithms to describe here. This section lists and briefly describes some important ones. It does not aim for completeness or in-depth descriptions, but rather gives an overview of some of the problems that CASs usually have to solve.

A basic problem is checking for equivalence. How does the CAS know when two formulas a and b are equal? A trivial way of solving this problem would be to check $a \equiv b$, i.e. to check whether the formulas are identical, which could be defined as a and b being completely equal in memory. However, consider the situation where $a = x + 3$ and $b = 3 + x$. Here a and b are clearly equal², but not identical. Then there are harder cases, for example $\sin^2 x + \cos^2 x - 1$. This formula is equal to 0 after use of the trigonometric identity $\sin^2 x + \cos^2 x = 1$ followed by a subtraction. How far should the equivalence function check for equivalence? If it only checks that the formulas are identical, a user may get annoyed that x + 3 is not equal to 3 + x, but if it checks too far, it may lead to unexpected performance issues in complicated calculations if the user is unaware of what the equivalence function does.

A compromise is the idea of a canonical form. A definition of canonical form is given in [19, p. 82]. Briefly, it is a type of standard form for a formula. All formulas have one specific and determinable canonical form. A practical example is the polynomial $a(x) = x + x^5 - 2x^3 + 2$, which could be rewritten to $a(x) = x^5 - 2x^3 + x + 2$ before being operated on. A canonical function is the function that rewrites a formula to its canonical form. Different CASs use different canonical functions. A CAS that uses the polynomial representation might want to expand even huge powers like $(x + 2y)^{100}$ in its canonical function, while one that uses the tree representation would want to keep it as a power.
All formulas should at all times be converted to their canonical form, so the canonical function must be efficient and cannot be too complicated. Doing that allows the equivalence function to be kept simple and fast, as it is sufficient to check only whether two formulas are identical.

² They are identical after application of the commutative law, i.e. a + b = b + a.

Typical CAS applications, such as solving equation systems, use at their core polynomial operations that rely on the greatest common divisor (gcd) function. The gcd function typically uses the Euclidean algorithm [19, p. 33-34][20, p. 45-49], or a variation of it.

For more advanced operations, it is often desirable to find the Gröbner bases [23] of polynomials. The Gröbner bases hold important properties of the polynomials and are used, among other things, for solving systems of polynomial equations. For example, SymPy uses Gröbner bases to solve equation systems. One algorithm for finding Gröbner bases is Buchberger's algorithm [24].

When it comes to factorization of polynomials, the choice of algorithm depends on the domain of the coefficients. A problem relevant to this thesis is factoring polynomials over the integers. This can be approached with, for example, Berlekamp's algorithm [25] or the Cantor-Zassenhaus algorithm [26], which factor polynomials over finite fields and are used as building blocks for factoring over the integers.

Another useful method is Gaussian elimination. [19] provides a few algorithms for performing Gaussian elimination. Turing [27] showed in 1948 that Gaussian elimination is equivalent to LU decomposition of matrices, which is a way to factorize a matrix into a lower triangular matrix and an upper triangular matrix³. Aside from Gaussian elimination, LU decomposition is also used for finding a matrix's inverse and its determinant. A compilation of algorithms for Gaussian elimination and LU decomposition can be found in [28].

Differentiation of mathematical formulas is easy, as there are only a few rules that have to be implemented and then applied recursively [21]. Integration is significantly more complex. The main algorithm for integration is the Risch algorithm [29, 30], which can integrate most functions. The algorithm is also described in [19, p. 511-569].
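The canonical-form compromise described earlier in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the canonical function of any particular CAS: formulas are encoded as nested tuples, and the hypothetical function `canonical` applies only two cheap normalisations (flattening nested sums and products, and putting the operands of commutative operators in a fixed order).

```python
def canonical(node):
    """Rewrite a tuple-based formula tree into a simple canonical form."""
    if not isinstance(node, tuple):          # atom: a number or a variable name
        return node
    op, *args = node
    args = [canonical(a) for a in args]
    if op in ("+", "*"):
        flat = []
        for a in args:
            if isinstance(a, tuple) and a[0] == op:
                flat.extend(a[1:])           # (a + b) + c  ->  a + b + c
            else:
                flat.append(a)
        args = sorted(flat, key=repr)        # fixed order for commutative operands
    return (op, *args)


def equal(a, b):
    """Equivalence check that only compares canonical forms."""
    return canonical(a) == canonical(b)


# x + 3 and 3 + x become identical once both are in canonical form.
print(equal(("+", "x", 3), ("+", 3, "x")))

# Different groupings of the same sum are also recognised.
print(equal(("+", ("+", "x", "y"), 3), ("+", "y", ("+", 3, "x"))))

# Deeper identities are deliberately not detected by this cheap check.
pythagorean = ("+", ("^", ("sin", "x"), 2), ("^", ("cos", "x"), 2), -1)
print(equal(pythagorean, 0))
```

The last call returns False even though the formula equals 0 mathematically, which shows the trade-off in the text: the check stays fast and predictable because it does not attempt trigonometric simplification.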

3.2.4 Interface

An interface is the front-end part of a CAS, the part that a user interacts with. Even if a CAS implements a great number of algorithms, a terrible interface may scare users away. Different CASs have different interfaces. Some have a graphical user interface (GUI), some just have a command-line interface, and others are just libraries. For example, SymPy is a Python package, and thus its interface can be the Python interpreter.

For pedagogical applications, the interface is very important. However, the solver proposed in this thesis is intended to be used internally at Mathleaks, so the interface is of little importance in the initial version and can be limited to a command-line application.

³ A lower triangular matrix has all zeros above the main diagonal. An upper triangular matrix has all zeros below the main diagonal.


Chapter 4

SymPy

The CAS with step-by-step capabilities described in this thesis is largely built upon the CAS SymPy¹. This chapter describes SymPy and how it is constructed with regard to the problems brought up in chapter 3. The information is based on studies of SymPy's source code, which is freely available since SymPy is open source. SymPy is a Python package that aims to be a full general-purpose CAS. This is not a complete explanation of all of SymPy's features, but only of the features and details relevant to this thesis.

SymPy was chosen because it is easy and convenient to modify. It is open source with a BSD license, and it is a Python library that uses only Python constructs and needs no external languages or packages. On top of that, it uses the tree representation for formulas.

4.1 Representation of mathematical formulas

SymPy uses the tree representation, with each node represented by an object of some class. Every class in SymPy inherits directly or indirectly from the class Basic, which contains properties and functions that other classes can override. Figure 4.1, which shows the class hierarchy, is incomplete but covers the parts that are relevant to this thesis.

Figure 4.1. A diagram over the classes in SymPy that are relevant for this thesis.

Basic has the property args. It is a tuple that contains the arguments of an object. For example, the class Add is used to represent the addition operation. Its args property thus holds a tuple of its operands. The formula a + b is represented in SymPy as an instance of Add with the args property set to (a,b). These arguments can be other SymPy objects, and that is how the formula tree is built.

All classes in SymPy are immutable². This means that created objects that represent math formulas cannot be altered, which creates safety in the sense that SymPy functions and algorithms will not modify the formulas given to them as input. Instead they return new objects or copies. SymPy also contains a cache, which allows already evaluated objects not to be re-evaluated. The cache's functionality depends on the immutability of SymPy objects.

¹ The version used and described in this thesis is SymPy version 0.7.5.
² Assuming that private functions and properties are not accessed. In Python the convention is that properties whose name begins with an underscore are private. There are ways to get around the immutability, but as long as a user sticks to the "intended" functionality, objects are in practice immutable.

The next important class is Expr. It contains functions for the arithmetic operations³. Take addition for example. Any instance of a class that inherits from Expr can be added to any other such instance. The default behavior is to simply return an instance of the class Add with the two operands as arguments, but it can be overridden by inheriting classes. This allows convenient notation such as a + b + c if a, b and c are instances of classes that inherit from Expr.

AssocOp stands for associative operations, and is the base class (together with Expr) for Add (the class for addition) and Mul (the class for multiplication). It contains a number of functions and properties that are similar in both Add and Mul. The main property is associativity. From the documentation of AssocOp:

(a op b) op c == a op (b op c) == a op b op c

³ Specifically, in Python it overrides functions like __add__ and __mul__.


There are no separate classes for subtraction, division or negation. Instead, a-b is transformed into a+(-b), -b is transformed into -1*b and a/b is transformed into a*b^-1.

Powers are represented by the class Pow. It directly inherits from Expr and must have exactly two arguments. Roots are transformed into powers, as $\sqrt[b]{a} = a^{1/b}$.

SymPy’s answer to the polynomial representation is the Poly class. Many algorithms in SymPy, like factorization and equation solving, will convert the expression to a Poly first.

Relational is the base class for the comparison operators, and inherits directly from Expr. The relevant subclass is Equality. If f is an Equality object, then lhs(f) is the left hand side of f and rhs(f) is the right hand side of f.

All leaf nodes in the tree representation are in SymPy represented by classes that inherit from AtomicExpr. They have no elements in args. Instances of these classes are called atoms. Three main atom types are relevant: Symbol, ImaginaryUnit and Number. Symbol is the class used for variables, Number is the base class for numeric types (or numerics for short) and ImaginaryUnit is the class that represents i. The Float class is used for floating point numbers and uses the Python type float to store the float value itself. The Rational class contains the properties p and q, and represents the rational number p/q. Both p and q are of the Python built-in type int, which is multi-precision. The Integer class inherits from Rational, and has q set to 1. There are also a few numeric constants used to speed up operations.

Lastly, an example: the mathematical formula $y = 4x^2 - \frac{2}{3x} + i$ becomes

Equality(y,Add(Mul(-2/3,Pow(x,-1)),Mul(4,Pow(x,2)),I))

Also see Figure 4.2 for a graphical representation of the tree. Note that there are changes from the original equation. This has to do with canonical form. More in section 4.2.
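The representation described in this section can be illustrated with a small stand-alone sketch. It is a toy model written for this explanation, not SymPy code: the class names only mirror the roles of the SymPy classes with the same names, the Atom helper class and its value attribute are assumptions, and no canonical ordering or simplification is performed.

```python
# Toy model of the tree representation; these classes only mimic the roles of
# the SymPy classes with the same names and are not SymPy code.

class Basic:
    """Base node: children live in an args tuple that is never modified."""

    def __init__(self, *args):
        self._args = tuple(args)  # underscore: private by convention

    @property
    def args(self):
        return self._args

    def __eq__(self, other):
        return type(self) is type(other) and self.args == other.args

    def __hash__(self):
        return hash((type(self).__name__, self.args))

    def __repr__(self):
        inner = ",".join(repr(a) for a in self.args)
        return "%s(%s)" % (type(self).__name__, inner)


class Expr(Basic):
    """Arithmetic operators build new nodes instead of changing operands."""

    def __add__(self, other):
        return Add(self, other)

    def __neg__(self):
        return Mul(Integer(-1), self)      # -b is stored as -1*b

    def __sub__(self, other):
        return Add(self, -other)           # a-b is stored as a+(-b)


class Add(Expr):
    pass


class Mul(Expr):
    pass


class Atom(Expr):
    """Leaf node: args is empty, the payload is kept in `value`."""

    def __init__(self, value):
        super().__init__()
        self.value = value

    def __eq__(self, other):
        return type(self) is type(other) and self.value == other.value

    def __hash__(self):
        return hash((type(self).__name__, self.value))

    def __repr__(self):
        return str(self.value)


class Symbol(Atom):
    pass


class Integer(Atom):
    pass


a, b = Symbol("a"), Symbol("b")
total = a + b          # Add with args (a, b)
difference = a - b     # Add(a, Mul(-1, b)); there is no Sub class
```

Because every operator returns a new node, the operands a and b are left untouched, which is the property the cache relies on.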

4.2 Creation of SymPy objects

SymPy classes override the __new__ and __init__ functions to do preprocessing⁴. This can be considered the canonical function. Add and Mul remove their respective identity elements⁵ from the argument list. Then the arguments are simplified by their respective _flatten functions, which simplify numerics and group similar arguments, i.e. 2x + x becomes 3x. If there is only one argument left after this process, that argument itself is returned. Otherwise, the arguments are sorted according to their hash values before the object is returned. Another thing _flatten does is to flatten the formula, i.e. if an argument to Add is another Add, the arguments are merged and put into one single Add. The creation of a Pow object also does some preprocessing, like simplifying numerics and other possible simplifications like a^1 = a and a^0 = 1.

Creation of SymPy objects also typically calls the function sympify on the arguments. sympify converts non-SymPy objects into SymPy objects when appropriate. It also features a full parser for strings. If sympify is called on a SymPy object, the object is simply returned immediately.

Overriding __new__ rather than using singleton factory classes has both advantages and disadvantages, but additional discussion of that design choice is not relevant here.

Figure 4.2. The formula $y = 4x^2 - \frac{2}{3x} + i$ in SymPy tree representation.

⁴ In Python, similar to other programming languages that feature classes, a class initialization function is often used (also known as the constructor function), which in Python is called __init__. However, before the object instance can be initialized, it must be created. In Python, the instance creation process can be overridden by overriding the __new__ function, which takes the class type as a parameter. This function must return an instance of some class.
⁵ The identity element is 0 for addition and 1 for multiplication.

4.3 Algorithms

The canonical function in SymPy was explained in section 4.2. The function that checks equality is the __eq__ function, which is invoked by the Python operator ==. It checks if the two SymPy objects are identical in the structure of the tree as well as in the value of each node, with a few exceptions. When two instances of classes derived from Number are compared, their numerical values are compared without regard to the class type. This means that a Float instance may be equal to an Integer instance or a Rational instance. For example (the Rational(1,2) notation means that it creates a Rational object with p set to 1 and q set to 2):

    >>> Rational(1,2) == Float(0.5)
    True
    >>> Integer(3) == Float(3.0)
    True
    >>> Integer(3) == Rational(18,6)
    True

Another feature of the equality function is that the numeric classes can be compared to Python’s own numerical types int and float, returning True if their values are equal.

SymPy implements the gcd function, which uses different algorithms depending on the input. For example, for integers it uses the Euclidean algorithm. For two polynomials in Z[x], it uses a heuristic algorithm. Similar approaches are used for other problems as well. Input is thoroughly analyzed and then an appropriate algorithm is chosen. In general, these algorithms are not used in the work of this thesis, because they perform large operations in just one step. More about this will be discussed in chapter 5. An exception is the function factor, which is the main wrapper function for factoring expressions. Depending on input it uses different algorithms, but for factoring univariate polynomials over the integers, it uses the Zassenhaus algorithm.
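To make the phrase "large operations in just one step" concrete, the following plain-Python sketch runs the Euclidean algorithm on two integers and keeps every intermediate pair. It is only an illustration: gcd_with_steps is a hypothetical name, and the code does not reproduce SymPy’s own gcd implementation.

```python
def gcd_with_steps(a, b):
    """Euclidean algorithm on non-negative integers.

    Returns the gcd together with every intermediate pair (a, b), i.e. the
    work that a single call to a library gcd function hides from the reader.
    """
    steps = [(a, b)]
    while b != 0:
        a, b = b, a % b      # replace (a, b) by (b, a mod b)
        steps.append((a, b))
    return a, steps


result, steps = gcd_with_steps(48, 18)
# steps is [(48, 18), (18, 12), (12, 6), (6, 0)] and result is 6
```

A single library call returns only the final value, whereas a step-by-step solution has to expose and justify the intermediate pairs.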

4.4 Output

While SymPy is a library, it also features an extensive printing module. The three relevant printers are the plain printer, the string printer and the LaTeX printer. The plain printer is useful for debugging. It gives the output in the form Class(arg1,arg2,...), and shows the exact tree representation of the printed object. The string printer gives nice output in "standard" form, and is the default printer used by the __str__ and __repr__ functions⁶. The LaTeX printer converts the SymPy object into a LaTeX math formula. Example outputs⁷:

    >>> from sympy import *
    >>> from sympy.printing.latex import LatexPrinter
    >>> lp = LatexPrinter()
    >>> a = sympify("2*x**2 - 4*x + 1")
    >>> a.precise_repr()
    'Add(1,Mul(2,Pow(x,2)),Mul(-4,x))'
    >>> str(a)
    '2*x**2 - 4*x + 1'
    >>> lp.doprint(a)
    '2 x^{2} - 4 x + 1'

⁶ In Python, __str__ is the "to string" function of a class. __repr__ works similarly but is used when printing to the console. Typing just a in the Python interpreter invokes the __repr__ function of a, if one exists, and calling str(a) in either the interpreter or anywhere else returns the return value of the __str__ function of a, if such a function exists.
⁷ The precise_repr function is another modification to SymPy to easily access the plain printer. Also, a**b means "a to the power of b" in Python, and SymPy adopted that syntax.


Chapter 5

Method

This chapter describes in detail how the CAS with step-by-step capabilities was constructed. First, section 5.1 describes additional problems associated with the step-by-step capabilities that do not apply to a CAS without them (like SymPy). Section 5.2 explains why this particular method of extending SymPy was chosen and gives an overview of the method, which includes both modifications to the core code of SymPy (described in section 5.3) and the addition of new code that runs on top of SymPy. The new code includes a set of mathematical rules that manipulate mathematical formulas (described in section 5.4), search algorithms that utilize AI methods to apply these rules in a specific order to create the step-by-step solutions (described in section 5.5) and an interface (described in section 5.6).

5.1 Considerations for a step-by-step CAS

A CAS is constructed with a goal in mind. Depending on that goal, different design choices have to be made. A CAS specializing in one particular area of mathematics would be constructed differently from a general-purpose CAS. The same applies to a CAS capable of constructing step-by-step solutions.

5.1.1 Pedagogical considerations

The justifications in a step-by-step solution have to be pedagogical. That means that the steps the algorithm performs in the solving process must not contain material that the students have not yet learned. Most advanced papers about computer algebra algorithms, as well as the books [19] and [20], discuss material well beyond that level. Most algorithms revolve around group theory, which is typically not appropriate material for high school students.

Then there is the question of the granularity of the solution. Even the trivial example 1 + 2 + 3 can be simplified in different numbers of steps. Here are a few examples.

1 step 1 + 2 + 3 =⇒ 6

2 steps 1 + 2 + 3 =⇒ 3 + 3 =⇒ 6

3 steps 1 + 2 + 3 =⇒ (1 + 2) + 3 =⇒ 3 + 3 =⇒ 6

For elementary school students just learning addition, perhaps the two-step or three-step simplification is more appropriate, depending on whether the concept of parentheses has been taught, but for high school students the one-step simplification would be more appropriate.
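The effect of granularity on the same problem can be sketched in a few lines of plain Python. The function sum_steps and its pairwise flag are hypothetical names introduced for this illustration; the sketch covers only the one-step and two-step variants shown above and works on strings rather than formula trees.

```python
def sum_steps(terms, pairwise):
    """Return the formulas a student would see when simplifying a sum.

    pairwise=False gives the condensed solution (everything in one step),
    pairwise=True adds the two leftmost terms in every step.
    """
    shown = [" + ".join(str(t) for t in terms)]
    if not pairwise:
        shown.append(str(sum(terms)))
        return shown
    terms = list(terms)
    while len(terms) > 1:
        terms[:2] = [terms[0] + terms[1]]   # merge the two leftmost terms
        shown.append(" + ".join(str(t) for t in terms))
    return shown


condensed = sum_steps([1, 2, 3], pairwise=False)   # 1 + 2 + 3 => 6
detailed = sum_steps([1, 2, 3], pairwise=True)     # 1 + 2 + 3 => 3 + 3 => 6
```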

5.1.2 Differences to normal CASs

Three main objectives that drive the development of CASs are generality (how many different problems can be solved), speed (how fast the problems can be solved) and the interface (how simple and straightforward it is) [20, p. 18].

Generality is valuable for pedagogical math because then more types of problems can be solved. However, problems in math text books do not typically feature overly complex math formulas. While it is still good to be able to solve arbitrary problems, algorithms can make assumptions about the size of the formula tree. Other observations can also be made: if a formula in the problem does not contain decimal numbers, the answer will most likely not contain any either. Numerical values will also rarely be larger than 1000 (or smaller than -1000), and they are often between -100 and 100 or even between -10 and 10. Certain algorithms can take advantage of this¹. Another possible hint a CAS could use is the context of the problem. Typically, a chapter in a text book introduces a new concept, and then has exercises about that concept. A CAS could take the concept as an extra argument, as a hint of which rules to use.

The speed factor becomes relevant for formulas and polynomials of very high degree, i.e. many thousands if not millions of coefficients, where a low asymptotic time complexity is a highly desired property in an algorithm. High school math (and lower levels) will rarely have polynomials of degree higher than 5. It is more important to minimize algorithm overhead and any high constants than to use algorithms with low asymptotic complexity.

The interface can be more or less important depending on the intended use of the CAS. If a student is supposed to interact with the CAS directly, then the interface is of very high importance. The pedagogical value would be diminished if the interface is inconvenient. If the interface is only used by the author(s) of a math text book or, as in this case, internally at a company, the interface is of less importance.

There is also a special consideration that should be made about canonical form. As discussed in section 3.2.3, CASs try to keep all formulas on canonical form. However, for pedagogical step-by-step solutions this is not a desirable feature. Formulas should not change without a justification explaining why. Any manipulation should be explained.

¹ For example, one possible way to implement a polynomial factorization algorithm would be to guess at typical factors, like (x+1) or (x-2), and then check whether the guessed factor was correct.

5.2 Overview of chosen method

There are two main approaches to constructing a step-by-step CAS. Either you construct a new CAS from scratch, or use an existing one and "enhance" it with step-by-step functionality. Creating one completely from scratch gives more control and if done well it is the ideal. However, it takes time and planning to do it properly, and many implementation parts will have been done before. As discussed, at the core of a CAS is the representation of math formulas, which has to be done regardless of approach. To have more time for the step-by-step capabilities, the base from another CAS was used. As stated earlier, SymPy was chosen because it is easy and convenient to modify. The method chosen is to use SymPy as a base for the representation of formulas, and then write additional functions and algorithms that use the SymPy library and SymPy objects to create the step-by-step capabilities. However, before SymPy can be used for this purpose, some modifications have to be made. These are discussed in section 5.3. The algorithm part of the step-by-step CAS is the most interesting. The biggest question is how to generate the steps. A natural way to proceed is to think of each step as an application of a rule. A rule is a function f : expression → expression. An expression in this case is a SymPy formula. The solving process begins with a formula, and applies various rules until some goal is reached. The trail of applied rules and resulting formulas gives the step-by-step solution. A rule should be deterministic, so that the same resulting formula is returned for the same input. Formally, if a and b are formulas and f is a rule function, then

a = b =⇒ f(a) = f(b), ∀a, b, f

Because of this, it is possible to construct a step-by-step solution from only having the initial formula and a list of rules. The rules would be applied in order on the initial formula to get the intermediate steps. If a rule f cannot be applied on a formula a, it is convenient to define it as

f(a) = a
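The two properties above (determinism, and f(a) = a when a rule is not applicable) are what make it possible to rebuild a solution from the initial formula and a list of rules. A minimal sketch, assuming a toy formula encoding in which the tuple ("+", 1, 2, 3) stands for 1 + 2 + 3; the rule names and the replay helper are hypothetical and are not the rules of the thesis:

```python
def add_first_two(f):
    """Rule: add the two leftmost integer terms of a sum."""
    if (isinstance(f, tuple) and f[0] == "+" and len(f) >= 3
            and isinstance(f[1], int) and isinstance(f[2], int)):
        return ("+", f[1] + f[2]) + f[3:]
    return f  # not applicable: f(a) = a


def unwrap_single(f):
    """Rule: a sum with a single term is just that term."""
    if isinstance(f, tuple) and f[0] == "+" and len(f) == 2:
        return f[1]
    return f  # not applicable: f(a) = a


def replay(formula, rules):
    """Apply the rules in order and collect every intermediate formula."""
    trail = [formula]
    for rule in rules:
        formula = rule(formula)
        trail.append(formula)
    return trail


start = ("+", 1, 2, 3)
rules = [add_first_two, add_first_two, unwrap_single]
trail = replay(start, rules)
# trail: ("+", 1, 2, 3) -> ("+", 3, 3) -> ("+", 6) -> 6
```

Since the rules are deterministic, replaying the same list on the same initial formula always reproduces the same trail.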

As discussed in section 5.1, the content of the justifications and the granularity of the solutions are important. For a CAS that would cover different levels of math, the user should be able to choose the granularity. That is a desired feature even for students of the same class, as some may prefer condensed solutions while others prefer very elaborate ones. However, that is not within the scope of this thesis, so it was not implemented. Instead, the chosen solution is to try to model the solutions created by Mathleaks. The rules used are on a level that is appropriate for high school students, and they automatically give a fairly detailed granularity. More about the chosen rules in section 5.4.

The next question is rule selection. Which rules to apply and in what order? The problem can be stated like this:

Given a formula a and a goal, choose a rule f such that f(a) is closer to the goal than a.

This leads to the definition of closer. It is a concept that is easy to comprehend, but not easy to define precisely. A somewhat simple solution is to define a score function s : formula, goal → R, which gives a score value for how close the formula is to the goal. This is common in various AI algorithms, and is known as a heuristic function. The goal is also an ambiguous term, but it can be defined in a similar way. The goal function g : formula → Boolean returns true if the formula is the goal.

Sometimes there is no clear goal. An example is simplification, and to some degree equation solving. Both involve simplifying expressions, and there is not always an unambiguous "simplest" expression. Take the formula a(b + c) = ab + ac. Either the left hand side or the right hand side could be considered a correct answer to a simplification problem. In the general case, the best option is to select the one with the best score, as given by the heuristic function.

The solving procedure searches as many formulas as possible to find the formula with the best score. Because there are an infinite number of possible mathematical formulas, a time limit is needed. Since only a limited number of formulas can be searched during that time, the heuristic function is used to guide the solving procedure to prioritize searching formulas closer to the goal. More about this in section 5.5.

Figure 5.1 shows an overview of the different modules of the step-by-step CAS. More about the interface in section 5.6.

Finally, the CAS has to be evaluated. The pedagogical value of step-by-step solutions is a subjective matter, and as such the solutions of the CAS should be evaluated subjectively by the target demographic. A questionnaire was distributed to three Swedish high school classes. More about this in section 6.1.
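The interplay between rules, the score function and the goal function described in this section can be sketched as a small greedy best-first search. This is a simplified stand-in and not the A* variations used in the thesis (section 5.5). Several things are assumptions made for the sketch: a lower score means closer to the goal, an expansion budget replaces the time limit so that the result is reproducible, formulas are plain tuples of integers standing for a sum, and all function names are hypothetical.

```python
import heapq
import itertools


def best_first(start, rules, score, is_goal, max_expansions=1000):
    """Greedy best-first search over formulas reachable through the rules.

    Returns (formula, names of applied rules). If the budget runs out before
    a goal is found, the best-scoring formula seen so far is returned.
    """
    counter = itertools.count()          # tie-breaker for equal scores
    frontier = [(score(start), next(counter), start, [])]
    seen = {start}
    best = (score(start), start, [])
    for _step in range(max_expansions):
        if not frontier:
            break
        s, _tie, formula, path = heapq.heappop(frontier)
        if s < best[0]:
            best = (s, formula, path)
        if is_goal(formula):
            return formula, path
        for rule in rules:
            nxt = rule(formula)
            if nxt not in seen:          # also skips f(a) = a results
                seen.add(nxt)
                heapq.heappush(
                    frontier,
                    (score(nxt), next(counter), nxt, path + [rule.__name__]))
    return best[1], best[2]


def merge_first_two(terms):
    """Rule: add the two leftmost terms."""
    if len(terms) < 2:
        return terms
    return (terms[0] + terms[1],) + terms[2:]


def rotate(terms):
    """Rule that makes no progress: move the first term to the end."""
    return terms[1:] + terms[:1]


formula, path = best_first(
    (1, 2, 3),
    rules=[merge_first_two, rotate],
    score=len,                           # fewer terms = closer to the goal
    is_goal=lambda terms: len(terms) == 1,
)
```

The heuristic steers the search away from the unproductive rotate rule, so the goal is reached after only a few expansions.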


Figure 5.1. An overview of the architecture of the step-by-step CAS.

5.3 Modifications to SymPy

5.3.1 Classes

Looking at Figure 4.2, SymPy does not have classes for subtraction or division constructs, so these classes have to be implemented. Fortunately, as stated in section 4.1, all that is required of a class to be compatible is to let it inherit from Expr. The class does not even have to contain anything to have the basics like arithmetic operators work with SymPy. Here is an example of this extensibility, using just the Python interpreter to create a class Demo that is compatible with SymPy:

    >>> from sympy import *
    >>> class Demo(Expr):
    ...     pass
    ...
    >>> x = Symbol('x')
    >>> d = Demo(x,2,3)
    >>> d
    Demo(x, 2, 3)
    >>> x + d
    x + Demo(x, 2, 3)

The following classes were created to complement SymPy:

Neg Unary negation operator.

Sub Binary subtraction operator.


Div Binary division operator.

Plusminus Binary plus-minus operator.

Nroot Binary root function - take the n:th root of something.

Sqrt Unary square root function

MulAnswers When there are multiple answers to a problem, like in trigonometry or second-degree equations.

A class inheriting from Expr becomes a SymPy class, which gives basic compatibility like the arithmetic operators, but not much more. SymPy will not know the context of the class, and it will not work properly with any algorithms in SymPy. Fortunately, these algorithms are not used in the step-by-step CAS², and thus modifying the algorithms within SymPy is not necessary.

5.3.2 Termlists and Factorlists

Due to the introduction of the Sub class, extra care has to be taken to handle terms in math formulas. While in SymPy there was only the Add class to represent terms, now there are two different classes. For example, when multiplying a number into a list of terms (like x(2 + y − z) = 2x + xy − xz), it is convenient from an engineering viewpoint to think of the Add and Sub constructs as just ’terms’. Because of that, whenever an Add or Sub construct is encountered it is transformed into a termlist through the function get_termlist.

A termlist is a list of 2-tuples, where each tuple’s first element is the sign (either -1 or 1) and the second element is the content, which is the term itself. It is generated by performing a post-order traversal of the tree structure. Any node that is not an Add or Sub construct is considered a leaf by this algorithm, and is added to the termlist. Also, the second argument of a Sub construct is always considered a leaf. The sign is set to -1 if the node is the second child of a Sub node, and 1 otherwise. To allow for the first term of a termlist being negative, it is checked whether it is a Neg construct. If it is, the first term’s sign is set to -1 and the content is set to the argument of the Neg construct. An example on the formula −a − b + (−c)³:

    >>> from sympy import *
    >>> a = simple_exact_parse("Add(Sub(Neg(a),b),Neg(c))")
    >>> get_termlist(a)
    [(-1, a), (-1, b), (1, Neg(c))]

To transform a termlist back into a tree of Add and Sub constructs, the function termlist_to_add is used. It is not a perfect inverse of get_termlist, as a termlist does not hold information about the tree topology, but that is not an issue as Add(Add(a,b),c) is equivalent to Add(a,b,c).

Similarly to termlists for terms, there is also a factorlist for factors. This is used to deal more easily with possible nesting of Mul trees, i.e. Mul(a,Mul(b,c)) versus Mul(a,b,c). The function is called mulargs.

A standard procedure in rules is to convert the Add, Sub or Mul construct into termlists or factorlists, perform some operation on them, and then convert the result back into a SymPy expression before returning it.

² With a few exceptions, namely gcd and factor.
³ simple_exact_parse is my own function and parses a string of a SymPy expression in the tree notation.

5.3.3 Object creation and canonical form

Because SymPy puts all objects on canonical form as soon as they are created, it could be confusing for an unsuspecting user who writes

    >>> a = Integer(2) + Integer(2)

and then expects a to be 2 + 2, when a is actually simplified to 4. Use of the canonical function is the reason behind this automatic simplification that was discussed in [10]. It could be argued that automatic simplification is not bad for pedagogical purposes. If a student already knows this skill, it is not necessary to show it as a step. However, to allow for full control, SymPy’s own canonical function has to be disabled. Instead, the creation of SymPy objects will happen through a factory function, a function which takes care of all the object creation and initialization. Note that this automatic simplification is only a problem in SymPy’s own classes. The extra added classes (see section 5.3.1) do not simplify automatically at all.

Ideally, before the factory function can be created, SymPy’s canonical function would have to be disabled. Inconveniently, it is not a single function, but rather spread out over the classes (see section 4.2). So instead of disabling it, the chosen solution was to circumvent it in the factory function.

The factory function is used whenever a SymPy object has to be created but not automatically simplified. It takes as input a class c and a list of arguments a. It creates an instance of c using the standard non-modified __new__ and __init__ functions (creating it "the intended way"), with dummy arguments such that the created object will not be simplified. Once the object has been created and initialized, the dummy arguments are changed to the arguments that are in a. This is done by modifying the private property _args of the created object.

When creating atoms (i.e. c is a class that inherits from Atom), this procedure is not necessary, as atoms are in general not simplified or altered. The main exception is that when creating a Rational, the fraction will be reduced if possible, i.e. when the greatest common divisor of the arguments is not 1. This is handled in a similar manner as for non-atom classes: create a Rational with dummy arguments that will not be reduced, and then manually set the p and q properties.

The direct modification of the "private" properties of SymPy objects in the factory function violates the immutability aspect of SymPy that was discussed in section 4.1, which caused an issue with the built-in cache in SymPy. For the factory function to work correctly, the cache had to be disabled. Fortunately, there is an option in SymPy to disable its caching. The amount of calculations done in the step-by-step CAS is not large enough for this to significantly affect performance.

This altogether gives a very convenient factory function that creates SymPy objects without using the canonical function, with no modifications to the original SymPy code. Because the creation happens inside one function, it is easy to do anything else that might be deemed necessary, like using a custom canonical function. A canonical function was implemented, but for flexibility it is not used in the factory function. Instead, any algorithm that wants formulas on canonical form has to call the canonical function explicitly. See section 5.5 for its motivation and description.

5.3.4 Input

The factory function is great for use within code, but cannot be used for user input. For that, a parser is needed. The problem with the parser used by sympify is that it uses the default initializers and constructors of SymPy, and not the factory function. So modification of the code was necessary.

As mentioned in section 4.1, when using arithmetic operators, SymPy uses the corresponding functions in Expr to create the correct objects. The parser also uses them. Because of that, they were a good place to inject code into. The default is to just call the standard constructor, for example __add__(self,other) would return an Add(self,other). This was modified to instead use the factory function when creating the object. It also creates an object of the appropriate class, for example __sub__ creates a Sub instead of an Add with a negative second argument. This was done for all the relevant functions. Below is a list. The functions whose name begins with an ’r’ are for "reverse" order of operands, for example x+1 versus 1+x.

• __neg__

• __add__

• __radd__

• __sub__

• __rsub__

• __mul__

• __rmul__

• __pow__

• __rpow__


• __div__

• __rdiv__

SymPy also provides the functions sqrt and root, which normally return a Pow object. These were modified in a similar manner to instead return Sqrt and Nroot objects.

Further, the classes Number, Integer, Rational and Float also override these arithmetic functions. The purpose is that when numeric types are added to other numeric types, the result can normally be simplified. For example, Integer(1) + Integer(2) would immediately create an Integer(3) instead of Add(1,2). The same applies to all these operators. However, when this cannot be done, for example when an argument is not a numeric type, the function calls the corresponding arithmetic function of the base class. Eventually it would reach Expr. To keep code modification to a minimum, these functions in the numeric classes were only modified such that they always call the corresponding function of the base class.

The result of these modifications was that the operators of SymPy use the factory function, which creates the object with the given arguments without performing any simplifications or alterations. As a result, SymPy’s own parser could be used as a method of writing input in a "natural" way. Some rules do want to use SymPy’s simplification though, so this behavior can be toggled through the PARSE_TO_MOD setting in config.
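The effect of the modified operator functions, including the "reverse" variants, can be illustrated with a toy class hierarchy. The classes below are hypothetical stand-ins that merely reuse the names Add, Sub and Neg; they do not involve SymPy or the factory function, and division is left out because the __div__ and __rdiv__ names listed above are specific to Python 2.

```python
class Node:
    """Toy expression node whose operators keep the formula as written."""

    def __init__(self, *args):
        self.args = args

    def __repr__(self):
        inner = ",".join(repr(a) for a in self.args)
        return "%s(%s)" % (type(self).__name__, inner)

    def __neg__(self):
        return Neg(self)

    def __add__(self, other):
        return Add(self, other)

    def __radd__(self, other):           # called for: other + self
        return Add(other, self)

    def __sub__(self, other):
        return Sub(self, other)          # a real Sub node, not Add(a, -b)

    def __rsub__(self, other):           # called for: other - self
        return Sub(other, self)


class Add(Node):
    pass


class Sub(Node):
    pass


class Neg(Node):
    pass


class Symbol(Node):
    def __repr__(self):
        return self.args[0]


x = Symbol("x")
forward = x - 1      # uses __sub__
reverse = 1 - x      # int cannot handle it, so Python falls back to __rsub__
negated = -x + 1     # uses __neg__ and then __add__
```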

5.3.5 Output

The printers in SymPy work well with the original SymPy classes. However, when encountering new classes like Sub, they reverted to the plain printer even when the string or LaTeX printer was originally used, as they did not know how to handle them. As such, new string and LaTeX printers had to be created. The string printer was mostly used for demonstration and debugging purposes, as it gives easily readable output. The LaTeX printer was used for the actual output generation in the evaluation, see section 6.1. Note that in this section, the term "print" means to "create a string of".

Fortunately, SymPy made the printers in such a way that they are easily extendable. Every type of printer is a class, with functions that handle printing of different mathematical constructs. To create a custom printer, it is enough to inherit from another printer and override the relevant print functions.

The class ModPrinter was created, which inherits from the string printer. For Add and Sub, it first converts the object into a termlist and then prints it in the correct order with the signs + and − used as appropriate. Printing Mul uses mulargs and simply prints all arguments with a * sign between them. Neg puts a − before the argument. Div puts a / sign between the arguments. The other classes are handled similarly.

There is also the question of where to put parentheses. To be safe, they could be put everywhere, but that would not be very readable. The number of parentheses was kept to a minimum: they were only used where omitting them would create ambiguity or an incorrect meaning.

ModLatexPrinter was handled in the same way. It inherits from the LaTeX printer and works in a similar manner. However, the output from this printer is what ultimately would be seen by the students, so extra care had to be taken with the output. The main difference from the functionality of the ModPrinter is that it also takes into account that the multiplication symbol · should not be explicitly written in every multiplication. The decision process for that was made simple. If the second factor was a number or began with −, the multiplication symbol would be used. If the first factor had parentheses but not the second, the multiplication symbol would also be used. In all other cases, no explicit symbol would be used. This handled most common cases. Ideally, more complete rules should be used, but other parts of the CAS were prioritized in the limited time frame of the thesis project.

Finally, an example of the modified input and output. Compare to the printing in section 4.4.

    >>> from sympy import *
    >>> config.PARSE_TO_MOD = True
    >>> mp = ModPrinter()
    >>> mlp = ModLatexPrinter()
    >>> a = sympify("2*x**2 - 4*x + 1")
    >>> a.precise_repr()
    'Add(Sub(Mul(2,Pow(x,2)),Mul(4,x)),1)'
    >>> mp.doprint(a)
    '2*x^2 - 4*x + 1'
    >>> mlp.doprint(a)
    '2 x^{2} - 4 x + 1'
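The decision process for the explicit multiplication symbol in the LaTeX printer can be written down as a small predicate. The sketch below works on factors that have already been printed to strings. The function names are hypothetical, and the way numbers and parentheses are detected is an assumption of this sketch, not the ModLatexPrinter code.

```python
def needs_cdot(first, second):
    """Decide whether an explicit multiplication symbol goes between two factors."""
    second_is_number = second.replace(".", "", 1).isdigit()
    if second_is_number or second.startswith("-"):
        return True
    first_in_parens = first.startswith("(") and first.endswith(")")
    second_in_parens = second.startswith("(") and second.endswith(")")
    return first_in_parens and not second_in_parens


def join_factors(factors):
    """Join printed factors, inserting the symbol only where the rule asks for it."""
    out = factors[0]
    for previous, current in zip(factors, factors[1:]):
        separator = r" \cdot " if needs_cdot(previous, current) else " "
        out += separator + current
    return out


implicit = join_factors(["2", "x"])          # no symbol: second is a symbol
explicit = join_factors(["2", "3"])          # symbol: second is a number
mixed = join_factors(["(x + 1)", "y"])       # symbol: only first has parentheses
```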

5.4 Rules

This section gives a list of some of the most important rules used in the final evaluation CAS. The rest of the rules can be found in appendix D. The CAS does not include any other rules, or variations of implemented rules that could be used but were not used. This particular set of rules is not complete for high school math but allows the CAS to solve most problems found in the chapters about algebra in common Swedish high school math text books (see section 6.1). A rule is a function that takes a formula as input, and returns another formula. A rule will only look at the root construct, and is not inherently applied recursively. Recursive application is instead handled in the search part of the CAS, see section 5.5. For each rule, it is described what it does for certain input. It is implicitly understood that for any other input, the rule returns None (i.e. it cannot be applied).


5.4.1 sortBoth

This rule sorts terms and factors. It actually consists of two parts, one that only sorts terms (Add and Sub converted to termlists) and one that only sorts Mul arguments. This rule is mostly used by other rules, which sort their output to make it look more normal to humans. This rule could be considered part of the canonical function, as it is used so much.

The term sorting puts every term in one of four categories: Other, Symbol, Othernum and Numeric. The terms are sorted according to their category, in the order they were mentioned. If there is more than one term in any category, they are sorted according to specific rules. Numeric is the category of numerical types, i.e. objects of classes derived from Number, and is sorted according to the numerical value in ascending order. The Othernum category is for constructs that themselves are not numeric but instead only contain numerical arguments. These are just sorted according to their hash value in ascending order. The Symbol category consists of formulas that are either just a symbol, or a multiplication that may contain numerics, symbols and powers of a symbol to an integer. This category is sorted in ascending order according to four values, in order of importance (i.e. if the first value is equal, sort by the second; if that is equal, sort by the third, and so on):

1. Maximum degree of symbol

2. Number of different symbols

3. Name of symbols - alphabetic ordering

4. Hash value of numerics

Other is the category for everything that does not fit in any of the other categories. It is sorted in ascending order according to the hash values of the formulas. After the sorting, one final operation is done. If the first term is negative, it switches place with the first non-negative term. This is done because formulas like 1 − x arguably look better than −x + 1. A variation of this rule that does not perform this final step exists, and is used by certain other rules that specifically sort polynomial expressions. There it is better to keep the "proper" ordering of terms instead of something ugly like 2x − 3x² + 7. This variation is called PolySort.

The other part of this rule is sorting of multiplications. It uses mulargs to get the factorlist, and then sorts it with a method similar to the term sorting, but with a few more categories. Factors are sorted by category, in the order the categories appear in the list below, which also shows how factors are sorted within each category.

Numeric All instances of classes derived from Number. Sorted by numeric value in ascending order.


Othernums All formulas that are not in category Numeric, but for which is_number returns True. Sorted by hash value in ascending order.

Symbols Any formulas that either are a symbol or a power of a symbol to an integer. Sorted primarily by alphabetical order. If two symbols have the same name, sorted by exponent instead. Everything in ascending order.

Divs Any formula that is a Div construct. Sorted by hash value in ascending order.

Other Anything that does not fit in any other category. Sorted by hash value in ascending order.

Terms Any formula that is a Sub or Add construct. Sorted by hash value in ascending order.

The main use of this sorting rule is to avoid a "weird" order of terms or factors after usage of another rule. A typical example would be to sort x · 2 to 2x. Using certain rules within other rules is a way of reducing code duplication. However, the sorting can also be applied by the search, which is why it is listed as a rule rather than a utility function. This is rarely done, because the rule has a high rule bias score, see section 5.5.
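The term-sorting part can be illustrated with a minimal sketch. It is not the thesis implementation: terms are assumed to be (sign, content) pairs, only three of the four categories are modelled (Othernum is left out), repr stands in for the hash value, and the names sort_terms and category_rank are hypothetical.

```python
def category_rank(content):
    """Rank categories in the order Other, Symbol, Numeric (Othernum omitted)."""
    if isinstance(content, str):
        return 1  # Symbol: assumed to be a plain name such as "x"
    if isinstance(content, (int, float)):
        return 2  # Numeric
    return 0      # Other: anything else


def sort_key(term):
    sign, content = term
    rank = category_rank(content)
    if rank == 1:
        return (rank, 0, content)          # symbols: alphabetical
    if rank == 2:
        return (rank, sign * content, "")  # numerics: by signed value
    return (rank, 0, repr(content))        # others: repr replaces the hash


def sort_terms(terms, poly=False):
    """Sort a termlist; unless poly is set, avoid a leading negative term."""
    ordered = sorted(terms, key=sort_key)
    if not poly and ordered and ordered[0][0] < 0:
        for i, (sign, _) in enumerate(ordered):
            if sign > 0:
                ordered[0], ordered[i] = ordered[i], ordered[0]
                break
    return ordered


# -x + 1 is reordered to 1 - x; the poly variant keeps the category order.
default_order = sort_terms([(-1, "x"), (1, 1)])
poly_order = sort_terms([(-1, "x"), (1, 1)], poly=True)
```

The poly flag plays the role of the PolySort variation: it skips the final swap so that the category ordering is preserved.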

5.4.2 simplifyTerms

This is the main rule used for simplification operations on terms. It performs both addition and subtraction. It searches through a termlist and attempts to find terms that can be added together. It adds all integers and floats together into a single term. The addition itself is done through the __add__ function of the numeric types (with PARSE_TO_MOD set to False, so that the simplification actually happens). Subtraction is done implicitly, as the signs of the terms are taken into account. The rule also adds like terms. It splits each term into a tuple (coefficient, rest) by use of the as_coeff_mul SymPy function. The coefficient is 1 unless the term is a Mul whose first argument is numerical. It then looks for terms whose rest parts are equal (using SymPy’s equality function) and adds their coefficients (taking the terms’ signs into account). This allows simplifyTerms to both simplify numerics and collect terms, making it a staple rule that many other rules use.
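The collection step can be sketched in a few lines. This is only an illustration under stated assumptions: terms are given directly as (coefficient, rest) pairs with the sign folded into the coefficient, None is used as the rest of a plain number, and the function name collect_terms is hypothetical. The real rule works on SymPy formulas.

```python
def collect_terms(terms):
    """Merge terms with an equal 'rest' by adding their signed coefficients.

    Returns the shorter termlist, or None when nothing could be merged,
    mirroring the convention that a rule returns None if it cannot be applied.
    """
    totals = {}
    for coefficient, rest in terms:
        totals[rest] = totals.get(rest, 0) + coefficient
    merged = [(c, rest) for rest, c in totals.items() if c != 0]
    return merged if len(merged) < len(terms) else None


# 2x + 3 - x + 4: the x terms are collected and the numbers are added.
example = [(2, "x"), (3, None), (-1, "x"), (4, None)]
result = collect_terms(example)
```

Treating plain numbers as terms with an empty rest lets one loop handle both numeric simplification and collection of like terms.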


5.4.3 simplifyProduct

This rule and 5.4.4 groupPow are the multiplication counterparts of the two types of simplification performed by 5.4.2 simplifyTerms. They are also used by many other rules to simplify the result. simplifyProduct searches through all numerics in the factorlist of a Mul and multiplies all integers and floats. The multiplication is performed by the __mul__ functions of these numeric classes (with PARSE_TO_MOD set to False).

5.4.4 groupPow

x · x ⇒ x²

This rule groups equal factors into powers by searching through the factorlist of a Mul. Each factor can be viewed as base^exponent. If a factor is not a Pow construct, the exponent is taken to be 1. Each factor is separated into a (base,exp) tuple by the SymPy function as_base_exp. Any factors that have an equal base (determined by SymPy’s equality function) are grouped, placing the exponents into an Add construct. To not make the step-by-step solution unnecessarily detailed, rules D.1.1 negAddToSub and 5.4.2 simplifyTerms are used on the resulting exponent.

5.4.5 eqRules

This section describes a few different rules that work slightly differently than the rest. Specifically, these are rules used for manipulating equations, to apply a certain operation on both sides. They (except for one) take an extra argument, and thus should be called equation rules. The equation rules used by the CAS are:

eqAdd(f,a) Adds a to both sides of f.

eqSub(f,a) Subtracts a from both sides of f.

eqMul(f,a) Multiplies both sides of f with a.

eqDiv(f,a) Divides both sides of f by a.

eqPow(f,a) Takes both sides of f to the power of a.

eqSqrt(f) Takes the square root of both sides of f.

eqNroot(f,a) Takes the a:th root of both sides of f.

eqAdd(f,a) puts both sides of f in an Add construct together with a, and then uses rules 5.4.2 simplifyTerms and 5.4.1 sortBoth on both. This allows for human-like movement of terms across the equal sign. eqSub(f,a) works in a similar way, but it creates Sub constructs instead.


eqMul(f,a) distributes the multiplication over terms if a side of f is Add or Sub. Every term is placed in a Mul construct together with a. If the term is a division construct, the rule attempts to cancel factors. Otherwise, it attempts to use rules D.1.12 mulRemove0, D.1.13 mulRemove1, 5.4.3 simplifyProduct, 5.4.4 groupPow and 5.4.1 sortBoth on every Mul. eqDiv(f,a) works in a similar way to eqMul(f,a), but creates Div constructs instead. It also distributes over terms and tries to cancel factors.

eqPow(f,a) puts both sides of f in a Pow construct with a being the exponent. It then attempts to use rules D.1.37 powPow, D.1.38 powRoot and D.1.33 simplifyPow on the results. eqSqrt(f) puts both sides of f in a Sqrt construct and attempts to apply rules D.1.42 simplifyRoot and D.1.48 rootPow on the results. eqNroot(f,a) works similarly but creates Nroot constructs instead, with a being the extra argument.

The main reason for all the extra applications of other rules is to not make solutions too detailed for the high school students. It also makes the application of these rules seem streamlined and look similar to how a human would manipulate equations.
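The idea of applying an operation to both sides and simplifying immediately can be sketched on linear equations. This is a simplified model, not the thesis code: each side is assumed to be a dictionary from a symbol name to its coefficient, the key "1" holds the constant term, and the helper names are hypothetical.

```python
from fractions import Fraction


def _shift(side, term, sign):
    """Add sign * term to one side and drop terms that cancel out."""
    out = dict(side)
    for key, coeff in term.items():
        out[key] = out.get(key, 0) + sign * coeff
    return {key: coeff for key, coeff in out.items() if coeff != 0}


def eq_add(equation, term):
    lhs, rhs = equation
    return (_shift(lhs, term, 1), _shift(rhs, term, 1))


def eq_sub(equation, term):
    lhs, rhs = equation
    return (_shift(lhs, term, -1), _shift(rhs, term, -1))


def eq_div(equation, number):
    """Divide every coefficient on both sides; None if the rule cannot apply."""
    if number == 0:
        return None
    return tuple(
        {key: Fraction(coeff) / number for key, coeff in side.items()}
        for side in equation
    )


# 2x + 3 = 7  ->  2x = 4  ->  x = 2
equation = ({"x": 2, "1": 3}, {"1": 7})
step1 = eq_sub(equation, {"1": 3})
step2 = eq_div(step1, 2)
```

Because the cancellation happens inside the rule, the 3 appears to move across the equal sign in a single step, which is the effect described above.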

5.5 Search

This section describes how algorithms were used to decide which rules to apply and in what order to generate a step-by-step solution.

5.5.1 Redefining the problem as a graph search

The biggest issue with the rule selection problem definition given in section 5.2 is that it does not lead to any practical algorithms. The only information it makes use of is the current formula and the goal, and it fails to use other available information like previously taken steps. It also does not plan ahead. The rule that gives the formula closest to the goal is not necessarily the best rule to actually apply in the situation. It is the typical problem of local minima.

A more practical approach is to define the problem as a graph search. Let the graph be defined as G = (F,E). F is the set of all possible mathematical expressions, which means that a node in G is equivalent to a mathematical expression, or in this case a SymPy formula. Two nodes a, b ∈ F are the same node if and only if a = b, where the equality test is a check that they are identical (SymPy’s equality function is used in practice). A directed edge e ∈ E from a to b, a, b ∈ F, exists if and only if there is a rule r ∈ R, where R is the set of rules (the ruleset for short), for which r(a) = b. The same identity test is used here.

The exact problem definition depends on the problem type. However, they all revolve around finding a path from a given start node to some other node. The goal node can be explicitly given, like in the showEquality problem type, or be a vague description like "the simplest node". This is then a search problem, as the algorithms have to search the graph to find that path. Graph search is a common problem in areas like AI, and as such there are many algorithms for it. In this thesis, variations of the classic A* algorithm were used.

5.5.2 Implicitly generating the graph

The graph G is never explicitly stored anywhere. There is thus no convenient list of nodes and edges. Instead, G has to be generated as the search algorithm traverses it and explores more nodes. A graph search algorithm that has full knowledge of G would look at the neighbor list or neighbor matrix (depending on how G is stored) to get all the edges of a node. That operation has to be replaced with a function getListofRules that, given a formula f ∈ F, gives a list of all rules that can be applied on f. The implementation of getListofRules iterates through all the rules in the ruleset and attempts to apply each one on the given formula. If the rule can be applied, it is added to the list of rules that is later returned. This is very extendable - whenever a new rule is added to the ruleset, it is automatically compatible with getListofRules. However, as the number of rules grows, it can easily become inefficient to attempt to apply every single rule. Currently, an optimization is used that only attempts rules that apply to constructs the formula contains, and it is fine for the current small ruleset. As the ruleset grows (which it will need to in a full CAS), other solutions will be needed.

A special case in getListofRules is the equation rules. The second argument to the equation rules can be almost any formula, and because of that there are technically infinitely many ways to apply them. That is not practical to work with. Instead, an algorithm is used that decides which equation rules with which arguments are feasible to try. The basic premise is to only try rules and arguments that perform some simplification on either side. The algorithm looks through both sides of the equation. If a side is a Pow construct with exponent n, then it adds eqNroot(f,n) to the list of applicable rules (unless n is 2, in which case eqSqrt is added instead).
Similarly, if either side is a Nroot or Sqrt, eqPow can be applied to both sides with the corresponding exponent argument. For eqAdd and eqSub, it iterates through the terms of both sides, and adds rule eqAdd(f,term.content) if the sign is negative and eqSub(f,term.content) if the sign is positive. For each term it also adds a eqDiv(f,term.content) rule. If the content is a Mul construct, it also adds a rule eqDiv(f,factor) for each factor in the factorlist. For any division construct in the factorlist, or if the content of a term itself is a division construct, an eqMul rule is added for each factor in the denominator factorlist as well as the whole denominator. The same is done with eqDiv for the numerator. As an example, the applicable equation rules on the formula

a/b = c/d

are

eqSub with argument a/b

eqSub with argument c/d

eqMul with argument b

eqMul with argument d

eqDiv with argument a

eqDiv with argument c

To actually follow an edge in the graph to the neighbor node, the rule has to be applied on the formula. As stated in section 5.4, a rule function does not apply itself recursively on the whole formula, only on the root. A separate function that attempts to apply a rule recursively on the whole formula tree has to be used. It performs a post-order traversal of the formula tree, and attempts to apply the rule at every node. A successful application alters the tree on that node. This way, a rule can actually be applied at multiple places in a formula. This is similar to how Mathleaks do their solutions, and how it works in the system presented by Jurkovic in 1987 [13]. One more optimization that is done is that getListofRules actually returns both the rule and the result of applying the rule, i.e. the neighbor node. This avoids having to apply a rule multiple times.
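The interplay between root-only rules, recursive application and getListofRules can be sketched as follows. The sketch assumes a toy formula representation (nested tuples such as ("Add", "x", 0), with strings and numbers as atoms) instead of SymPy formulas, and the rule drop_add_zero and the function names are hypothetical.

```python
def drop_add_zero(formula):
    """Root-only rule: remove zero terms from an Add, or return None."""
    if not (isinstance(formula, tuple) and formula[0] == "Add"):
        return None
    rest = [arg for arg in formula[1:] if arg != 0]
    if len(rest) == len(formula) - 1:
        return None  # there was no zero term to remove
    if not rest:
        return 0
    if len(rest) == 1:
        return rest[0]
    return ("Add", *rest)


def apply_everywhere(rule, formula):
    """Post-order traversal: rewrite the children first, then try the root.

    Returns the rewritten formula, or None if the rule applied nowhere.
    """
    changed = False
    if isinstance(formula, tuple):
        head, *args = formula
        new_args = []
        for arg in args:
            result = apply_everywhere(rule, arg)
            if result is None:
                new_args.append(arg)
            else:
                new_args.append(result)
                changed = True
        formula = (head, *new_args)
    result = rule(formula)
    if result is not None:
        return result
    return formula if changed else None


def get_list_of_rules(formula, ruleset):
    """Return (rule, neighbor) pairs so that no rule is applied twice."""
    neighbors = []
    for rule in ruleset:
        result = apply_everywhere(rule, formula)
        if result is not None:
            neighbors.append((rule, result))
    return neighbors


# (x + 0) * (0 + y): the rule fires at two places in a single step.
start = ("Mul", ("Add", "x", 0), ("Add", 0, "y"))
neighbors = get_list_of_rules(start, [drop_add_zero])
```

Returning the neighbor together with the rule is the optimization mentioned above: the search can use the result directly instead of applying the rule again.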

5.5.3 Simplify

The simplify problem type does not inherently have a clear goal, so one has to be defined so that the search can be guided. The chosen solution is to define a score function s : formula −→ float that gives a score to a formula depending on its complexity. A simpler formula gets a lower score. In search terminology, the score function is commonly known as the heuristic function. Thus the simplify problem type can be defined as "find a path from the start node to the node with the lowest score".

The score function looks at three aspects of a formula. The first is the size score, which iterates through the formula tree and adds a score depending on which construct is in each node. The second part is atomRepeats, which looks at the atoms of the formula and gives a penalty score if similar atoms are found in multiple places. The last part is the structural part, which looks for certain patterns in the tree that can be simplified, like division constructs within division constructs, and gives an extra penalty score for such patterns. The scores from all three parts are weighted and added for the final score. This score function uses 46 different parameters, whose values were assigned empirically. Initial values were chosen, and then modified during the development process.
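A three-part score of this kind could look roughly like the sketch below. It assumes a toy formula representation (nested tuples with strings and numbers as atoms), and every cost and weight in it is a made-up placeholder; the 46 tuned values of the thesis are not reproduced here.

```python
from collections import Counter

# Hypothetical per-construct costs; unknown constructs cost 1.5.
SIZE_COST = {"Add": 1.0, "Mul": 1.0, "Div": 2.0, "Pow": 2.0}
ATOM_COST = 0.5


def nodes(formula):
    """Yield every node of the formula tree."""
    yield formula
    if isinstance(formula, tuple):
        for arg in formula[1:]:
            yield from nodes(arg)


def size_score(formula):
    return sum(
        SIZE_COST.get(n[0], 1.5) if isinstance(n, tuple) else ATOM_COST
        for n in nodes(formula)
    )


def atom_repeats(formula):
    """One penalty point for every repeated occurrence of an atom."""
    counts = Counter(n for n in nodes(formula) if not isinstance(n, tuple))
    return sum(count - 1 for count in counts.values())


def structural_score(formula):
    """Count Div constructs that directly contain another Div."""
    return sum(
        1
        for n in nodes(formula)
        if isinstance(n, tuple) and n[0] == "Div"
        and any(isinstance(a, tuple) and a[0] == "Div" for a in n[1:])
    )


def score(formula, w_size=1.0, w_repeats=1.0, w_structure=1.0):
    return (w_size * size_score(formula)
            + w_repeats * atom_repeats(formula)
            + w_structure * structural_score(formula))


unsimplified = score(("Add", "x", "x"))  # x + x
simplified = score(("Mul", 2, "x"))      # 2x
```

With these placeholder values x + x scores higher than 2x because of the repeated atom, so a search that minimizes the score is pushed towards collecting the terms.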


As mentioned in section 5.5.1, the search itself is performed by a variation of the A* algorithm. Every visited node is placed in a priority queue (the Python module heapq [31] is used for this), which is sorted by the sort score of the corresponding formula. The sort score is a weighted sum (the weights are given as parameters to the search function) of the formula score, the depth score and the rule bias score. The formula score is the score given by the score function. The depth score is given by the distance from the start node to the current node. The rule bias score is a score given to each rule, to make the search prioritize applying certain rules.

How does the algorithm know when to stop searching? A node with a very low formula score may have been found, but there is no way of knowing whether another formula with an even lower formula score exists without exhausting the whole search space. This algorithm instead searches for a certain time period. It continuously keeps track of the node with the lowest formula score, and when the timer runs out it assumes that node is the answer and returns the path to it. The time limit is given as a parameter to the search function.

The search needs to be able to backtrack from the answer node to the start node to construct a step-by-step solution. To be able to do this it needs to store the path taken to every node, as well as which rules were applied along that path. That information is kept in the Visited set. Normally, Visited would be used to keep track of already visited nodes so that they are not searched multiple times. It is used for that purpose too, but for each visited node it also stores the depth, the rule that was used to get to that node, and the previous node, such that rule(previous) = current node. In the implementation Visited is a Python dictionary, and given any visited node the path taken can be traced through that information.
Formulas are not inherently kept on canonical form, so this has to be done in the search algorithm. Whenever a new node is reached through the application of a rule, two functions are applied to it, which together can be considered the canonical function. The first function is Standardize, which performs a post-order traversal of the formula tree, and replaces all Rational constructs with Div constructs, negative numerics with Neg constructs (i.e. Neg()) and flattens Mul constructs. This is done to have fewer types of constructs, which made the implementation of many rules simpler because there are fewer types of input to consider. The second function is sortBoth, which makes sure the intermediate steps in the solution look as human-like as possible.

The final difference to a standard A* is an aspect that can improve an already found solution. The premise is that the algorithm as described so far does not necessarily return the shortest path ( = least amount of rule applications) to a node. This improvement was a feature requested by Mathleaks. A situation where this can arise is application of rule A, then B and then A again, when a shorter solution could have been to apply rule B first and then A, so that A could be applied at multiple places in the formula in just one step. This was achieved by looking at the depth information in the Visited dictionary whenever an already visited node was reached by the search. If the new depth is less than the previous depth, the formula is put into the priority queue again, and the information in Visited is updated. A problem with this approach is that it leads to a part of the search space being searched again, but in practice it never proved to be a big issue. Some brief pseudocode for the search algorithm can be found in appendix C.
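The search loop described in this section can be condensed into the following sketch. It is not the pseudocode of appendix C: the function and parameter names are hypothetical, the rule bias score is left out, the canonical function is assumed to be part of the neighbors callback, and nodes only need to be hashable. A toy graph on integers stands in for the formula graph.

```python
import heapq
import itertools
import time


def simplify_search(start, neighbors, score, time_limit=1.0,
                    w_score=1.0, w_depth=0.1):
    """Best-first search that runs until the queue is empty or time is up."""
    visited = {start: (0, None, None)}  # node -> (depth, rule, previous)
    tie = itertools.count()             # tie-breaker for equal sort scores
    queue = [(w_score * score(start), next(tie), start)]
    best = start
    deadline = time.monotonic() + time_limit
    while queue and time.monotonic() < deadline:
        _, _, node = heapq.heappop(queue)
        depth = visited[node][0]
        for rule, nxt in neighbors(node):
            # Skip known nodes unless this path is strictly shorter.
            if nxt in visited and visited[nxt][0] <= depth + 1:
                continue
            visited[nxt] = (depth + 1, rule, node)
            sort_score = w_score * score(nxt) + w_depth * (depth + 1)
            heapq.heappush(queue, (sort_score, next(tie), nxt))
            if score(nxt) < score(best):
                best = nxt
    # Backtrack from the best node to the start through Visited.
    path, node = [], best
    while visited[node][2] is not None:
        _, rule, previous = visited[node]
        path.append((rule, node))
        node = previous
    path.reverse()
    return path


def toy_neighbors(n):
    """Toy rules on integers: halve even numbers, decrement positive ones."""
    result = []
    if n > 0 and n % 2 == 0:
        result.append(("halve", n // 2))
    if n > 0:
        result.append(("dec", n - 1))
    return result


steps = simplify_search(12, toy_neighbors, score=lambda n: n)
```

The returned list pairs each applied rule with the node it produced, which is the information a step-by-step solution needs. Re-queuing a node that is reached at a smaller depth corresponds to the solution-improving aspect described above.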

5.5.4 ShowEquality

In contrast to Simplify, ShowEquality has a clear end goal. This is a straightforward search problem, to find the path from one known node to another known node, and is a situation in which A* shines. The algorithm is similar to the Simplify A*, but it does not feature a timer; it instead stops as soon as the goal node has been found. This also means it does not feature the aspect of the algorithm that improves solutions. The only big difference to the Simplify algorithm is the heuristic function. Here, it does not matter how simple a formula is; instead the heuristic function should measure how similar two formulas are. A good way to compare the similarity of two formulas is to compare the topologies of the formula trees. The advantage of this approach is that this is a problem in other areas as well, and algorithms exist to solve it. Zhang and Shasha [32] describe a few algorithms for tree comparisons. The first algorithm from the paper, the "new simple algorithm", had little overhead and was relatively simple and was therefore implemented. It is a dynamic programming algorithm that calculates the edit distance between two labeled trees. The edit distance is the number of insertion, deletion and changing operations needed to convert one tree to the other. Insertion is adding a new node at any place in the tree, deletion is removing a node and changing is changing a node’s label to something else. The label of a node is in this case the SymPy class of the formula in that node, and if the node is an atom the atom’s value is taken into account as well. The algorithm will not be explained here; see [32] for the precise definition. The implementation was tested on the example trees given in [32], and found to be correct. The scoring function was very simple - if the labels are equal, the score is 0, and 1 otherwise. It could have been made more complex, but there is little reason to do so.
The edit operations perform transformations on a tree that do not correspond to any rules. A rule cannot change an Add to a Div, or suddenly delete an atom node. However, to use rules as a basis for the edit distance would not be feasible. The point of the heuristic function is to be an approximation to guide the search, and it should be fast to compute. This algorithm achieves both those points, and empirically it seemed to work well.

5.5.5 Solve

Initially, for simplicity, the problem of solving an equation for x was defined as: rewrite the formula to x = s, where s is any formula that does not contain x, and then simplify s. This way, solving an equation would not require any new algorithm, as two already implemented ones could be used. The issue with this approach was that it could potentially lead to s being very complex, as the algorithm would not care what s was as long as it did not contain any x. Instead, the idea was to focus on separating x while keeping the formula as simple as possible, all in one algorithm.

The same A* algorithm as in Simplify is used for Solve. The only difference is the heuristic function. It takes the extra argument x, which is the variable to solve for. x does not necessarily have to be a symbol; it could be any formula. The heuristic function counts the number of occurrences of x in the given formula, and gives an extra penalty score if there is more than one. It also uses the size score and structural score functions from the Simplify heuristic function on both sides of the equation, and weights them differently depending on the number of occurrences of x and on which side they are. The goal is to make the search algorithm first lower the number of occurrences of x, then get x alone on one side and then simplify the other side, while also preferring to keep formulas simple while doing so.

A weakness of this heuristic function is that if a solution would lead to a very complicated answer, it might give it such a high score that a formula that is not an actual solution would get the lowest score. In practice, on all problems this algorithm was tested on, this was never a problem as the answers were typically very simple. However, if it became an issue, a more complex search algorithm would be needed.
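The shape of such a heuristic can be sketched as follows. The sketch assumes a toy formula representation (nested tuples with strings and numbers as atoms), uses a plain node count in place of the size and structural scores, and the penalty and weight values are arbitrary placeholders rather than the values used in the thesis.

```python
def count_occurrences(formula, x):
    """Count how many times the sub-formula x occurs in formula."""
    if formula == x:
        return 1
    if isinstance(formula, tuple):
        return sum(count_occurrences(arg, x) for arg in formula[1:])
    return 0


def size(formula):
    """Number of nodes in the formula tree (stand-in for the size score)."""
    if isinstance(formula, tuple):
        return 1 + sum(size(arg) for arg in formula[1:])
    return 1


def solve_score(lhs, rhs, x, repeat_penalty=10.0, x_side_weight=5.0):
    left = count_occurrences(lhs, x)
    right = count_occurrences(rhs, x)
    # Extra penalty when x occurs more than once in the equation.
    total = repeat_penalty * max(0, left + right - 1)
    # A side that contains x is weighted more heavily, which pushes the
    # search towards leaving x alone on that side.
    for side, occurrences in ((lhs, left), (rhs, right)):
        weight = x_side_weight if occurrences else 1.0
        total += weight * size(side)
    return total


solved = solve_score("x", 2, "x")                    # x = 2
isolated = solve_score(("Mul", 2, "x"), 4, "x")      # 2x = 4
repeated = solve_score(("Add", "x", "x"), 4, "x")    # x + x = 4
```

With these placeholder values the scores decrease from x + x = 4 over 2x = 4 to x = 2, which matches the intended order: first reduce the occurrences of x, then isolate it, then simplify.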

5.6 Interface

As the CAS would not be used directly by students, not much time was spent on building an interface. The search algorithms are instead wrapped in a python file that acts as a command line application. It can be called with different arguments to solve an equation for a specific problem type. To parse the input equation, it uses sympify with PARSE_TO_MOD set to False. It then takes the output solution from the search, and prints it out. The type of output depends on the input arguments. It can print using the ModPrinter or ModLatexPrinter commands to either the terminal or a specified file. If it prints LaTeX to file there is also an option to use pdflatex to generate an output pdf of the solution. This option was used to create the solutions for the evaluation.


Chapter 6

Results

This section describes the study performed to evaluate the pedagogical value of the CAS and presents the results. Section 6.1 gives a detailed explanation of why a questionnaire was used and how it was crafted. Section 6.2 then presents the results with graphs and associated explanations in text, with subsections that present the results divided in different ways (by question, by course and by year). As question 6 is a bit different (rather than being about evaluation it is about differentiating between solutions made by humans and solutions made by computers) the results for it are presented in its own subsection as well.

6.1 Evaluation

To get an estimate of how good this method is, the CAS needs to be evaluated. Since the primary goal of the CAS is to generate pedagogical step-by-step solutions, that is what should be tested. One idea was to use some objective scoring method and compare how similar the solutions were to Mathleaks’ solutions. However, that would in a way completely disregard the target demographic, which is high school students. After all, the end goal is not to replicate other solutions, but to create pedagogical solutions. What then is a pedagogical solution? Different students might prefer different solutions. It is quite a subjective matter. As such, the best method of evaluating the CAS would be to do so subjectively, and perform a study on the demographic.

The study was a questionnaire distributed to high school students. A Swedish Gymnasium (the equivalent of high school) was chosen, and the questionnaire was distributed to three different classes at the beginning of math class. To encourage more students to answer, every student who participated got a free 3-month license to Mathleaks’ service. The allotted time for each questionnaire session was 15 minutes, so it had to be short.

One type of question that was considered was to have the students grade solutions generated by the CAS on a scale, but it was dropped in favor of comparisons with solutions from other sources to the same problems. That would give more interesting data, as it may be hard to grade the pedagogical level of a solution, but easy to pick which solution the student prefers. The only instruction the students got was to "pick whichever you think is best". Asking which solution is the most pedagogical could be confusing. The students were not informed which solutions were generated by what source.

To evaluate all of the CAS’s capabilities, the problems would be of different categories. There would be one problem of solving a first-degree equation, one problem of simplifying an expression, one problem of "show that x = y", and two second-degree equations, one of which would have imaginary roots. There would also be a 6th question, which would be a Turing-test style question. The student would get two different solutions to a problem and have to guess which was generated by a human and which by the CAS.

For all questions, the problems would be taken from high school math text books. Books 1c [33], 2c [34] and 3c [35] of the Swedish text book series "Matematik 5000" were used for this¹. The text book series Matematik 5000 was used specifically because the publisher of those books has published solutions to the books’ problems on their webpage, and it would be interesting to compare with them. The c series was chosen specifically because that is the version the students in that Gymnasium were studying. It is not stated anywhere, but it is plausible that the publisher’s solutions were also made by humans, perhaps by the authors.

The text books contained plenty of problems that could be used in the questionnaire. To make the selection as unbiased as possible, for every category a set of problems was created, and then one problem was chosen at random for each category. The sets were created by taking all possible non-text problems from those books that the CAS could solve. In this case, "could solve" means that the answer was correct, or near correct (i.e.
for a simplification problem it isn’t always clear which answer is the correct one). Certain problems that the CAS could solve but where the solution was too long and complicated were dismissed, because the point was to evaluate the solving method and not the ruleset (the complicated solutions were typically a consequence of the CAS lacking knowledge of a certain rule). For each problem in the questionnaire, there are five different solutions. The first is just the answer, as given in the book. The second would be the solutions published online by the books’ publisher. The third would be a solution created by hand by staff at Mathleaks. The fourth would be the solution generated by the CAS. Finally, the fifth would be a solution generated by Wolfram Alpha. It was chosen because it is a very widely known engine, and it is plausible that a student could potentially use it. To make the comparison as fair as possible, only the solution step logic would

¹ In Sweden, high school math has 5 levels, of which 1 is the lowest and 5 the highest. For levels 1 to 3, there are also different variants a, b and c, and which one a student studies depends on his or her program. The c series is for those studying the Teknik and Natur programs. The complete description is available on the homepage of Skolverket, or in English here: http://www.skolverket.se/polopoly_fs/1.174554!/Menu/article/attachment/Mathematics.pdf [36]


be evaluated. A solution created by a human could potentially contain other features than just formulas and justifications. Any extra aesthetics were removed. The solutions of the publisher did not actually feature any justifications and were instead written as just formula after formula, which is reflected in the questionnaire. Wolfram Alpha generated not only formulas and justifications but also other hints. Those hints were removed with the "hide hints" button. Wolfram Alpha’s solutions were also in English and had to be translated to Swedish. The solutions created by Wolfram Alpha were manually written in LaTeX to resemble the original solution as closely as possible. All LaTeX solutions were then converted into pdf format by pdflatex. The LaTeX packages used to create the look are internal packages used at Mathleaks. The full questionnaire with the chosen problems and questions can be seen in Appendix A.

Figure 6.1. The solution to the first problem in the questionnaire that was generated by the CAS. The aesthetics are bland to only compare the solving logic of the different solutions.

6.2 Presentation of the results

In total, 63 students answered the questionnaire. However, in seven of the questionnaires, one or more questions were left unanswered. These questionnaires were dismissed, as it is plausible that the students were not paying attention and thus the answers were not credible. So in the end, 56 questionnaires were used for the results. The 56 students were split over three classes, and were in the first and second year of high school. This study was done at the end of May, so it was towards the end of the school year. The level of their math knowledge varied: they were studying the courses Matematik 1c, Matematik 2c and Matematik 3c. In the graphs, Ans is the solution which is just the answer. Ma5000 are the publisher’s own solutions. Hand are the solutions made by hand by Mathleaks. Solver are the solutions produced by the CAS in this thesis. WA are the solutions produced by Wolfram Alpha. For the raw questionnaire answers, see Appendix B.

6.2.1 Result totals for questions 1 to 5

This section shows the total answers by all students to each question. The charts give an overview of which solution was the most popular to each question.

Figure 6.2. The full total answers for questions 1-5.

The graph in figure 6.2 shows the total of all the answers in favor of the different sources of solutions. It is the most interesting one to look at to get a quick overview of the students’ solution preferences. It shows that the publisher’s solutions were chosen about as often as the solutions handmade by Mathleaks (the difference is within the margin of error), and more often than both the CAS’s and Wolfram Alpha’s solutions. The CAS’s solutions are slightly ahead of Wolfram Alpha’s solutions, but not significantly. Here follows a breakdown of the results of the individual questions 1 to 5.


Figure 6.3. The total results for question 1.

Figure 6.3 shows that in the first question the solutions that were generated by a computer were not very popular. 84 percent of the answers preferred solutions made by humans. The math problem was a rather simple equation solving problem of the first degree.

Figure 6.4. The total results for question 2.

Figure 6.4 shows that the students’ preferences were somewhat equal between Mathleaks’ solutions, the CAS’s solution and Wolfram Alpha’s solution in question 2. In contrast to question 1, the publisher’s solution was not very popular at all. Question 2 was a complicated simplification problem involving division, powers and negative powers. The publisher’s solution was likely too brief to be pedagogical.

Figure 6.5. The total results for question 3.

Figure 6.5 shows that similarly to question 2 the answers in question 3 were quite evenly spread out, with the difference being that the publisher’s solution was more popular and Wolfram Alpha’s solution was less popular. It was a quite simple "show that left side equals right side" question with one root.


Figure 6.6. The total results for question 4.

Question 4 was an equation of the second degree that first required using distributive multiplication and then could be solved with the pq-formula. While the solver’s solution and Mathleaks’ solution showed every step and were quite similar, the publisher’s solution skipped all the tedious steps of simplifying the expression after having applied the pq-formula. Figure 6.6 shows that the publisher’s solution was preferred by nearly half the students. Many students probably saw the short and concise solution next to the very long and elaborate ones and chose it.


Figure 6.7. The total results for question 5.

Question 5 is similar to question 4 but only requires moving one term and a division before the pq-formula can be applied. The difference is that the solutions give imaginary roots. Figure 6.7 shows that the preferences of the students were somewhat evenly spread out between all solutions (except the trivial one).

Looking at the individual questions, the results varied greatly from question to question. The biggest variation is in the publisher’s solutions: in questions 1 and 4 they got nearly 50%, but in question 2 only 9%. This shows that different math problems may require different solutions, and that no solution type is superior to all others.

6.2.2 Result totals for questions 1 to 5 by different math courses

This section presents the total results in questions 1 to 5, but split by course rather than by question. Figures 6.8, 6.9 and 6.10 show that there is no big disparity between the different courses, except that students in course 3c had a more prevalent preference for Mathleaks’ solutions.


Figure 6.8. The full total answers for questions 1-5 by students studying math class 1c.

Figure 6.9. The full total answers for questions 1-5 by students studying math class 2c.


Figure 6.10. The full total answers for questions 1-5 by students studying math class 3c.

6.2.3 Result totals for questions 1 to 5 by different high school years

This section presents the total results in questions 1 to 5, but split by the year the student is currently studying. Figures 6.11 and 6.12 show that there is no significant difference in solution preference between students in the first year of high school versus the second year.


Figure 6.11. The full total answers for questions 1-5 by students in first year of high school.

Figure 6.12. The full total answers for questions 1-5 by students in second year of high school.

6.2.4 Results for Turing-type question 6

In question 6 students were given a hard simplification problem with two solutions, one made by Mathleaks and the other by the CAS. The question asked which solution they thought was made by a computer and which by a human. The results were surprising. Despite preferring the solutions made by Mathleaks over those made by the CAS in the blind tests, students could not tell the Mathleaks solution apart from the CAS solution. Of course, certain individual students could have reasoned their way to the correct answer, but in total it seems that most just guessed.

Figure 6.13. The full total answers for question 6.

Chapter 7

Discussion

This chapter contains a discussion of the results presented in chapter 6. Section 7.1 is a general discussion of results and solutions. Section 7.2 discusses the parts of a solution which were not evaluated, specifically the aesthetics. Section 7.3 is a meta-discussion of the validity of the student answers and of this study in general. Section 7.4 discusses potential ways of expanding on the work described in this thesis. Section 7.5 gives some final concluding statements.

7.1 Discussion of results

Looking at the results, the first thing that comes to mind is that handmade solutions were better than computer-generated solutions. The first question to ask is why certain solutions were preferred. The publisher’s solutions do not actually have any justifications, yet they were popular. That can be explained by them being very condensed: if the students have already mastered certain algebraic skills, those solutions give a better overview of how to solve the problem. For Mathleaks’ solutions, it seems that the solution steps are simply "better" than the computers’. The programs are just not smart enough (yet?) to create solutions that the students prefer.

Could solutions made by a potential future CAS ever surpass a carefully crafted human one? A solution does not have to be only a series of calculations like in this evaluation, but could also include flowing text, added diagrams and figures and other pedagogical tools. An AI would have to be exceptionally sophisticated to utilize all those tools in the same pedagogical manner as a human could. Frankly, technology is not there yet.

While a CAS may not be better than a human at creating pedagogical solutions, it might be "good enough". A third of the total answers preferred solutions made by computers. Specifically, the solutions of the CAS presented in this paper were chosen in a fifth of the total questionnaire answers. That is a significant portion. It means that the computers’ solutions are good, and that they can be competitive. Considering the main advantages of using a CAS to generate solutions, most notably the speed, utilizing a CAS can save a lot of manual work. The compromise is less pedagogical solutions, but if they are "pedagogical enough" it may be a compromise worth making.

How does one measure the pedagogy of a solution? This evaluation only compared solutions from different sources, and did not place them on some absolute scale. Such a scale would be great for the development of pedagogical CASs, but it is not obvious how to construct one. It would have to be at least partly based on subjective student opinions of the solutions, student grades and similar factors. But as is clear from the results, different students prefer different solutions. It should therefore be impossible to construct a perfect absolute scale that holds true for every student. Personalizing solutions through psychology and/or to maximize pedagogy could be the best option, but it further complicates building a pedagogical CAS, which is already a complicated task.

Then there is question 6. It was the Turing test style problem, where the students had to determine which solution was made by the CAS and which by a human. The results suggest that they cannot distinguish the two solutions. At a quick glance, the solutions are different but both seem valid, and there is nothing that immediately stands out as a hint of which solution was made by the CAS. It is possible that a high school student without experience with CASs, or with little interest in mathematics, is not able to tell which solution was made by the CAS, and that a study with a much larger number of answers would show a similar result.

7.2 Solution aesthetics and fairness

The solutions made by Mathleaks also typically feature blocks of explanatory text, which were removed for this test. The solutions made by Wolfram Alpha also had extra blocks of text called hints that were removed. They also had a completely different aesthetic and were in English. All that had to be stripped off. The Ma5000 solutions were even more condensed, and had to be reformatted to be more similar to the other solutions. What all this means is that the solutions presented to the students were not exactly the same solutions as in the sources. However, it made the solutions look aesthetically equal, so that the number of test variables was limited.

It can be discussed which test is actually fairer: to limit testing to just the solutions, or to also include aesthetics in the test. Both have their merits. It could also be argued that the aesthetics are an integral part of the solution, and as such should be included in the test. Either way, this study was too small to perform both tests, but it would be highly interesting to see the results of the same survey with the aesthetics of the solutions kept.

The aesthetics of the solutions presented in the questionnaire are based on the aesthetics of the solutions in Mathleaks’ service. As such, it could be questioned how fair it is to convert the other solutions to that format, while Mathleaks’ solutions already look similar to it. However, the format used in the test seems bland and neutral enough that it is not probable that it had any significant impact on the results.

7.3 Discussion of the questionnaire answers

Even though the students were spread out over different years, courses and programs, they all were from the Te (Teknik) and Na (Natur) programs in one Swedish gymnasium. The differences between the c math courses and, for example, the b courses that are studied by social science programs are not that large. The results could potentially be similar for other students as well, but it is hard to tell without performing larger studies.

Note that the study got quantitative answers, not qualitative ones. It is a simple way to get data, but it does not unambiguously answer the question of why students preferred certain solutions over others. That has to be guessed and reasoned about. Much valuable information could be gained from in-depth interviews with the demographic.

A variable that would be interesting to analyze is the students’ grades versus their answers. Would a struggling student prefer the more detailed solutions over the brief ones? Out of privacy and decency the students’ grades were not asked for, but it would be an interesting hypothesis to test in another study.

Another question is how seriously the students took answering the questions. If some did not even answer all questions, one can ask how many students just chose answers at random. There was not much time for students to fully examine every detailed step in every solution and make an accurate assessment of which one was best. That is probably not a big issue though, as a good pedagogical solution should be easily understandable by the student, and first impressions are important too. In the end, all quantitative studies will have non-serious answers; it is unavoidable. If every study with potentially unrepresentative answers was dismissed, there would be no studies left.

A final note is that the solution to the problem in question 5 used the concept of complex numbers (i.e. the polynomial had complex roots). In the c math courses, students only learn about complex numbers in course 2c. As such, the students who were currently studying 1c did not necessarily have a full understanding of the solutions. The impact on the final result is most likely minor though.

7.4 Further work

On the technical side, the CAS could be improved by adding more rules to allow it to solve more problems. The dependence on various SymPy functions like factor and gcd could be removed, to allow more flexibility. Trying different search algorithms could lead to better solutions, as for example αβ-pruning is commonly used in searches where the goal is to minimize the score function. To improve the heuristics, various machine learning methods could be used, which would look at existing solutions and learn heuristic parameters that way. There are many ways to continue developing the CAS.

On the pedagogical side, more extensive studies could be performed to get more data. There is little point in doing it on this limited CAS, but it should be done on a more developed CAS. Performing interviews to get more qualitative data can give more insight into why students prefer certain solutions over others. Solutions could also be compared with other sources, for example with the Algebrator.

7.5 Conclusion

In this thesis we have described a system that is capable of solving algebraic equations on a high school level and also producing pedagogical intermediate steps in the solution with explanations. The system’s purpose is educational. It can aid students in understanding how certain problems can be solved, and automate tedious problem and solution generation for text book authors, among other possible uses. It is built upon the computer algebra system SymPy, and uses an approach based on mathematical rules. It uses graph search algorithms and heuristics to select which rules to apply to construct the solution.

A study was conducted on a selected group of high school students to evaluate an implementation of the described system. The study showed that even a relatively small prototype of a step-by-step CAS can produce solutions that are somewhat competent, since its solutions were preferred in 20% of the total answers. This method is clearly not a bad approach, in particular given the fact that the alternative might be a solution that only contains the final answer.

With more rules and improved heuristics it would probably become more competent. It can be further developed in a number of areas. First, more rules can be added so that a larger variety of problems can be solved. Second, the heuristics can be improved with, for example, machine learning methods. Third, other and more advanced search algorithms can be tried, which could potentially improve the solution logic. Finally, on a longer time scale, the CAS could eventually incorporate figures and graphs into the solutions. On top of all this, more extensive evaluations have to be done.

However, it will be a long while until a computer can completely overtake humans in solving problems from math text books. The main reason is text questions. To solve such a problem a CAS would need to use natural language processing to read and understand the problem, convert it to a math problem and then solve it, all while producing a pedagogical solution that explains in detail how the problem is solved. The technology is not there yet. Until then, a CAS would serve as a complement and a tool to help humans produce parts of solutions or solutions only to certain problems.

Bibliography

[1] David Eisenbud. Computations in with Macaulay 2, volume 8. Springer Science & Business Media, 2002.

[2] David Joyner, Ondřej Čertík, Aaron Meurer, and Brian E. Granger. Open source computer algebra systems: Sympy. ACM Commun. Comput. Algebra, 45(3/4):225–234, January 2012.

[3] Glenda Anthony and Margaret Walshaw. Characteristics of effective teaching of mathematics: A view from the west. Journal of Mathematics Education, 2(2):147–164, 2009.

[4] Bernhard Kutzler. The algebraic calculator as a pedagogical tool for teaching mathematics. 2000.

[5] Bernhard Kutzler. DERIVE – The Future of Teaching Mathematics. RISC Report Series 93-61, Research Institute for Symbolic Computation (RISC), Johannes Kepler University Linz, Schloss Hagenberg, 4232 Hagenberg, Austria, 1993. Published in The International Derive Journal, vol. 1, no. 1, April 1994.

[6] J Child, Malgorzata Brothers, and Todd Fortenberry. Step-by-step cas (computer algebra system) application using a problem set data structure, November 8 2001. US Patent App. 10/035,735.

[7] J.F. Nicaud and M. Saidi. Explanation of algebraic reasoning : the aplusix system. In S. Ramani, R. Chandrasekar, and K.S.R. Anjaneyulu, editors, Knowledge Based Computer Systems, volume 444 of Lecture Notes in Computer Science, pages 145–154. Springer Berlin Heidelberg, 1990.

[8] Hamid Chaachoua, Jean-François Nicaud, Alain Bronner, and Denis Bouhineau. Aplusix, a learning environment for algebra, actual use and benefits. In ICME 10: 10th International Congress on Mathematical Education, July 4-11, 2004, page 8, 2004.

[9] Patrick Suppes, Mona Morningstar, et al. Computer-assisted instruction. Science, 166(3903):343–350, 1969.

[10] Eno Tõnisson. Step-by-step solution possibilities in different computer algebra systems. 1999.

[11] M. J. Beeson. Logic and computation in mathpert: An expert system for learning mathematics. In Proceedings of the Third Conference on Computers and Mathematics, pages 202–214, New York, NY, USA, 1989. Springer-Verlag New York, Inc.

[12] Neven Jurkovic. Edusym-educational symbolic manipulator on a microcomputer. In Proceedings of the fifth ACM symposium on Symbolic and algebraic computation, pages 154–156. ACM, 1986.

[13] Neven Jurkovic. An intelligent tutor for high-school algebra. In Proceedings of the 15th annual conference on Computer Science, pages 27–31. ACM, 1987.

[14] Neven Jurkovic. An expert system for teaching pre-college algebra. In Proceedings of the 17th IASTED International Conference on AI, pages 327–329, 1999.

[15] Robert Mařík. Mathematical assistant on web. http://um.mendelu.cz/maw-html/menu.php. Last visited on 29/5/2015.

[16] Microsoft. Microsoft mathematics 4.0. https://www.microsoft.com/en-us/education/educators/default.aspx#fbid=wm9kC-eTK18. Last visited on 29/5/2015.

[17] David Scherfgen. Derivative calculator. http://www.derivative-calculator.net/. Last visited on 29/5/2015.

[18] Wolfram Research. Computational knowledge engine. https://www.wolframalpha.com/. Last visited on 29/5/2015.

[19] Keith O. Geddes, Stephen R. Czapor, and George Labahn. Algorithms for Computer Algebra. Kluwer Academic Publishers, 2nd edition.

[20] Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press, 2nd edition.

[21] D Ginsburg, B Groose, J Taylor, and B Vernescu. The history of the calculus and the development of computer algebra systems, worcester polytechnic institute junior-year project, 1998.

[22] IEEE Computer Society. 754-2008 - ieee standard for floating-point arithmetic. 2008.

[23] D. Lazard. Gröbner bases, gaussian elimination and resolution of systems of algebraic equations. In J.A. van Hulzen, editor, Computer Algebra, volume 162 of Lecture Notes in Computer Science, pages 146–156. Springer Berlin Heidelberg, 1983.

[24] H.M. Möller and B. Buchberger. The construction of multivariate polynomials with preassigned zeros. In Jacques Calmet, editor, Computer Algebra, volume 144 of Lecture Notes in Computer Science, pages 24–31. Springer Berlin Heidelberg, 1982.

[25] Elwyn R Berlekamp. Factoring polynomials over finite fields. Bell System Technical Journal, 46(8):1853–1859, 1967.

[26] David G Cantor and Hans Zassenhaus. A new algorithm for factoring polynomials over finite fields. Mathematics of Computation, pages 587–592, 1981.

[27] Alan M Turing. Rounding-off errors in matrix processes. The Quarterly Journal of Mechanics and , 1(1):287–308, 1948.

[28] George C. Nakos, Peter R. Turner, and Robert M. Williams. Fraction-free algorithms for linear and polynomial equations. SIGSAM Bull., 31(3):11–19, September 1997.

[29] Robert H Risch. The problem of integration in finite terms. Transactions of the American Mathematical Society, pages 167–189, 1969.

[30] Robert H Risch et al. The solution of the problem of integration in finite terms. Bull. Amer. Math. Soc, 76(3):605–608, 1970.

[31] Python Software Foundation. heapq - heap queue algorithm. https://docs.python.org/2/library/heapq.html. Last visited on 15/6/2015.

[32] Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM journal on computing, 18(6):1245–1262, 1989.

[33] L. Alfredsson, K. Bråting, P. Erixon, and H. Heikne. Matematik 5000 Kurs 1c Blå lärobok. Natur & Kultur.

[34] L. Alfredsson, K. Bråting, P. Erixon, and H. Heikne. Matematik 5000 Kurs 2c Blå lärobok. Natur & Kultur.

[35] L. Alfredsson, K. Bråting, P. Erixon, and H. Heikne. Matematik 5000 Kurs 3c Blå lärobok. Natur & Kultur.

[36] Skolverket. Mathematics. http://www.skolverket.se/polopoly_fs/1.174554!/Menu/article/attachment/Mathematics.pdf. Last visited on 15/6/2015.

Appendix A

Questionnaire

Note: An error was made in the creation of the survey, and in the first question, solution number three is actually the solution produced by the CAS and solution number 4 is the one produced by hand by Mathleaks. In question 6, the solution produced by the CAS was the first one.


Degree project questionnaire – evaluation of an automatic solver Which year of high school are you in? ______Email: (Only needed if you want a free 3-month subscription to Mathleaks) Which math course are you currently taking? ______

1. Five different solutions are proposed for the following math problem. Which one do you think is best? Tick one.

Solve the equation

2. Five different solutions are proposed for the following math problem. Which one do you think is best? Tick one.

Simplify the following expression: ( )

3. Five different solutions are proposed for the following math problem. Which one do you think is best? Tick one.

√ Show how the left-hand side can be simplified to the right-hand side: √( )

4. Five different solutions are proposed for the following math problem. Which one do you think is best? Tick one.

Solve the equation

5. Five different solutions are proposed for the following math problem. Which one do you think is best? Tick one.

Solve the equation

6. The following math problem has been solved in two different ways: one solution was made by hand, and the other was generated by the automatic solver. Which solution do you think was generated automatically by the solver?

Simplify the following expression: ( ) ( )

Thank you very much for your answers!

Appendix B

Questionnaire Raw Results

These are the raw results of the questionnaire. The first column is the year of high school the student is studying. The second column is the math course the student is currently studying. Since the study was performed at the end of May, after the national tests, the students can be seen as "just having finished this year and course". The remaining six columns are the answers to the questions. The number in the table is the solution the student thought was the best. 1 is just the answer. 2 is the publisher’s solution. 3 is the solution by Mathleaks. 4 is the solution by the CAS. 5 is the solution of Wolfram Alpha. The exception is question 1, where an error made it so that a 3 is actually the CAS solution, and a 4 is the Mathleaks solution. In question 6, 1 is a correct answer, and 2 is a wrong answer.

Year  Course  1  2  3  4  5  6
2     3c      2  3  2  3  4  2
2     3c      4  3  2  2  2  2
2     3c      4  3  2  2  2  1
2     2c      4  3  2  2  2  1
2     3c      4  3  3  4  4  2
2     3c      2  3  4  2  3  1
2     3c      2  3  2  2  3  1
2     3c      2  3  4  3  2  1
2     3c      2  2  2  2  2  2
2     3c      2  4  2  2  2  1
2     3c      2  5  3  2  3  2
2     3c      5  4  3  2  3  2
2     3c      5  5  2  5  5  2
2     2c      4  4  4  5  5  1
2     3c      4  5  5  4  5  1
2     3c      4  5  5  3  4  1
2     3c      4  4  3  3  3  1
2     3c      4  4  3  3  3  1
2     3c      2  4  5  2  2  1
2     3c      5  4  3  2  3  2
2     3c      4  3  4  2  5  2
2     3c      4  3  3  3  4  2
2     2c      2  3  2  1  2  1
2     2c      2  3  4  2  4  1
2     2c      4  5  3  2  2  1
2     2c      4  5  5  4  4  2
2     2c      2  4  2  4  5  1
2     2c      3  3  4  3  4  2
2     2c      2  4  2  2  2  1
2     2c      4  4  3  5  3  1
2     2c      4  5  4  2  3  2
2     3c      5  5  4  5  3  2
2     2c      3  5  3  2  3  2
2     2c      3  4  2  5  3  2
2     2c      5  1  3  2  2  2
1     1c      2  4  3  2  3  1
1     1c      2  4  3  4  5  2
1     2c      2  4  5  5  4  1
1     1c      4  4  1  4  4  2
1     1c      2  3  4  3  4  1
1     1c      2  3  4  3  4  1
1     1c      4  5  3  3  4  2
1     1c      4  5  4  2  5  2
1     2c      2  5  4  2  2  1
1     2c      2  4  3  3  3  1
1     1c      3  5  4  5  5  2
1     1c      2  4  2  2  3  1
1     1c      2  2  3  2  2  1
1     1c      2  4  5  5  2  2
1     1c      2  3  4  3  4  2
1     1c      2  3  3  3  5  2
1     1c      2  2  2  5  2  2
1     1c      2  2  2  2  2  2
1     2c      4  5  2  3  5  1
1     1c      4  3  4  2  3  1
1     1c      2  2  3  2  2  1
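As a reading aid, the following minimal sketch shows how the answer codes described in this appendix could be tallied for questions 1 to 5, including the swap of codes 3 and 4 in question 1. It uses only the first three rows of the raw results as an excerpt, and the names SOURCES, Q1_FIX, ROWS and tally are illustrative assumptions, not part of the tooling used in the study.

```python
from collections import Counter

# Answer codes as described in this appendix.
SOURCES = {1: "answer only", 2: "publisher", 3: "Mathleaks",
           4: "CAS", 5: "Wolfram Alpha"}
# In question 1 the codes 3 and 4 were swapped by mistake.
Q1_FIX = {3: 4, 4: 3}

# Excerpt: the first three rows of the raw results (year, course, answers 1-6).
ROWS = [
    (2, "3c", [2, 3, 2, 3, 4, 2]),
    (2, "3c", [4, 3, 2, 2, 2, 2]),
    (2, "3c", [4, 3, 2, 2, 2, 1]),
]


def tally(rows):
    """Count the preferred solution source over questions 1 to 5."""
    counts = Counter()
    for _year, _course, answers in rows:
        for question, code in enumerate(answers[:5], start=1):
            if question == 1:
                code = Q1_FIX.get(code, code)
            counts[SOURCES[code]] += 1
    return counts


print(tally(ROWS))
```

Filtering the rows on the year or course column before calling tally would give the per-year and per-course splits in the same way.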

Appendix C

Search Algorithm Pseudocode

Here is some brief pseudocode for the search algorithm described in section 5.5.3.

Input:
    f          - input formula
    hf         - formula score function
    rbf        - rule bias function
    lf         - list of rules function
    time_limit - how long the algorithm should run for
    f_factor   - weight of formula score
    d_factor   - weight of depth score
    r_factor   - weight of rule bias score
Output:
    A list with an odd length, with elements in even indices being
    formulas and elements in odd indices being applied rules

f = standardize(f)
initialize Visited and HeapQ
best_score = hf(f)
best_depth = 0
best_formula = f

# Main loop:
while HeapQ is not empty and less than time_limit seconds have passed:
    currentNode = pop(HeapQ)
    if currentNode is in Visited and the depth is not better:
        continue while loop
    rules = lf(currentNode)
    d_score = currentNode.depth * d_factor
    for each rule in rules:
        res = apply_rule(currentNode, rule)
        if res is in Visited and the depth is not better:
            continue for each loop

        # Calculate scores
        f_score = hf(res)
        sort_score = d_score + f_score*f_factor + rbf(rule)*r_factor

        # Check if best score needs updating
        if (f_score, res.depth) < (best_score, best_depth):
            best_score = f_score
            best_formula = res
            best_depth = res.depth

        # Update Visited and HeapQ
        Add res to Visited, with extra information res.depth, rule and currentNode
        Add res to HeapQ, ordered by sort_score

# Search done, backtrack:
sol = [best_formula]
current = best_formula
while current != f:
    rule_used, previous_node = Visited[current]
    sol.append(rule_used)
    sol.append(previous_node)
    current = previous_node
return sol
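The pseudocode can be turned into a small runnable sketch with Python's heapq module. This is an illustration under stated assumptions, not the implementation used in the prototype: formulas are replaced by plain integers (smaller meaning simpler), apply_rule is passed in as a parameter, the standardize step is left out, and the backtracked list is reversed so that it starts at the input. The toy rules "halve" and "dec" are hypothetical stand-ins for real rewrite rules.

```python
import heapq
import itertools
import time


def search(f, hf, rbf, lf, apply_rule, time_limit=1.0,
           f_factor=1.0, d_factor=1.0, r_factor=1.0):
    """Best-first search over rule applications.

    Returns [formula, rule, formula, ..., best formula].
    """
    visited = {f: (0, None, None)}      # formula -> (depth, rule, parent)
    tie = itertools.count()             # tie-breaker, formulas are never compared
    heap = [(hf(f) * f_factor, next(tie), 0, f)]
    best_score, best_depth, best_formula = hf(f), 0, f
    deadline = time.monotonic() + time_limit

    while heap and time.monotonic() < deadline:
        _, _, depth, node = heapq.heappop(heap)
        if visited[node][0] < depth:    # stale entry: a shorter path is known
            continue
        d_score = depth * d_factor
        for rule in lf(node):
            res = apply_rule(node, rule)
            if res in visited and visited[res][0] <= depth + 1:
                continue
            f_score = hf(res)
            sort_score = d_score + f_score * f_factor + rbf(rule) * r_factor
            if (f_score, depth + 1) < (best_score, best_depth):
                best_score, best_depth, best_formula = f_score, depth + 1, res
            visited[res] = (depth + 1, rule, node)
            heapq.heappush(heap, (sort_score, next(tie), depth + 1, res))

    # Backtrack from the best formula to the input, then reverse the list.
    sol, current = [best_formula], best_formula
    while current != f:
        _, rule, parent = visited[current]
        sol += [rule, parent]
        current = parent
    return sol[::-1]


# Toy stand-in for formulas: integers, where a smaller value is "simpler".
def toy_rules(n):
    return (["halve"] if n > 0 and n % 2 == 0 else []) + (["dec"] if n > 0 else [])


def toy_apply(n, rule):
    return n // 2 if rule == "halve" else n - 1


solution = search(10, hf=lambda n: n, rbf=lambda rule: 0.0,
                  lf=toy_rules, apply_rule=toy_apply)
print(solution)
```

The counter in each heap entry keeps heapq from ever comparing two formulas directly, which matters once the nodes are expression trees without a natural ordering.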

Appendix D

Rules

This appendix lists all the rules used in the final evaluation CAS that were not mentioned in section 5.4.

D.1 Rules

D.1.1 negAddToSub

$a + (-b) \Rightarrow a - b$

Input must be Add or Sub. This rule converts it to a termlist and searches it for term pairs (s, t) for which s is 1 and t is negative. A SymPy formula is considered negative if it is numeric with a value less than 0 or if it is a Neg construct. Each such pair is then converted to the pair (-1, abs(t)). This operation is done for every pair in the termlist.
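The pair rewriting can be sketched on a simplified termlist. The representation below is an assumption made for illustration: a termlist is a Python list of (sign, term) pairs, symbols are plain strings, and only numeric terms are recognised as negative (the Neg construct is not modelled). The function name neg_add_to_sub is hypothetical.

```python
def neg_add_to_sub(termlist):
    """Turn every (+1, negative number) pair into (-1, positive number)."""
    result = []
    for sign, term in termlist:
        if sign == 1 and isinstance(term, (int, float)) and term < 0:
            result.append((-1, abs(term)))
        else:
            result.append((sign, term))
    return result


# a + (-3) + b, written as a termlist of (sign, term) pairs
print(neg_add_to_sub([(1, "a"), (1, -3), (1, "b")]))
```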

D.1.2 negNeg

$-(-a) \Rightarrow a$

Eliminates double negatives. The outer construct may be a Sub or Neg, and the inner formula has to be negative for the rule to be applied.

D.1.3 negDivNumer

$-\left(\frac{a}{b}\right) \Rightarrow \frac{-a}{b}$

Moves the negative sign to the numerator of a Div or Rational. The outer construct may be a Sub or Neg. If possible, also attempts to apply one of the other negation rules from D.1.2 to D.1.9 in the numerator.

D.1.4 negDivDenom

$-\left(\frac{a}{b}\right) \Rightarrow \frac{a}{-b}$

Moves the negative sign to the denominator of a Div or Rational. The outer construct may be a Sub or Neg. If possible, also attempts to apply one of the other negation rules from D.1.2 to D.1.9 in the denominator.

D.1.5 negTerms

$-(a - b) \Rightarrow -a + b$

Distributes the negative sign over terms by converting them to a termlist and then flipping every sign. The outer construct may be a Sub or Neg.

D.1.6 negPlusminusRewrite

$-(a \pm b) \Rightarrow -a \pm -b$

Moves the negative sign over to both sides of a Plusminus construct. The outer construct may be a Sub or Neg.

D.1.7 negPlusminus

$-(a \pm b) \Rightarrow -a \pm -b$

Like D.1.6 negPlusminusRewrite, but will also attempt to apply one of the other negation rules from D.1.2 to D.1.9 on both arguments of the Plusminus construct.

D.1.8 negMul

$-(a \cdot b \cdot c) \Rightarrow -a \cdot b \cdot c$

Moves the negative sign to the first argument of a Mul construct. The outer construct may be a Sub or Neg. If possible, also attempts to apply one of the other negation rules from D.1.2 to D.1.9 on the resulting formula.

D.1.9 negNegMul

$-a \cdot b \cdot -c \Rightarrow a \cdot b \cdot c$

Finds negative formulas in the arguments of a Mul construct, and eliminates double negatives. If the number of negative formulas is even, it eliminates all of them, and if the number is odd, it eliminates all but the last one.
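The parity logic of negNegMul can be illustrated with a short sketch. It assumes a simplified factorlist in which every factor is an (is_negative, magnitude) pair; this representation and the function name neg_neg_mul are assumptions for illustration and do not mirror the SymPy constructs used in the CAS.

```python
def neg_neg_mul(factors):
    """Cancel negative signs pairwise; with an odd count the last one survives."""
    negatives = sum(1 for is_neg, _ in factors if is_neg)
    keep_last = negatives % 2 == 1      # odd count: one sign must remain
    result, seen = [], 0
    for is_neg, magnitude in factors:
        if is_neg:
            seen += 1
            is_neg = keep_last and seen == negatives
        result.append((is_neg, magnitude))
    return result


# -a * b * -c  ->  a * b * c
print(neg_neg_mul([(True, "a"), (False, "b"), (True, "c")]))
```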

D.1.10 termsToCommonDenom

$a + \frac{b}{c} \Rightarrow \frac{ac}{c} + \frac{b}{c}$

The 5.4.2 simplifyTerms rule does not add rationals despite them being numerics. The reason for that is that expressions like $1 + \frac{1}{2}$ are to be simplified in two steps. The first step is to rewrite the terms on a common denominator (performed by this rule) and the second step is to put the terms on just one denominator (i.e. group them), which is done by rule D.1.11 termsOnSameDenom. These two rules do not always have to follow each other but it was a common occurrence.

This rule looks at terms and their denominators. If a term is not a Div or Rational, the denominator is set to 1. If there is at least one non-one denominator, the greatest common divisor is checked by SymPy’s gcd function, and each term is extended with the appropriate factor to put the terms on a common denominator.
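For purely numeric terms the extension step can be sketched with the standard library. The sketch assumes that each term is a (numerator, denominator) pair of integers with a positive denominator, uses math.gcd instead of SymPy's gcd, and derives the common denominator as the least common multiple; the function name is hypothetical, and the check for at least one non-one denominator is omitted.

```python
from math import gcd


def terms_to_common_denom(terms):
    """Extend each (numerator, denominator) pair to a common denominator.

    The terms are only rewritten; no addition is performed.
    """
    common = 1
    for _, denom in terms:
        common = common * denom // gcd(common, denom)   # least common multiple
    return [(numer * (common // denom), common) for numer, denom in terms]


# 1 + 1/2  ->  2/2 + 1/2
print(terms_to_common_denom([(1, 1), (1, 2)]))
```

Leaving the addition to a separate step mirrors the two-step granularity described above, where termsOnSameDenom does the grouping.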

D.1.11 termsOnSameDenom

$\frac{ac}{c} + \frac{b}{c} \Rightarrow \frac{ac + b}{c}$

This rule rewrites all terms that have a common denominator onto the same denominator. It does not touch terms that do not have an explicit denominator (i.e. it ignores all terms that are not Div or Rational). It also uses the rule simplifyTerms on the numerator after the rewriting. This is done to not perform too many steps. For example, $\frac{3a}{b} + \frac{4a + 6}{b}$ would immediately become $\frac{7a + 6}{b}$ instead of $\frac{3a + 4a + 6}{b}$. This particular granularity was chosen as it is similar to Mathleaks’ own solutions.

D.1.12 mulRemove0

$0 \cdot x \Rightarrow 0$

This rule searches the factorlist, and if it finds a factor that is zero, it returns a zero.

D.1.13 mulRemove1

$1 \cdot x \Rightarrow x$

This rule searches the factorlist and removes any factor that is 1.

D.1.14 divRemove0

$\frac{0}{x} \Rightarrow 0$

If the numerator in a Div or Rational is 0, returns 0.

D.1.15 divRemove1

$\frac{x}{1} \Rightarrow x$

If the denominator in a Div or Rational is 1, returns the numerator.

D.1.16 rewriteMulOnDiv

$a \cdot \frac{b}{c} \cdot d \Rightarrow \frac{abd}{c}$

This rule puts factors onto "one common fraction bar". It searches through the factorlist and if it finds at least one division construct (Rational or Div), it will put all factors onto the numerator, preserving the order of factors. If more than one division construct exists, everything is rewritten to the first one. All factors from the denominators are then moved too. No simplifications are done; this is strictly a rewriting rule.

The exception is when a division construct has only a 1 in the numerator. Then that 1 is removed. This avoids awkward situations like $a \cdot \frac{1}{b}$ being rewritten to $\frac{a \cdot 1}{b}$.

D.1.17 divMulDiv

$\frac{a}{c} \cdot \frac{b}{d} \Rightarrow \frac{ab}{cd}$

Similarly to rule D.1.16 rewriteMulOnDiv, it rewrites multiple division constructs multiplied with each other into just one division construct. The main difference is that it does not touch any factors that are not division constructs. This rule also uses rule 5.4.3 simplifyProduct on both the resulting numerator and denominator. The main use of this rule is for simplifying multiplications of fractions.

D.1.18 divDivDiv

$\frac{a}{b} / \frac{c}{d} \Rightarrow \frac{ad}{bc}$

Simplifies a division construct being divided by another division construct. It first rewrites onto just one division construct, and then uses rule 5.4.3 simplifyProduct on both the resulting numerator and denominator.

D.1.19 divDivMul

$\frac{a}{b} / c \Rightarrow \frac{a}{bc}$

Simplifies a division construct being divided by a formula that is not a division construct. It first rewrites onto just one division construct, and then uses rule 5.4.3 simplifyProduct on the resulting denominator.

D.1.20 mulDivDiv

$a / \frac{b}{c} \Rightarrow \frac{ac}{b}$

Simplifies a formula that is not a division construct being divided by a division construct. It first rewrites onto just one division construct, and then uses rule 5.4.3 simplifyProduct on the resulting numerator.

D.1.21 shortenFactorsSamediv

This is the rule used to simplify factors in division constructs. It converts both the numerator and denominator to factorlists. If either the numerator or the denominator is not a Mul construct, it is simply seen as a single factor. It tries to find a pair (n, d), where n is a factor from the numerator and d is a factor from the denominator, for which gcd(n, d) is not equal to 1. It then divides both n and d by the gcd result. Any results that are 1 are removed.
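For integer factors the pair search can be sketched as follows. The sketch assumes that factorlists are plain lists of positive integers and that one pair is cancelled per application, which is one reading of the description; the names shorten_factors and _replace are hypothetical, and math.gcd stands in for SymPy's gcd.

```python
from math import gcd


def _replace(factors, index, value):
    """Replace factors[index] by value, dropping it when value is 1."""
    return factors[:index] + ([] if value == 1 else [value]) + factors[index + 1:]


def shorten_factors(numer, denom):
    """Cancel the first pair (n, d) with gcd(n, d) != 1 in two factor lists."""
    for i, n in enumerate(numer):
        for j, d in enumerate(denom):
            g = gcd(n, d)
            if g != 1:
                return _replace(numer, i, n // g), _replace(denom, j, d // g)
    return numer, denom


print(shorten_factors([6, 5], [4, 7]))   # (6*5)/(4*7)
```

Cancelling a single pair per step keeps each step small, so that a longer reduction shows up as several steps in the solution.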

76 D.1. RULES

D.1.22 mulInto a(b + c) ⇒ ab + ac There are many possible ways to apply distributive laws. For example in the expression (a + b)(c + d)(e + f) any of the factors can be multiplied into any other. There is also the question of granularity. (a + b)(c + d) can be rewritten either to (a+b)·c+(a+b)·d or fully expanded to ac+ad+bc+bd. And sometimes terms can be simplified - it is natural that (x + 1)(x + 4) would be rewritten immediately to x2 +5x+4 in just one step. Ideally, a step-by-step CAS should be able to apply any of these rules, and also use a granularity level appropriate for the current level of the students. Having many different rules makes the decision process of which rule to apply more complicated, and as the CAS did not have to adapt the granularity two variations were chosen: mulInto and D.1.23 mulTermsFull. A term factor is a factor that is an Add or Syb, and a non-term factor is a factor that is neither of them. The rule mulInto searches the factorlist for non-term factors and distributes them over the first factor which is Add or Sub. Specifically, it makes a termlist and puts the content of each term in a Mul construct, along with all the factors being multiplied in. It then uses rules D.1.13 mulRemove1, 5.4.3 simplifyProduct, 5.4.4 groupPow and 5.4.1 sortBoth on each new Mul.

D.1.23 mulTermsFull

$(a + b)(c + d) \Rightarrow ac + ad + bc + bd$

This rule fully expands all multiplications of term factors. Similarly to D.1.22 mulInto, it multiplies everything into a term factor. The difference is that it also multiplies other term factors into it. It does this recursively until none of the resulting terms have a term factor, i.e. it is fully expanded. It will then use rules D.1.13 mulRemove1, 5.4.3 simplifyProduct, 5.4.4 groupPow and 5.4.1 sortBoth on each new term. Finally, it will also use rule 5.4.2 simplifyTerms on the resulting terms. Here is an example:

    >>> mp = ModPrinter()
    >>> mp.doprint(a)
    '(x^2 + 4*x)*(3*x - 4)*(2*x^2 - 1)'
    >>> b = mul_terms_full(a)
    >>> mp.doprint(b)
    '6*x^5 - 35*x^3 + 16*x^4 - 8*x^2 + 16*x'
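The result of the example above can be checked independently with a small sketch. It assumes single-variable polynomials stored as dictionaries mapping exponent to coefficient; the helper names poly_mul and expand_all are hypothetical, and the code reproduces only the final expanded result, not the recursive rule applications of the actual implementation.

```python
from functools import reduce

def poly_mul(p, q):
    """Multiply two polynomials given as {exponent: coefficient} dicts."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            # Like terms are merged as they are produced.
            result[e1 + e2] = result.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in result.items() if c != 0}

def expand_all(factors):
    """Fully expand a product of polynomials."""
    return reduce(poly_mul, factors)

# (x^2 + 4x)(3x - 4)(2x^2 - 1)
factors = [{2: 1, 1: 4}, {1: 3, 0: -4}, {2: 2, 0: -1}]
expanded = expand_all(factors)
```

Here expanded holds the coefficients 6, 16, -35, -8 and 16 for the exponents 5, 4, 3, 2 and 1, which agrees with the printed output above up to the ordering of terms.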

D.1.24 distrQuadPosNeg

$(x + a)(x - a) \Rightarrow x^2 - a^2$

This is a reverse application of the "difference of two squares" law. What this rule actually does is check that the input is in the correct format, and then use D.1.23 mulTermsFull to perform the expansion. The reason this rule exists is to allow the CAS to produce a more accurate justification than the generic D.1.23 mulTermsFull (see section 5.5).

D.1.25 distrQuadPosPos

$(x + a)^2 \Rightarrow x^2 + 2ax + a^2$

This rule checks that the input is an Add construct with two non-term arguments, which is in the base of a Pow construct. The exponent has to be 2. It rewrites the power as a multiplication and then uses D.1.23 mulTermsFull on it.

D.1.26 distrQuadNegNeg

$(x - a)^2 \Rightarrow x^2 - 2ax + a^2$

This rule checks that the input is a Sub construct with two non-term arguments, which is in the base of a Pow construct. The exponent has to be 2. It rewrites the power as a multiplication and then uses D.1.23 mulTermsFull on it.

D.1.27 powExpand

Expands any power. It checks that the input is a Pow with the base being an Add or Sub and the exponent being an integer. It then rewrites the power as a multiplication and uses D.1.23 mulTermsFull on it. This is a more generic variation of rules D.1.26 distrQuadNegNeg and D.1.25 distrQuadPosPos. It handles more cases but also produces a more generic and less explanatory justification (see section 5.5).

D.1.28 breakOut

$ab + ac \Rightarrow a(b + c)$

This is the reverse of the D.1.22 mulInto rule. It does not perform a full factorization. It looks through all terms in the termlist and finds their greatest common divisor. SymPy's gcd function is not used; instead, every term is decomposed into factors (integers are factorized into primes using SymPy's factorint function), and the decompositions are then compared. The motivation was to reduce reliance on SymPy functions, since they were not always well suited for the extra classes. Since everything was decomposed, it was also easy to remove the factors that the terms had in common. Then rules 5.4.3 simplifyProduct, 5.4.4 groupPow and 5.4.1 sortBoth were used to stitch the decomposed factors back together. The factors that were broken out are put in a Mul construct along with the terms from which these factors were broken out.
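The decompose-and-compare idea can be sketched with multisets. The following is an illustration under stated assumptions: each term is given as a positive integer coefficient plus a list of symbol names, a simple trial-division routine stands in for SymPy's factorint, and the names prime_factors, decompose and break_out are hypothetical.

```python
from collections import Counter
from functools import reduce

def prime_factors(n):
    """Trial-division factorisation; stands in for SymPy's factorint."""
    counts, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            counts[p] += 1
            n //= p
        p += 1
    if n > 1:
        counts[n] += 1
    return counts

def decompose(coefficient, symbols):
    """Turn a term such as 6*x*x*y into a multiset of prime and symbol factors."""
    return prime_factors(coefficient) + Counter(symbols)

def break_out(terms):
    """Return the common factors and what remains of each term."""
    decomposed = [decompose(c, s) for c, s in terms]
    # Multiset intersection keeps each factor with its minimum multiplicity.
    common = reduce(lambda a, b: a & b, decomposed)
    remainders = [d - common for d in decomposed]
    return common, remainders
```

For the terms 6x²y and 9xy, the common multiset is {3, x, y} and the remainders are {2, x} and {3}, corresponding to 3xy(2x + 3).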

D.1.29 factorizePoly

This is the reverse of rule D.1.23 mulTermsFull. It first converts the input formula into a standard SymPy formula that does not contain any custom classes. For example, a Sub becomes an Add with the second argument negated, and so on. Then the SymPy function factor is used to fully factorize the expression. Since factor only produces factors that contain original SymPy classes, rule D.1.1 negAddToSub was used to get Sub constructs. Term factors were also typically sorted in SymPy's own way (i.e. by hash value), so rule 5.4.1 sortBoth was used to get the desired sorting.

As such, one general factorization algorithm is used that can perform almost any factorization. This is in contrast to [12], which also used a rule-based approach, but with a pattern recognition algorithm that searched for certain patterns it knew how to factorize. That approach is comparable to the rules D.1.30 factQuadPosNeg, D.1.31 factQuadPosPos and D.1.32 factQuadNegNeg. The advantage of that approach is that it can factorize expressions in a way similar to how a student might do it. The disadvantage is that it can only factorize expressions following patterns it recognizes. For this thesis, both the general factorization rule and a few pattern rules were used. This allows the CAS to use (a few) specific laws and rules taught in school, while still being able to factorize arbitrary expressions as a fallback.

D.1.30 factQuadPosNeg

$x^2 - a^2 \Rightarrow (x + a)(x - a)$

This rule is an application of the "difference of two squares" law. It checks that the input is in the correct format, and then rewrites it to the other form. The $a^2$ term does not have to be an explicit power but can also be an integer that is a perfect square.

D.1.31 factQuadPosPos

$x^2 + 2ax + a^2 \Rightarrow (x + a)^2$

Checks that the input is in the correct format, and then rewrites it to the other form. The second term of the input does not have to be explicitly 2 · a · x, but the coefficient of x has to be divisible by 2. Similarly, the $a^2$ term does not have to be an explicit power but can also be an integer that is a perfect square.

D.1.32 factQuadNegNeg

$x^2 - 2ax + a^2 \Rightarrow (x - a)^2$

Checks that the input is in the correct format, and then rewrites it to the other form. The second term of the input does not have to be explicitly 2 · a · x, but the coefficient of x has to be divisible by 2. Similarly, the $a^2$ term does not have to be an explicit power but can also be an integer that is a perfect square.
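The format check shared by the two perfect-square rules above might look roughly like this for integer coefficients. It is a sketch, not the thesis code: the name fact_quad is hypothetical, the input is assumed to be already reduced to the coefficients b and c of x² + bx + c, and the explicit comparison of b against 2a is an assumption about what "correct format" entails.

```python
from math import isqrt

def fact_quad(b, c):
    """Try to write x^2 + b*x + c as (x + a)^2 for integers b and c.

    Returns a (negative for the (x - a)^2 case), or None if the
    pattern does not match.
    """
    if c < 0 or b % 2 != 0:
        return None  # constant must be non-negative, coefficient even
    a = isqrt(c)
    if a * a != c:
        return None  # constant term is not a perfect square
    if abs(b) != 2 * a:
        return None  # middle coefficient must be 2a in magnitude
    return a if b >= 0 else -a
```

For example, x² + 6x + 9 yields a = 3, i.e. (x + 3)², while x² − 6x + 9 yields a = −3, i.e. (x − 3)². An input such as x² + 6x + 8 is rejected and would be left to the general factorization rule.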

D.1.33 simplifyPow

Simplifies a power where both the base and exponent are either integers or floats, using the __pow__ function of those classes.

D.1.34 simplifyPowFrac

$\left(\frac{a}{b}\right)^c \Rightarrow \frac{a^c}{b^c}$

Distributes an exponent over a division construct. It will also attempt to apply rule D.1.33 simplifyPow on both the resulting numerator and denominator.

D.1.35 simplifyPowFracNeg

$\left(-\frac{a}{b}\right)^c \Rightarrow \frac{a^c}{b^c}$

Exactly as D.1.34 simplifyPowFrac but the division construct must be within a Neg construct.

D.1.36 powFracRewrite

$\left(\frac{a}{b}\right)^c \Rightarrow \frac{a^c}{b^c}$

Distributes an exponent over a division construct. It works exactly like D.1.34 simplifyPowFrac but it does not attempt to apply the D.1.33 simplifyPow rule.

D.1.37 powPow

$\left(a^b\right)^c \Rightarrow a^{bc}$

Rewrites a Pow within another Pow to just one single Pow with a Mul in the resulting exponent. It will also attempt to use rule 5.4.3 simplifyProduct on that Mul.

D.1.38 powRoot

$\left(\sqrt[n]{x}\right)^n \Rightarrow x$

Simplifies a square root raised to the power of 2, or an nth root raised to the power of n.

D.1.39 powRemove1

$a^1 \Rightarrow a$

Removes an exponent that is 1.

D.1.40 divPowToPowneg

$\frac{a}{b^c} \Rightarrow a b^{-c}$

Moves the denominator to the numerator with a negative exponent. b does not need to have an explicit exponent; in that case its exponent is assumed to be 1. If a is 1, it is removed from the end result and just $b^{-c}$ is returned.

D.1.41 sqrt0

$\sqrt{0} \Rightarrow 0$

Simplifies roots of 0. Handles both Sqrt and Nroot.

D.1.42 simplifyRoot

Simplifies roots of integers and of floats. Handles both Sqrt and Nroot. If the argument is an integer, the rule is only applied if the result is a clean root, which means that the result is also an integer. If the argument is a float, the rule is applied either way. In both cases, the rule is only applied if the argument is not negative.
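The "clean root" condition for integer arguments can be illustrated with a short check. This is a sketch under assumptions: the name clean_root is hypothetical, and the floating-point estimate limits it to moderately sized integers.

```python
def clean_root(value, n=2):
    """Return the exact integer n-th root of a non-negative int, else None."""
    if value < 0:
        return None  # negative arguments are handled by other rules
    estimate = round(value ** (1.0 / n))
    # Check neighbouring integers to guard against floating-point rounding.
    for candidate in (estimate - 1, estimate, estimate + 1):
        if candidate >= 0 and candidate ** n == value:
            return candidate
    return None  # not a clean root, so the rule would not be applied
```

Under this check the square root of 49 and the cube root of 27 are simplified to 7 and 3, while the square root of 8 is left as it is.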

D.1.43 rootNegImg

$\sqrt{-a} \Rightarrow \sqrt{a} \cdot i$

If the square root argument is negative, this rule "moves the minus sign out of the root" as the imaginary unit i.

D.1.44 simplifyRootFrac

$\sqrt{\frac{a}{b}} \Rightarrow \frac{\sqrt{a}}{\sqrt{b}}$

Distributes a root over a division construct. It handles both Sqrt and Nroot, and will attempt to use rule D.1.42 simplifyRoot on both the resulting numerator and denominator.

D.1.45 rootFracTopNeg

$\sqrt{\frac{-a}{b}} \Rightarrow \frac{\sqrt{a}}{\sqrt{b}} \cdot i$

If the argument of a square root is a division construct and the numerator is negative, this rule "moves the minus sign out of the root" as the imaginary unit i.

D.1.46 rootFracBotNeg

$\sqrt{\frac{a}{-b}} \Rightarrow \frac{\sqrt{a}}{\sqrt{b}} \cdot i$

If the argument of a square root is a division construct and the denominator is negative, this rule "moves the minus sign out of the root" as the imaginary unit i.

D.1.47 rewriteRootFrac

$\sqrt{\frac{a}{b}} \Rightarrow \frac{\sqrt{a}}{\sqrt{b}}$

Distributes a root over a division construct. It handles both Sqrt and Nroot, and is similar to rule D.1.44 simplifyRootFrac, but will not attempt to use rule D.1.42 simplifyRoot.

D.1.48 rootPow

$\sqrt[n]{(x)^n} \Rightarrow x$

Simplifies the square root of a power with exponent 2, or the nth root of a power with exponent n.

D.1.49 plusminusSplitRename

$x = a \pm b \Rightarrow x_1 = a - b,\; x_2 = a + b$

Splits a Plusminus construct into a MulAnswers construct with two arguments, of which the first is the formula with the − and the second is the formula with the +. If the formula is an Equality with a single symbol on the left-hand side, the symbol is also renamed to its original name followed by an underscore and the equation number, where the equation number is 1 for the first equation and 2 for the second equation.

D.1.50 pqFormula

$x^2 + px + q = 0 \Rightarrow x = -\frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q}$

Applies the pq-formula, a version of the quadratic formula that is commonly used in Swedish schools. It checks that the input is in the correct format, and then rewrites it.
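A numeric sketch of the pq-formula, combined with the ordering used by D.1.49 plusminusSplitRename (minus sign first, plus sign second), is given below. The function name pq_formula is hypothetical, and the code evaluates the roots numerically rather than rewriting a formula as the rule itself does. cmath is used so that a negative value under the root gives complex results.

```python
import cmath

def pq_formula(p, q):
    """Solve x^2 + p*x + q = 0 with the pq-formula.

    Returns (x1, x2), where x1 uses the minus sign and x2 the plus sign.
    """
    half = -p / 2
    root = cmath.sqrt((p / 2) ** 2 - q)
    return half - root, half + root
```

For x² − 5x + 6 = 0, that is p = −5 and q = 6, the value under the root is 0.25, so the two solutions are 2.5 − 0.5 = 2 and 2.5 + 0.5 = 3.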
