
Can the Eureqa Symbolic Regression Program, Computer Algebra, and Numerical Analysis Help Each Other?

David R. Stoutemyer

David Stoutemyer is a retired professor of information and computer science at the University of Hawaii. His email address is [email protected]. DOI: http://dx.doi.org/10.1090/noti1000. Notices of the AMS, Volume 60, Number 6, June/July 2013.

Eureqa is a symbolic regression program described by Schmidt and Lipson [11], freely downloadable from [13], where there are press citations, a bibliography of its use in articles, a blog, and a discussion group. In contrast to typical regression software, the user does not have to explicitly or implicitly provide a specific expression containing unknown constants for the software to determine. With little or no guidance, symbolic regression determines not only unknown coefficients but also the class and form of the model expression. See [9] for quick insight into the underlying "survival of the fittest" paradigm for symbolic regression. An Internet search on "symbolic regression" reveals that there are other such programs. However, my skeptic's curiosity was aroused by such extreme press praise as

Move over, Einstein: Machines will take it from here. [10]

There are very clever "thinking machines" in existence today, such as Watson, the IBM computer that conquered Jeopardy! last year. But next to Eureqa, Watson is merely a glorified search engine. [5]

By one yardstick, Eureqa has even discovered the answer to the ultimate question of life, the universe and everything. [5]

The program is designed to work with noisy experimental data, searching for, then returning a set of result expressions that attempt to optimally trade off conciseness with accuracy. However, I was curious to learn if the program could also be used as a supplementary tool for experimental mathematics, computer algebra, and numerical analysis. In the first few weeks of using this tool, I have already found that:

1) Eureqa can sometimes do a job of exact simplification better than existing computer algebra systems if there is a concise equivalent expression.
2) Eureqa can often discover simple expressions that approximate more complicated expressions or fit a set of numerical results well.
3) Even when the fit isn't as accurate as desired, the form of a returned expression often suggests a class of forms to try for classic regressions, interpolations, or series expansions giving higher accuracy.
4) Eureqa could do an even better job if it supplemented its search with exploitation of more computer algebra and numerical methods, including classic regression.
5) There are important caveats about how to use Eureqa effectively.
6) The most extreme press quotes are exaggerations, but Eureqa really is quite impressive.

This article has examples that illustrate these findings, using Eureqa 0.93.1 beta.

Regarding computing times reported herein, the computer is a 1.60GHz Intel Core 2 Duo U9600 CPU with 3 gigabytes of RAM.

Exact Simplification and Transformation

A Trigonometric Simplification Example

Here is a Maple 15 assignment of an input trigonometric expression to a dependent variable y:

$$
\begin{aligned}
y := {} & \cos(x)^3 \sin(x) + \frac{\cos(x)^3 \sin(x)}{2} + 2\cos(x)^3 \cos(2x) \sin(x) + \frac{\cos(x)^3 \cos(4x) \sin(x)}{2} \\
& - \frac{3}{2} \cos(x) \sin(x)^3 - 2 \cos(x) \cos(2x) \sin(x)^3 - \frac{\cos(x) \cos(4x) \sin(x)^3}{2}.
\end{aligned}
\tag{1}
$$

The default simplification merely combines the first two terms, which are similar. However, the simplify(y) function¹ required only 0.03 seconds to return the much simpler

$$4 \sin(x) \cos(x)^5 \left(2 \cos(x)^2 - 1\right). \tag{2}$$

An equivalent expression produced by the Mathematica FullSimplify[...] function in only 0.08 seconds is also much simpler:

$$2 \left(\sin(3x) - \sin(x)\right) \cos^5(x).$$

But the equivalent even simpler form discovered by Eureqa is

$$y = \cos(x)^4 \sin(4x). \tag{3}$$

That even the simplified model of evolution in Eureqa can work so well might change some minds.

Caveat: The algorithm uses a random number generator with no current user control over its seeding, which appears to be done by the clock. Therefore sequences are not currently repeatable, and the computing times to obtain expression (3) varied dramatically, from 3 seconds to several minutes. It conceivably could require more time than you would ever be willing to invest. However, although even 3 seconds is much longer than the computer algebra times, it is certainly worth a few minutes' wait to obtain such a nice result, and such experiments can be done while away from the computer, such as while eating, sleeping, or playing cell phone games. There is also an option to use parallel cloud computing, which reduces the mean and variance of the elapsed time necessary to obtain a satisfactory result.

Here is how I used Eureqa to obtain this delightfully simple exact result (3):

1) First I plotted expression (1) in Mathematica, revealing that it is antisymmetric and that its fundamental period appeared to be $\pi$, with higher frequency components appearing to have a minimum period of $\pi/4$. This was confirmed by TrigReduce[y], which returned the equivalent expression

$$\frac{1}{16} \left(4 \sin(2x) + 6 \sin(4x) + 4 \sin(6x) + \sin(8x)\right).$$

2) I decided to use evenly spaced values of x because expression (1) is periodic, bounded, and $C^\infty$ for all real x. I guessed that it might help Eureqa discover and exploit the antisymmetry if I used sample points symmetric about x = 0. I also guessed that it might help Eureqa discover the periodicities if I used exactly two fundamental periods. I guessed that using 16 samples within the minimum period $\pi/4$ would be sufficient to resolve it quite well; then I doubled that because Eureqa uses some points for fitting and others for error assessment. This implies 128 intervals of width $\pi/64$ from $-\pi$ through $\pi$. I then created a table of 17-digit floating-point pairs of x and y, then exported it to a file by entering

    Export["trigExample.csv", Table[N[{x, ...}, 17], {x, -π, π, π/128}]]

where the ellipsis was expression (1).²

3) I then launched Eureqa, opened its spreadsheet-like Enter Data tab, then replaced the default data there with mine by importing file trigExample.csv. In the row labeled var, I then entered x in column A and y in column B.

4) The Prepare Data tab then showed superimposed plots of the x and y data values and offered preprocessing options that aren't relevant for this very accurate data.

5) Figure 1 shows the Set Target tab:

(a) The pane labeled The Target Expression suggests the fitting model y = f(x), which is what I want, so I didn't change it.

(b) The pane labeled Primary Options has check boxes for the desired Formula building-blocks used in composing candidate f(x) expressions. The default checked ones are sufficient for the sort of concise equivalent to expression (1) that I am seeking. Therefore I didn't check any of the other offered functions and operators, not all of which are shown on this screen shot.³ Formula building blocks also shows the corresponding Complexity measures, which can be altered by the user. The complexity measure of an expression is the sum of the complexities of its parts.

(c) The drop-down menu labeled Error metric offers different built-in error measures for Eureqa to try optimizing. A bound is usually more reassuring than the alternatives, so I chose Minimize the worst-case maximum error. The documentation suggests that Maximize the R-squared goodness of fit or Maximize the correlation coefficient is more scale and mean invariant, which are desirable properties too. However, the data is already well scaled with means of 0.

(d) For the covered Data Splitting drop-down menu the default alternative is to designate a certain percentage of the data points for fitting the data (training) and a certain percentage for assessing the error measure (validation). The individual assignments to these categories are done randomly, with some overlap if there aren't many data points. For the almost exact data in this article, I would prefer that alternate points be assigned to alternate categories, with the end points being used for training. However, none of the alternatives offered this, so I chose the default.⁴

Figure 1. Eureqa Set Target tab for exact trigonometric simplification example.

6) I then pressed the Run button on the Start Search tab and watched the temporal progress of the search. The plot in

¹ simplify(...) has an optional second keyword argument "size" that is intended to minimize size, but for this example the result is less concise than (2).

² The reason for requesting 17 significant digits was that I wanted Mathematica to use its adaptive significance arithmetic to give me results estimated to be accurate to 17 digits despite any catastrophic cancellations, and I wanted Eureqa to receive a 17th significant digit to help it round the sequences of input digit characters to the closest representable 16-digit IEEE double values, which are used by Eureqa.

³ I could have saved search time by unchecking Division, because the floating-point coefficients make a denominator unnecessary for this class of expressions. I could also have saved search time and perhaps obtained a more accurate result by checking the Integer Constant box, because integer coefficients are quite likely for exact equivalent expressions, making it worth having Eureqa try rounding to see if that improves the accuracy. However, I decided to accept the default checked boxes to see if Eureqa could find a good exact equivalent without any more help from me.

⁴ I have since learned that, with such precise data, another
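The equivalence of expressions (1), (2), and (3) in the trigonometric example can also be confirmed by hand. The following is one possible derivation, not taken from the article; the shorthand $c = \cos x$ and $s = \sin x$ is introduced here only for brevity.

```latex
\begin{align*}
y &= \tfrac{3}{2}c^{3}s + 2c^{3}s\cos 2x + \tfrac{1}{2}c^{3}s\cos 4x
     - \tfrac{3}{2}cs^{3} - 2cs^{3}\cos 2x - \tfrac{1}{2}cs^{3}\cos 4x \\
  &= cs\,(c^{2}-s^{2})\left(\tfrac{3}{2} + 2\cos 2x + \tfrac{1}{2}\cos 4x\right) \\
  &= cs\,\cos 2x\,(1+\cos 2x)^{2}
     && \text{using } c^{2}-s^{2}=\cos 2x,\ \cos 4x = 2\cos^{2}2x - 1 \\
  &= 4\sin x\,\cos^{5}x\,(2\cos^{2}x - 1)
     && \text{using } 1+\cos 2x = 2c^{2} \quad \text{(this is (2))} \\
  &= \cos^{4}x\,\sin 4x
     && \text{using } \sin 4x = 4\sin x\cos x\cos 2x \quad \text{(this is (3)).}
\end{align*}
```

The first line is expression (1) with its two similar leading terms already combined, which is what the default Maple simplification does.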
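The sampling in step 2 can be sketched in Python. This is an illustration under stated assumptions, not the Mathematica workflow the article used: the function names and the `n_intervals` parameter are hypothetical, the values are computed in ordinary IEEE double precision instead of Mathematica's adaptive significance arithmetic, and the default of 128 intervals follows the arithmetic in the text (the Export command as printed steps by π/128, which would correspond to `n_intervals=256`).

```python
import csv
import io
import math


def y_original(x):
    """Expression (1), term by term as in the Maple assignment."""
    c, s = math.cos(x), math.sin(x)
    return (c**3 * s + c**3 * s / 2
            + 2 * c**3 * math.cos(2 * x) * s
            + c**3 * math.cos(4 * x) * s / 2
            - 1.5 * c * s**3
            - 2 * c * math.cos(2 * x) * s**3
            - c * math.cos(4 * x) * s**3 / 2)


def y_eureqa(x):
    """Expression (3): cos(x)^4 sin(4x)."""
    return math.cos(x)**4 * math.sin(4 * x)


def sample(n_intervals=128):
    """Evenly spaced (x, y) pairs from -pi through pi, symmetric about 0.

    n_intervals is assumed to be even so that x = 0 is a sample point.
    """
    half = n_intervals // 2
    xs = [k * math.pi / half for k in range(-half, half + 1)]
    return [(x, y_original(x)) for x in xs]


def to_csv(rows):
    """Format the pairs with 17 significant digits, one pair per line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for x, y in rows:
        writer.writerow([f"{x:.17g}", f"{y:.17g}"])
    return buf.getvalue()


rows = sample()
csv_text = to_csv(rows)

# Largest disagreement between expressions (1) and (3) on the sample grid.
max_gap = max(abs(y - y_eureqa(x)) for x, y in rows)
```

Writing `csv_text` to a file would give input of the same shape as trigExample.csv. Seventeen significant digits are enough for every double to survive a round trip through decimal text, and `max_gap` should be at the level of rounding error if the two expressions are indeed equivalent.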
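The alternating training and validation split that step 5(d) says would be preferable, together with the two kinds of error measure mentioned in step 5(c), can be sketched as follows. This is a hypothetical Python illustration, not something Eureqa offers or exposes: the helper names are invented here, the R-squared formula is the usual textbook definition and may differ in detail from Eureqa's, and `crude` is a made-up candidate model used only to show a nonzero error.

```python
import math


def alternating_split(points):
    """Even-indexed points go to training, odd-indexed ones to validation.

    With an odd number of points, both end points land in training.
    """
    return points[0::2], points[1::2]


def max_abs_error(model, points):
    """Worst-case absolute error of model over the points."""
    return max(abs(model(x) - y) for x, y in points)


def r_squared(model, points):
    """Textbook coefficient of determination, 1 - SS_res / SS_tot."""
    ys = [y for _, y in points]
    mean = sum(ys) / len(ys)
    ss_res = sum((y - model(x))**2 for x, y in points)
    ss_tot = sum((y - mean)**2 for y in ys)
    return 1 - ss_res / ss_tot


def target(x):
    """Expression (3), used here to generate the data."""
    return math.cos(x)**4 * math.sin(4 * x)


def crude(x):
    """Hypothetical poor candidate: drops the cos(x)^4 factor."""
    return math.sin(4 * x)


# 129 points from -pi through pi, as in 128 intervals of width pi/64.
points = [(k * math.pi / 64, target(k * math.pi / 64)) for k in range(-64, 65)]
train, valid = alternating_split(points)

exact_max = max_abs_error(target, valid)
exact_r2 = r_squared(target, valid)
crude_max = max_abs_error(crude, valid)
crude_r2 = r_squared(crude, valid)
```

The maximum error gives a bound on the worst sample, while R-squared summarizes the fit relative to the spread of the data, so comparing the two for `crude` shows how differently the measures penalize the same candidate.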