Learning Equations for Extrapolation and Control

Subham S. Sahoo¹, Christoph H. Lampert², Georg Martius³

Abstract

We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.

arXiv:1806.07259v1 [cs.LG] 19 Jun 2018

1. Introduction

In machine learning, models are typically treated as black-box function approximators that are only judged by their ability to predict correctly for unseen data (from the same distribution). In contrast, in the natural sciences, one searches for interpretable models that provide a deeper understanding of the system of interest and allow formulating hypotheses about unseen situations. The latter is only possible if the true underlying functional relationship behind the data has been identified. Therefore, when scientists construct models, they do not only minimize a training error but also impose constraints based on prior knowledge: models should be plausible, i.e. consist of components that have physical counterparts in the real world, and they should be interpretable, which typically means that they consist only of a small number of interacting units.

Machine learning research has only very recently started to look into related techniques. As a first work, Martius & Lampert (2016) proposed EQL, a neural network architecture for identifying functional relations between observed inputs and outputs. Their networks represent only plausible functions through a specific choice of activation functions, and they prefer simple over complex solutions by imposing sparsity regularization. However, EQL has two significant shortcomings: first, it cannot represent divisions, which severely limits the physical systems to which it can be applied; second, its model selection procedure is unreliable in identifying the true functional relation among multiple plausible candidates.

In this paper, we propose an improved network for equation learning, EQL÷, that overcomes these limitations of the earlier work. In particular, our main contributions are:

1. we propose a network architecture that can handle divisions, as well as techniques to keep training stable;
2. we improve model/instance selection to be more effective in identifying the right network/equation;
3. we demonstrate how to reliably control a dynamical robotic system by learning its forward dynamics equations from very few random tryouts/trials.

The following section describes the equation learning method by Martius & Lampert (2016) and introduces our improvements. Afterwards, we discuss its relation to other prior work. In Section 4 we present experimental results on identifying equations, and in Section 5 we show its application to robot control. We close with a discussion and outlook.

2. Identifying equations with a network

We consider a regression problem, where the data originates from a system that can be described by an (unknown) analytical function, $\phi: \mathbb{R}^n \to \mathbb{R}^m$.
A typical example could be a system of ordinary differential equations that describes the dynamics of a robot, or the predator-prey equations of an ecosystem. The observed data, $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, is assumed to originate from $y = \phi(x) + \xi$ with additive zero-mean noise $\xi$. Since $\phi$ is unknown, we model the input-output relationship with a function $\psi: \mathbb{R}^n \to \mathbb{R}^m$ and aim to find an instance that minimizes the empirical error on the training set as well as on future data, potentially from a different part of the feature space. For example, we might want to learn the robot dynamics only in a part of the feature space where we know it is safe to operate, while later it should also be possible to make predictions for movements into unvisited areas.

[Figure 1. Network architecture of the proposed improved Equation Learner EQL÷ for 3 layers ($L = 3$) and one neuron per type ($u = 3$, $v = 1$); unit labels include id, sin and cos, with all-to-all connections between layers. The new division operations are placed in the final layer; see Martius & Lampert (2016) for the original model.]

2.1. Equation Learner

Before introducing EQL÷, we first recapitulate the working principles of the previously proposed Equation Learner (EQL) network. It uses a multi-layer feed-forward network with units representing the building blocks of algebraic expressions. Instead of homogeneous hidden units, each unit has a specific function, e.g. identity, cosine or multiplication; see Fig. 1. Complex functions are implemented by alternating linear transformations, $z^{(l)} = W^{(l)} y^{(l-1)} + w_o^{(l)}$ in layer $l$, with the application of the base functions. There are $u$ unary functions $f_1, \ldots, f_u$, $f_i \in \{\text{identity}, \sin, \cos\}$, which receive the respective components $z_1, \ldots, z_u$. The $v$ binary functions $g_1, \ldots, g_v$ receive the remaining components, $z_{u+1}, \ldots, z_{u+2v}$, as input in pairs of two. In EQL these are multiplication units that compute the product of their two input values: $g_j(a, b) := a \cdot b$. The outputs of the unary and binary units are concatenated to form the output $y^{(l)}$ of layer $l$. The last layer computes the regression values by a linear read-out

$$y^{(L)} := W^{(L)} y^{(L-1)} + w_o^{(L)}. \tag{1}$$

For a more detailed discussion of the architecture, see (Martius & Lampert, 2016).

2.2. Introducing division units

The EQL architecture has some immediate shortcomings. In particular, it cannot model divisions, which are, however, common in the equations governing physical systems. We therefore propose a new architecture, EQL÷, that includes division units, which calculate $a/b$. Note that this is a non-trivial step because any division creates a pole at $b \to 0$, with an abrupt change in convexity and a diverging function value and derivative. Such a divergence is a serious problem for gradient-based optimization methods.

To overcome the divergence problem, we first notice that in any real system we cannot encounter data at the pole itself, because natural quantities do not diverge. This implies that a single branch of the hyperbola $1/b$ with $b > 0$ suffices as a basis function. As a further simplification, we use divisions only in the output layer. Finally, in order to prevent problems during optimization, we introduce a curriculum approach for optimization, progressing from a strongly regularized version of division to the unregularized one.

Regularized Division: Instead of EQL's Eq. (1), the last layer of EQL÷ is

$$y^{(L)} := \Big( h^\theta_1\big(z^{(L)}_1, z^{(L)}_2\big), \ldots, h^\theta_m\big(z^{(L)}_{2m-1}, z^{(L)}_{2m}\big) \Big), \tag{2}$$

where $h^\theta(a, b)$ is the division-activation function given by

$$h^\theta(a, b) := \begin{cases} \dfrac{a}{b} & \text{if } b > \theta, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where $\theta \ge 0$ is a threshold; see Fig. 2. Note that using $h^\theta = 0$ as the value when the denominator is below $\theta$ (forbidden values of $b$) sets the gradient to zero, avoiding misleading parameter updates. So the discontinuity plays no role in practice.

Penalty term: To steer the network away from negative values of the denominator, we add a cost term to our objective that penalizes "forbidden" inputs to each division unit:

$$p^\theta(b) := \max(\theta - b, 0), \tag{4}$$

where $\theta$ is the threshold used in Eq. (3) and $b$ is the denominator; see Fig. 2. The global penalty term is then

$$P^\theta = \sum_{i=1}^{N} \sum_{j=1}^{m} p^\theta\big(z^{(L)}_{2j}(x_i)\big), \tag{5}$$

where $z^{(L)}_{2j}(x_i)$ is the denominator of division unit $j$ for input $x_i$; see Eq. (2).

[Figure 2. Regularized division function $h^\theta(a, b)$ and the associated penalty term $p^\theta(b)$, plotted over $b \in [-1, 2]$ for $\theta = 0.1$ and $\theta = 0.5$. The penalty increases linearly for values $b < \theta$, outside the desired input range.]

Penalty Epochs: While Eq. (5) prevents negative values in …

… objective is Lasso-like (Tibshirani, 1996),

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \big\lVert \psi(x_i) - y_i \big\rVert^2 + \lambda \sum_{l=1}^{L} \big\lVert W^{(l)} \big\rVert_1 + P^\theta, \tag{8}$$

that is, a linear combination of $L_2$ loss and $L_1$ regularization, extended by the penalty term for small and negative denominators; see Eq. (4). Note that $P^{\text{bound}}$ (Eq. 6) is only used in the penalty epochs. For training, we apply a stochastic gradient descent algorithm with mini-batches and Adam (Kingma & Ba, 2015) for calculating the updates. The choice of Adam is not critical, as we observed that standard stochastic gradient descent also works, though it might take longer.

Regularization Phases: We follow the same regularization scheme as proposed in Martius & Lampert (2016).

¹Indian Institute of Technology, Kharagpur, India. ²IST Austria, Klosterneuburg, Austria. ³Max Planck Institute for Intelligent Systems, Tübingen, Germany. Correspondence to: Georg Martius <[email protected]>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).
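The building blocks described in Sections 2.1 and 2.2 (the linear transformation, the EQL hidden units, the regularized division of Eq. (3), the penalty of Eqs. (4)-(5) and the Lasso-like objective of Eq. (8)) can be tied together in a small, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation: all function names (`affine`, `eql_hidden`, `h_theta`, `p_theta`, `forward`, `objective`) are hypothetical, the network has a single EQL hidden layer ($L = 2$) with $u = 3$ unary units and $v = 1$ multiplication unit, output unit $j$ is assumed to divide $z_{2j-1}$ by $z_{2j}$ (so that $z_{2j}$ is the denominator, as in Eq. (5)), and the weights are set by hand rather than trained.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Illustrative sketch of EQL÷ building blocks (Eqs. 1-5 and 8).
# All names are hypothetical; this is not the authors' code.

Vector = List[float]
Matrix = List[List[float]]

# Unary base functions f_i in {identity, sin, cos}; V multiplication units.
UNARY: List[Callable[[float], float]] = [lambda t: t, math.sin, math.cos]
V = 1


def affine(W: Matrix, y: Sequence[float], w_o: Sequence[float]) -> Vector:
    """Linear part of each layer: z = W y + w_o."""
    return [sum(w * yj for w, yj in zip(row, y)) + b for row, b in zip(W, w_o)]


def eql_hidden(z: Sequence[float]) -> Vector:
    """Apply the u unary units to z[:u] and V product units to pairs of z[u:]."""
    u = len(UNARY)
    assert len(z) == u + 2 * V, "layer needs u + 2v pre-activations"
    out = [f(zi) for f, zi in zip(UNARY, z[:u])]
    out += [z[u + 2 * k] * z[u + 2 * k + 1] for k in range(V)]
    return out


def h_theta(a: float, b: float, theta: float) -> float:
    """Regularized division, Eq. (3): a/b if b > theta, else 0."""
    return a / b if b > theta else 0.0


def p_theta(b: float, theta: float) -> float:
    """Per-unit penalty on forbidden denominators, Eq. (4)."""
    return max(theta - b, 0.0)


def forward(x, W1, b1, W2, b2, theta) -> Tuple[Vector, Vector]:
    """One EQL hidden layer followed by a division output layer.

    Output unit j divides z[2j] by z[2j+1] (0-based indexing), i.e.
    z_{2j-1} / z_{2j} in 1-based notation. Returns (outputs, denominators).
    """
    y1 = eql_hidden(affine(W1, x, b1))
    z2 = affine(W2, y1, b2)
    assert len(z2) % 2 == 0, "division layer needs pairs of pre-activations"
    outputs = [h_theta(z2[2 * j], z2[2 * j + 1], theta) for j in range(len(z2) // 2)]
    return outputs, z2[1::2]


def objective(preds, targets, weights, lam, denominators, theta) -> float:
    """Lasso-like loss, Eq. (8): MSE + lam * sum_l ||W^(l)||_1 + P^theta (Eq. 5)."""
    mse = sum(sum((p - t) ** 2 for p, t in zip(pv, tv))
              for pv, tv in zip(preds, targets)) / len(preds)
    l1 = sum(abs(w) for W in weights for row in W for w in row)
    penalty = sum(p_theta(b, theta) for sample in denominators for b in sample)
    return mse + lam * l1 + penalty


# Hand-set (hypothetical) weights so that the network computes sin(x1) / x2.
W1 = [[0, 1], [1, 0], [0, 0], [0, 0], [0, 0]]  # z = (x2, x1, 0, 0, 0)
b1 = [0.0] * 5                                  # y1 = (x2, sin x1, 1, 0)
W2 = [[0, 1, 0, 0], [1, 0, 0, 0]]               # numerator sin x1, denominator x2
b2 = [0.0, 0.0]
THETA = 0.1

X = [(math.pi / 2, 2.0), (math.pi / 2, -1.0)]
Y = [[0.5], [-1.0]]  # true values of sin(x1) / x2
results = [forward(x, W1, b1, W2, b2, THETA) for x in X]
preds = [out for out, _ in results]
dens = [d for _, d in results]
loss = objective(preds, Y, [W1, W2], lam=0.01, denominators=dens, theta=THETA)
print(preds, loss)
```

With $\theta = 0.1$, the first sample ($x_2 = 2$) yields $\sin(\pi/2)/2 = 0.5$, while the second ($x_2 = -1$) lies in the forbidden region: $h^\theta$ returns 0 and the penalty contributes $\theta - b = 1.1$ to the loss. A real training run would differentiate this objective with an autodiff framework and follow the curriculum and regularization phases described above; that part is deliberately left out of the sketch.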