Deep Learning for Symbolic Mathematics
Total pages: 16
File type: PDF; size: 1020 KB
Recommended publications
Part VI: Neural Machine Translation, Seq2seq and Attention
CS224n: Natural Language Processing with Deep Learning. Lecture Notes, Part VI: Neural Machine Translation, Seq2seq and Attention. Course instructors: Christopher Manning, Richard Socher. Authors: Guillaume Genthial, Lucas Liu, Barak Oshri, Kushal Ranjan. Winter 2019. Keyphrases: Seq2Seq and Attention Mechanisms, Neural Machine Translation, Speech Processing.

1 Neural Machine Translation with Seq2Seq

So far in this class, we've dealt with problems of predicting a single output: an NER label for a word, the single most likely next word in a sentence given the past few, and so on. However, there's a whole class of NLP tasks that rely on sequential output, or outputs that are sequences of potentially varying length. For example:
• Translation: taking a sentence in one language as input and outputting the same sentence in another language.
• Conversation: taking a statement or question as input and responding to it.
• Summarization: taking a large body of text as input and outputting a summary of it.
In these notes, we'll look at sequence-to-sequence models, a deep learning-based framework for handling these types of problems. This framework proved to be very effective, and has, in fewer than 3 years, become the standard for machine translation.

1.1 Brief Note on Historical Approaches

In the past, translation systems were based on probabilistic models constructed from:
• a translation model, telling us what a sentence/phrase in a source language most likely translates into;
• a language model, telling us how likely a given sentence/phrase is overall.
These components were used to build translation systems based on words or phrases.
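The excerpt describes the encoder-decoder ("seq2seq") setup in prose; as a rough sketch that is not taken from the lecture notes, a minimal PyTorch version with assumed toy vocabulary sizes and teacher forcing might look like this:

```python
# A minimal seq2seq (encoder-decoder) sketch in PyTorch.
# Vocabulary sizes, dimensions, and teacher forcing are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source sentence into a final hidden/cell state.
        _, state = self.encoder(self.src_emb(src))
        # Decode conditioned on that state (teacher forcing: feed the gold target).
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)              # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (8, 12))         # batch of 8 source sentences, length 12
tgt = torch.randint(0, 1200, (8, 10))         # batch of 8 target sentences, length 10
logits = model(src, tgt)                      # shape: (8, 10, 1200)
```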
Plotting Direction Fields in Matlab and Maxima – a Short Tutorial
Plotting direction fields in Matlab and Maxima – a short tutorial. Luis Carvalho.

Introduction. A first order differential equation can be expressed as

x′(t) = dx/dt = f(t, x)    (1)

where t is the independent variable and x is the dependent variable (on t). Note that the equation above is in normal form. The difficulty in solving equation (1) depends clearly on f(t, x). One way to gain geometric insight into how the solution might look is to observe that any solution to (1) should be such that the slope at any point P(t0, x0) is f(t0, x0), since this is the value of the derivative at P. With this motivation in mind, if you select enough points and plot the slopes at each of these points, you will obtain a direction field for ODE (1). However, it is not easy to plot such a direction field by hand for an arbitrary function f(t, x), and so we must resort to some computational help. There are many computational packages to help us in this task. This is a short tutorial on how to plot direction fields for first-order ODEs in Matlab and Maxima.

Matlab. Matlab is a powerful "computing environment that combines numeric computation, advanced graphics and visualization". A nice package for plotting direction fields in Matlab (although resourceful, Matlab does not provide such a facility natively) can be found at http://math.rice.edu/~dfield, contributed by John C. Polking from the Dept. of Mathematics at Rice University. To install this add-on, simply browse to the files for your version of Matlab and download them.
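Although the tutorial targets Matlab and Maxima, the same kind of plot can be sketched with Python's matplotlib; the snippet below is not part of the tutorial and uses an arbitrarily chosen right-hand side f(t, x) = x - t:

```python
# A minimal direction-field plot for x'(t) = f(t, x) using matplotlib's quiver.
import numpy as np
import matplotlib.pyplot as plt

def f(t, x):
    return x - t                          # right-hand side of x'(t) = f(t, x)

t, x = np.meshgrid(np.linspace(-3, 3, 25), np.linspace(-3, 3, 25))
slope = f(t, x)
norm = np.sqrt(1.0 + slope**2)            # normalize (1, slope) to unit-length arrows
plt.quiver(t, x, 1.0 / norm, slope / norm, angles="xy")
plt.xlabel("t")
plt.ylabel("x")
plt.title("Direction field of x' = x - t")
plt.show()
```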
A Note on Integral Points on Elliptic Curves
Journal de Théorie des Nombres de Bordeaux 00 (XXXX), 000–000. A note on integral points on elliptic curves, by Mark Watkins.

Abstract. We investigate a problem considered by Zagier and Elkies, of finding large integral points on elliptic curves. By writing down a generic polynomial solution and equating coefficients, we are led to suspect four extremal cases that still might have nondegenerate solutions. Each of these cases gives rise to a polynomial system of equations, the first being solved by Elkies in 1988 using the resultant methods of Macsyma, with there being a unique rational nondegenerate solution. For the second case we found that resultants and/or Gröbner bases were not very efficacious. Instead, at the suggestion of Elkies, we used multidimensional p-adic Newton iteration, and were able to find a nondegenerate solution, albeit over a quartic number field. Due to our methodology, we do not have much hope of proving that there are no other solutions. For the third case we found a solution in a nonic number field, but we were unable to make much progress with the fourth case. We make a few concluding comments and include an appendix from Elkies regarding his calculations and correspondence with Zagier.

Résumé (translated). Following Zagier and Elkies, we search for large integral points on elliptic curves. By writing down a generic polynomial solution and equating coefficients, we obtain four extremal cases that may admit nondegenerate solutions. Each of these cases leads to a system of polynomial equations, the first of which was solved by Elkies in 1988 using the resultant methods of Macsyma; it admits a unique nondegenerate rational solution.
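As a purely illustrative aside (not from the paper, and unrelated to its actual elliptic-curve systems), the mechanical pattern of "writing down a generic polynomial solution and equating coefficients" can be reproduced in SymPy on a toy identity:

```python
# Toy example: find a, b so that (x**2 + a*x + b)**2 equals a given quartic,
# by expanding the difference and equating every coefficient to zero.
import sympy as sp

x, a, b = sp.symbols('x a b')
expr = (x**2 + a*x + b)**2 - (x**4 + 2*x**3 + 3*x**2 + 2*x + 1)
eqs = sp.Poly(sp.expand(expr), x).all_coeffs()   # coefficients in x, each must vanish
print(sp.solve(eqs, [a, b]))                      # expect a = 1, b = 1
```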
Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case
Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case. Neo Wu, Bradley Green, Xue Ben, Shawn O'Banion.

Abstract. In this paper, we present a new approach to time series forecasting. Time series data are prevalent in many scientific and engineering disciplines. Time series forecasting is a crucial task in modeling time series data, and is an important area of machine learning. In this work we developed a novel method that employs Transformer-based machine learning models to forecast time series data. This approach works by leveraging self-attention mechanisms to learn complex patterns and dynamics from time series data. Moreover, it is a generic framework and can be applied to univariate and multivariate time series data, as well as time series embeddings. Using influenza-like illness (ILI) forecasting as a case study, we …

From the introduction: …methods. Mechanistic modeling is based on the understanding of underlying disease infection dynamics. For example, compartmental methods such as SIR are popular approaches to simulating disease spreading dynamics. Statistical and machine learning methods leverage the ground truth data to learn the trends and patterns. One popular family of methods includes auto-regression (AR), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA). Additionally, deep learning approaches based on convolutional and recurrent neural networks have been developed to model ILI data. These sequence-aligned models are natural choices for modeling time series data. However, due to "gradient vanishing and exploding" problems in RNNs and the limits of convolutional filters, these methods have limitations in modeling long-term and complex relations in the sequence data.
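The self-attention mechanism the abstract refers to can be sketched in a few lines of NumPy; the single-head version below is an illustrative simplification with assumed shapes, not the paper's actual architecture:

```python
# Single-head scaled dot-product self-attention over a sequence of embeddings.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 32))          # 24 time steps, 32-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(32, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)    # each step attends to all 24 steps
```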
CAS (Computer Algebra System) Mathematica
CAS (Computer Algebra System): Mathematica. UML students can download a copy for free as part of the UML site license; see the course website for details.

From Wikipedia, 2/9/2014: A computer algebra system (CAS) is a software program that allows [one] to compute with mathematical expressions in a way which is similar to the traditional handwritten computations of mathematicians and other scientists. The main ones are Axiom, Magma, Maple, Mathematica and Sage (the latter includes several computer algebra systems, such as Macsyma and SymPy).

Computer algebra systems began to appear in the 1960s, and evolved out of two quite different sources—the requirements of theoretical physicists and research into artificial intelligence. A prime example of the first development was the pioneering work conducted by the later Nobel Prize laureate in physics Martin Veltman, who designed a program for symbolic mathematics, especially High Energy Physics, called Schoonschip (Dutch for "clean ship") in 1963. Using LISP as the programming basis, Carl Engelman created MATHLAB in 1964 at MITRE within an artificial intelligence research environment. Later MATHLAB was made available to users on PDP-6 and PDP-10 systems running TOPS-10 or TENEX in universities. Today it can still be used on SIMH emulations of the PDP-10. MATHLAB ("mathematical laboratory") should not be confused with MATLAB ("matrix laboratory"), which is a system for numerical computation built 15 years later at the University of New Mexico, accidentally named rather similarly. The first popular computer algebra systems were muMATH, Reduce, Derive (based on muMATH), and Macsyma; a popular copyleft version of Macsyma called Maxima is actively being maintained.
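To make the quoted definition concrete, here is a small SymPy snippet of the kind of symbolic manipulation a CAS performs; it is illustrative only and not tied to Mathematica or any course material:

```python
# Symbolic simplification, integration, and series expansion with SymPy.
import sympy as sp

x = sp.symbols('x')
print(sp.simplify((x**2 - 1) / (x - 1)))                  # x + 1
print(sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo)))    # sqrt(pi)
print(sp.series(sp.sin(x), x, 0, 6))                      # x - x**3/6 + x**5/120 + O(x**6)
```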
Deep Learning
CSI 436/536 Introduction to Machine Learning: Deep Learning. Professor Siwei Lyu, Computer Science, University at Albany, State University of New York.

The ups and downs of NN:
• first high wave, 1960s: simple one-layer perceptron
• first down wave, 1970s: demonstration of the limitations of the one-layer perceptron
• second high wave, 1980s: development of BP and many uses (and abuses)
• second down wave, late 1990s to 2006: overfitting problem and vanishing gradient

[Figure 1.7 plot: frequency of word or phrase ("cybernetics" vs. "connectionism + neural networks") against year, 1940–2000.] Figure 1.7: The figure shows two of the three historical waves of artificial neural nets research, as measured by the frequency of the phrases "cybernetics" and "connectionism" or "neural networks" according to Google Books (the third wave is too recent to appear). The first wave started with cybernetics in the 1940s–1960s, with the development of theories of biological learning (McCulloch and Pitts, 1943; Hebb, 1949) and implementations of the first models such as the perceptron (Rosenblatt, 1958) allowing the training of a single neuron. The second wave started with the connectionist approach of the 1980–1995 period, with back-propagation (Rumelhart et al., 1986a) to train a neural network with one or two hidden layers. The current and third wave, deep learning, started around 2006 (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al., 2007a), and is just now appearing in book form as of 2016. The other two waves similarly appeared in book form much later than the corresponding scientific activity occurred.
Autograph: Imperative-Style Coding with Graph-Based Performance
AutoGraph: Imperative-Style Coding with Graph-Based Performance. Dan Moldovan, James M. Decker, Fei Wang, Andrew A. Johnson, Brian K. Lee, Zachary Nado, D. Sculley, Tiark Rompf, Alexander B. Wiltschko.

Abstract. There is a perceived trade-off between machine learning code that is easy to write, and machine learning code that is scalable or fast to execute. In machine learning, imperative style libraries like Autograd and PyTorch are easy to write, but suffer from high interpretive overhead and are not easily deployable in production or mobile settings. Graph-based libraries like TensorFlow and Theano benefit from whole-program optimization and can be deployed broadly, but make expressing complex models more cumbersome. We describe how the use of staged programming in Python, via source code transformation, offers a midpoint between these two library design patterns, capturing the benefits of both. A key insight is to delay all type-dependent decisions until runtime, similar to dynamic dispatch. We instantiate these principles in AutoGraph, a software system that improves the programming experience of the TensorFlow library, and demonstrate usability improvements with no loss in performance compared to native TensorFlow graphs. We also show that our system is backend agnostic, targeting an alternate IR with characteristics not found in TensorFlow graphs.

1 PROGRAMMING PARADIGMS FOR MACHINE LEARNING

Programming platforms specialized for machine learning (ML) are undergoing widespread adoption, as ML models such as neural networks demonstrate state-of-the-art …

… code directly, building up a representation of the user's program incrementally for either automatic differentiation or compilation. TensorFlow also supports imperative-style coding via "eager execution", where user-written Python code immediately executes TensorFlow kernels, without a graph being built.
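In TensorFlow 2.x, AutoGraph is applied through the tf.function entry point; the sketch below is an illustrative example of staging imperative Python control flow into a graph, not code from the paper:

```python
# Imperative Python control flow staged into a TensorFlow graph via tf.function,
# which applies AutoGraph to rewrite the data-dependent while loop.
# The Collatz step counter is an arbitrary example function.
import tensorflow as tf

@tf.function
def collatz_steps(n):
    steps = tf.constant(0)
    while n > 1:                                   # tensor-dependent loop, rewritten by AutoGraph
        n = tf.cond(n % 2 == 0, lambda: n // 2, lambda: 3 * n + 1)
        steps += 1
    return steps

print(collatz_steps(tf.constant(6)))               # tf.Tensor(8, shape=(), dtype=int32)
```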
Tensorflow, Theano, Keras, Torch, Caffe Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov Introduction
TensorFlow, Theano, Keras, Torch, Caffe. Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov.

Introduction
TensorFlow: Google Brain, 2015 (rewritten DistBelief)
Theano: University of Montréal, 2009
Keras: François Chollet, 2015 (now at Google)
Torch: Facebook AI Research, Twitter, Google DeepMind
Caffe: Berkeley Vision and Learning Center (BVLC), 2013

Outline
1. Introduction of each framework: a. TensorFlow, b. Theano, c. Keras, d. Torch, e. Caffe
2. Further comparison: a. Code + models, b. Community and documentation, c. Performance, d. Model deployment, e. Extra features
3. Which framework to choose when ..?

Introduction of each framework

TensorFlow architecture
1) Low-level core (C++/CUDA)
2) Simple Python API to define the computational graph
3) High-level API (TF-Learn, TF-Slim, soon Keras…)

TensorFlow computational graph
- auto-differentiation!
- easy multi-GPU/multi-node
- native C++ multithreading
- device-efficient implementation for most ops
- whole pipeline in the graph: data loading, preprocessing, prefetching...

TensorBoard

TensorFlow development
+ bleeding edge (GitHub yay!)
+ division in core and contrib => very quick merging of new hotness
+ a lot of new related API: CRF, BayesFlow, SparseTensor, audio IO, CTC, seq2seq
+ so it can easily handle images, videos, audio, text...
+ if you really need a new native op, you can load a dynamic lib
- sometimes contrib stuff disappears or moves
- recently introduced bells and whistles are barely documented

Presentation of Theano:
- Maintained by Montréal University group.
- Pioneered the use of a computational graph.
- General machine learning tool -> Use of Lasagne and Keras.
- Very popular in the research community, but not elsewhere. Falling behind.

What is it like to start using Theano?
- Read tutorials until you no longer can, then keep going.
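As a taste of the high-level APIs the slides compare, a minimal Keras model definition might look like the following; the layer sizes are arbitrary and the snippet is not from the slides:

```python
# A tiny classifier defined and compiled with the high-level Keras API.
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),                        # 20 input features (arbitrary)
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),     # 3 output classes (arbitrary)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()                                      # prints layers and parameter counts
```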
arXiv:cs/0608005v2 [cs.SC] 12 Jun 2007
AEI-2006-037, cs.SC/0608005. A field-theory motivated approach to symbolic computer algebra. Kasper Peeters, Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, Am Mühlenberg 1, 14476 Golm, Germany.

Abstract. Field theory is an area in physics with a deceptively compact notation. Although general purpose computer algebra systems, built around generic list-based data structures, can be used to represent and manipulate field-theory expressions, this often leads to cumbersome input formats, unexpected side-effects, or the need for a lot of special-purpose code. This makes a direct translation of problems from paper to computer and back needlessly time-consuming and error-prone. A prototype computer algebra system is presented which features TeX-like input, graph data structures, lists with Young-tableaux symmetries and a multiple-inheritance property system. The usefulness of this approach is illustrated with a number of explicit field-theory problems.

1. Field theory versus general-purpose computer algebra

For good reasons, the area of general-purpose computer algebra programs has historically been dominated by what one could call "list-based" systems. These are systems which are centred on the idea that, at the lowest level, mathematical expressions are nothing else but nested lists (or equivalently: nested functions, trees, directed acyclic graphs, ...). There is no doubt that a lot of mathematics indeed maps elegantly to problems concerning the manipulation of nested lists, as the success of a large class of LISP-based computer algebra systems illustrates (either implemented in LISP itself or in another language with appropriate list data structures).
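The "expressions are nested lists/trees" view described here can be made concrete with SymPy (used only as an illustration; it is not the prototype system of the paper):

```python
# SymPy exposes the nested constructor ("tree") form of any expression.
import sympy as sp

x, y = sp.symbols('x y')
expr = x**2 + 3*x*y
print(sp.srepr(expr))
# e.g. Add(Pow(Symbol('x'), Integer(2)), Mul(Integer(3), Symbol('x'), Symbol('y')))
print(expr.func, expr.args)   # the head of the tree and its list of subexpressions
```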
Convolutional Attention-Based Seq2seq Neural Network for End-To-End ASR
Thesis for the Degree of Master. Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR, by Dan Lim, Department of Computer Science and Engineering, Korea University Graduate School. arXiv:1710.04515v1 [cs.CL] 12 Oct 2017.

Abstract. The traditional approach in artificial intelligence (AI) has been to solve problems that are difficult for humans but relatively easy for computers, provided they can be formulated as mathematical rules or formal languages. However, this symbol- and rule-based approach failed on problems that human beings solve intuitively, like image recognition, natural language understanding and speech recognition. Therefore machine learning, which is a subfield of AI, has tackled these intuitive problems by making the computer learn from data automatically instead of relying on human effort to extract complicated rules. Especially deep learning, which is a particular kind of machine learning as well as the central theme of this thesis, has shown great popularity and usefulness recently.

It has been known that powerful computers, large datasets and algorithmic improvements have made the recent success of deep learning possible. These factors have enabled recent research to train deeper networks achieving significant performance improvements. Those current research trends motivated me to quest for a deeper architecture for end-to-end speech recognition. In this thesis, I experimentally showed that the proposed deep neural network achieves state-of-the-art results on the TIMIT speech recognition benchmark dataset. Specifically, the convolutional attention-based sequence-to-sequence model, which has deep stacked convolutional layers in the attention-based seq2seq framework, achieved a 15.8% phoneme error rate.
Lisp: Program Is Data
LISP: PROGRAM IS DATA. A HISTORICAL PERSPECTIVE ON MACLISP. Jon L White, Laboratory for Computer Science, M.I.T.*

Abstract. For over 10 years, MACLISP has supported a variety of projects at M.I.T.'s Artificial Intelligence Laboratory, and the Laboratory for Computer Science (formerly Project MAC). During this time, there has been a continuing development of the MACLISP system, spurred in great measure by the needs of MACSYMA development. Herein are reported, in a mosaic, historical style, the major features of the system. For each feature discussed, an attempt will be made to mention the year of initial development, and the names of persons or projects primarily responsible for requiring, needing, or suggesting such features.

Introduction. In 1964, Greenblatt and others participated in the check-out phase of Digital Equipment Corporation's new computer, the PDP-6. This machine had a number of innovative features that were thought to be ideal for the development of a list processing system, and thus it was very appropriate that the first working program actually run on the PDP-6 was an ancestor of the current MACLISP. This early LISP was patterned after the existing PDP-1 LISP (see reference 1), and was produced by using the text editor and a mini-assembler on the PDP-1. That first PDP-6 finally found its way into M.I.T.'s Project MAC for use by the Artificial Intelligence group (the A.I. group later became the M.I.T. Artificial Intelligence Laboratory, and Project MAC became the Laboratory for Computer Science). By 1968, the PDP-6 was running the Incompatible Time-sharing system, and was soon supplanted by the PDP-10. Today, the KL-10, an advanced version of the PDP-10, supports a variety of time sharing systems, most of which are capable of running a MACLISP.
General Relativity Computations with SageManifolds
General relativity computations with SageManifolds. Éric Gourgoulhon, Laboratoire Univers et Théories (LUTH), CNRS / Observatoire de Paris / Université Paris Diderot, 92190 Meudon, France. http://luth.obspm.fr/~luthier/gourgoulhon/
NewCompStar School 2016, Neutron stars: gravitational physics theory and observations, Coimbra (Portugal), 5-9 September 2016.

Outline
1 Computer differential geometry and tensor calculus
2 The SageManifolds project
3 Let us practice!
4 Other examples
5 Conclusion and perspectives

Computer differential geometry and tensor calculus: Introduction

Computer algebra systems (CAS) started to be developed in the 1960's; for instance Macsyma (to become Maxima in 1998) was initiated in 1968 at MIT.
In 1965, J.G. Fletcher developed the GEOM program, to compute the Riemann tensor of a given metric.
In 1969, during his PhD under Pirani's supervision, Ray d'Inverno wrote ALAM (Atlas Lisp Algebraic Manipulator) and used it to compute the Riemann tensor of the Bondi metric. The original calculations had taken Bondi and his collaborators 6 months; the computation with ALAM took 4 minutes and led to the discovery of 6 errors in the original paper [J.E.F. Skea, Applications of SHEEP (1994)].
Since then, many software packages for tensor calculus have been developed...
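As a taste of the tensor calculus such systems automate, the hedged sketch below uses plain SymPy rather than SageManifolds (whose own API inside SageMath is not shown here) to compute two Christoffel symbols of the unit 2-sphere from Gamma^a_{bc} = (1/2) g^{ad} (d_b g_{dc} + d_c g_{db} - d_d g_{bc}):

```python
# Christoffel symbols of the round metric on the unit 2-sphere with SymPy.
import sympy as sp

theta, phi = sp.symbols('theta phi')
coords = [theta, phi]
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])   # metric components g_{ab}
ginv = g.inv()                                   # inverse metric g^{ab}

def christoffel(a, b, c):
    # Gamma^a_{bc} = (1/2) g^{ad} (d_b g_{dc} + d_c g_{db} - d_d g_{bc})
    return sp.simplify(sum(
        sp.Rational(1, 2) * ginv[a, d] *
        (sp.diff(g[d, c], coords[b]) + sp.diff(g[d, b], coords[c]) - sp.diff(g[b, c], coords[d]))
        for d in range(2)))

print(christoffel(0, 1, 1))   # Gamma^theta_{phi phi} = -sin(theta)*cos(theta)
print(christoffel(1, 0, 1))   # Gamma^phi_{theta phi} = cot(theta)
```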