Backprop As Functor: a Compositional Perspective on Supervised Learning
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Double Categories of Open Dynamical Systems (Extended Abstract)
Double Categories of Open Dynamical Systems (Extended Abstract) David Jaz Myers Johns Hopkins University A (closed) dynamical system is a notion of how things can be, together with a notion of how they may change given how they are. The idea and mathematics of closed dynamical systems has proven incredibly useful in those sciences that can isolate their object of study from its environment. But many changing situations in the world cannot be meaningfully isolated from their environment – a cell will die if it is removed from everything beyond its walls. To study systems that interact with their environment, and to design such systems in a modular way, we need a robust theory of open dynamical systems. In this extended abstract, we put forward a general definition of open dynamical system. We define two general sorts of morphisms between these systems: covariant morphisms which include trajectories, steady states, and periodic orbits; and contravariant morphisms which allow for plugging variables of some systems into parameters of other systems. We define an indexed double category of open dynamical systems indexed by their interface and use a double Grothendieck construction to construct a double category of open dynamical systems. In our main theorem, we construct covariantly representable indexed double functors from the indexed double category of dynamical systems to an indexed double category of spans. This shows that all covariantly representable structures of dynamical systems — including trajectories, steady states, and periodic orbits — compose according to the laws of matrix arithmetic. 1 Open Dynamical Systems The notion of a dynamical system pervades mathematical modeling in the sciences. -
Applied Category Theory for Genomics – an Initiative
Applied Category Theory for Genomics { An Initiative Yanying Wu1,2 1Centre for Neural Circuits and Behaviour, University of Oxford, UK 2Department of Physiology, Anatomy and Genetics, University of Oxford, UK 06 Sept, 2020 Abstract The ultimate secret of all lives on earth is hidden in their genomes { a totality of DNA sequences. We currently know the whole genome sequence of many organisms, while our understanding of the genome architecture on a systematic level remains rudimentary. Applied category theory opens a promising way to integrate the humongous amount of heterogeneous informations in genomics, to advance our knowledge regarding genome organization, and to provide us with a deep and holistic view of our own genomes. In this work we explain why applied category theory carries such a hope, and we move on to show how it could actually do so, albeit in baby steps. The manuscript intends to be readable to both mathematicians and biologists, therefore no prior knowledge is required from either side. arXiv:2009.02822v1 [q-bio.GN] 6 Sep 2020 1 Introduction DNA, the genetic material of all living beings on this planet, holds the secret of life. The complete set of DNA sequences in an organism constitutes its genome { the blueprint and instruction manual of that organism, be it a human or fly [1]. Therefore, genomics, which studies the contents and meaning of genomes, has been standing in the central stage of scientific research since its birth. The twentieth century witnessed three milestones of genomics research [1]. It began with the discovery of Mendel's laws of inheritance [2], sparked a climax in the middle with the reveal of DNA double helix structure [3], and ended with the accomplishment of a first draft of complete human genome sequences [4]. -
Predrnn: Recurrent Neural Networks for Predictive Learning Using Spatiotemporal Lstms
PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs Yunbo Wang Mingsheng Long∗ School of Software School of Software Tsinghua University Tsinghua University [email protected] [email protected] Jianmin Wang Zhifeng Gao Philip S. Yu School of Software School of Software School of Software Tsinghua University Tsinghua University Tsinghua University [email protected] [email protected] [email protected] Abstract The predictive learning of spatiotemporal sequences aims to generate future images by learning from the historical frames, where spatial appearances and temporal vari- ations are two crucial structures. This paper models these structures by presenting a predictive recurrent neural network (PredRNN). This architecture is enlightened by the idea that spatiotemporal predictive learning should memorize both spatial ap- pearances and temporal variations in a unified memory pool. Concretely, memory states are no longer constrained inside each LSTM unit. Instead, they are allowed to zigzag in two directions: across stacked RNN layers vertically and through all RNN states horizontally. The core of this network is a new Spatiotemporal LSTM (ST-LSTM) unit that extracts and memorizes spatial and temporal representations simultaneously. PredRNN achieves the state-of-the-art prediction performance on three video prediction datasets and is a more general framework, that can be easily extended to other predictive learning tasks by integrating with other architectures. 1 Introduction -
A Category-Theoretic Approach to Representation and Analysis of Inconsistency in Graph-Based Viewpoints
A Category-Theoretic Approach to Representation and Analysis of Inconsistency in Graph-Based Viewpoints by Mehrdad Sabetzadeh A thesis submitted in conformity with the requirements for the degree of Master of Science Graduate Department of Computer Science University of Toronto Copyright c 2003 by Mehrdad Sabetzadeh Abstract A Category-Theoretic Approach to Representation and Analysis of Inconsistency in Graph-Based Viewpoints Mehrdad Sabetzadeh Master of Science Graduate Department of Computer Science University of Toronto 2003 Eliciting the requirements for a proposed system typically involves different stakeholders with different expertise, responsibilities, and perspectives. This may result in inconsis- tencies between the descriptions provided by stakeholders. Viewpoints-based approaches have been proposed as a way to manage incomplete and inconsistent models gathered from multiple sources. In this thesis, we propose a category-theoretic framework for the analysis of fuzzy viewpoints. Informally, a fuzzy viewpoint is a graph in which the elements of a lattice are used to specify the amount of knowledge available about the details of nodes and edges. By defining an appropriate notion of morphism between fuzzy viewpoints, we construct categories of fuzzy viewpoints and prove that these categories are (finitely) cocomplete. We then show how colimits can be employed to merge the viewpoints and detect the inconsistencies that arise independent of any particular choice of viewpoint semantics. Taking advantage of the same category-theoretic techniques used in defining fuzzy viewpoints, we will also introduce a more general graph-based formalism that may find applications in other contexts. ii To my mother and father with love and gratitude. Acknowledgements First of all, I wish to thank my supervisor Steve Easterbrook for his guidance, support, and patience. -
Almost Unsupervised Text to Speech and Automatic Speech Recognition
Almost Unsupervised Text to Speech and Automatic Speech Recognition Yi Ren * 1 Xu Tan * 2 Tao Qin 2 Sheng Zhao 3 Zhou Zhao 1 Tie-Yan Liu 2 Abstract 1. Introduction Text to speech (TTS) and automatic speech recognition (ASR) are two popular tasks in speech processing and have Text to speech (TTS) and automatic speech recog- attracted a lot of attention in recent years due to advances in nition (ASR) are two dual tasks in speech pro- deep learning. Nowadays, the state-of-the-art TTS and ASR cessing and both achieve impressive performance systems are mostly based on deep neural models and are all thanks to the recent advance in deep learning data-hungry, which brings challenges on many languages and large amount of aligned speech and text data. that are scarce of paired speech and text data. Therefore, However, the lack of aligned data poses a ma- a variety of techniques for low-resource and zero-resource jor practical problem for TTS and ASR on low- ASR and TTS have been proposed recently, including un- resource languages. In this paper, by leveraging supervised ASR (Yeh et al., 2019; Chen et al., 2018a; Liu the dual nature of the two tasks, we propose an et al., 2018; Chen et al., 2018b), low-resource ASR (Chuang- almost unsupervised learning method that only suwanich, 2016; Dalmia et al., 2018; Zhou et al., 2018), TTS leverages few hundreds of paired data and extra with minimal speaker data (Chen et al., 2019; Jia et al., 2018; unpaired data for TTS and ASR. -
Derived Functors and Homological Dimension (Pdf)
DERIVED FUNCTORS AND HOMOLOGICAL DIMENSION George Torres Math 221 Abstract. This paper overviews the basic notions of abelian categories, exact functors, and chain complexes. It will use these concepts to define derived functors, prove their existence, and demon- strate their relationship to homological dimension. I affirm my awareness of the standards of the Harvard College Honor Code. Date: December 15, 2015. 1 2 DERIVED FUNCTORS AND HOMOLOGICAL DIMENSION 1. Abelian Categories and Homology The concept of an abelian category will be necessary for discussing ideas on homological algebra. Loosely speaking, an abelian cagetory is a type of category that behaves like modules (R-mod) or abelian groups (Ab). We must first define a few types of morphisms that such a category must have. Definition 1.1. A morphism f : X ! Y in a category C is a zero morphism if: • for any A 2 C and any g; h : A ! X, fg = fh • for any B 2 C and any g; h : Y ! B, gf = hf We denote a zero morphism as 0XY (or sometimes just 0 if the context is sufficient). Definition 1.2. A morphism f : X ! Y is a monomorphism if it is left cancellative. That is, for all g; h : Z ! X, we have fg = fh ) g = h. An epimorphism is a morphism if it is right cancellative. The zero morphism is a generalization of the zero map on rings, or the identity homomorphism on groups. Monomorphisms and epimorphisms are generalizations of injective and surjective homomorphisms (though these definitions don't always coincide). It can be shown that a morphism is an isomorphism iff it is epic and monic. -
N-Quasi-Abelian Categories Vs N-Tilting Torsion Pairs 3
N-QUASI-ABELIAN CATEGORIES VS N-TILTING TORSION PAIRS WITH AN APPLICATION TO FLOPS OF HIGHER RELATIVE DIMENSION LUISA FIOROT Abstract. It is a well established fact that the notions of quasi-abelian cate- gories and tilting torsion pairs are equivalent. This equivalence fits in a wider picture including tilting pairs of t-structures. Firstly, we extend this picture into a hierarchy of n-quasi-abelian categories and n-tilting torsion classes. We prove that any n-quasi-abelian category E admits a “derived” category D(E) endowed with a n-tilting pair of t-structures such that the respective hearts are derived equivalent. Secondly, we describe the hearts of these t-structures as quotient categories of coherent functors, generalizing Auslander’s Formula. Thirdly, we apply our results to Bridgeland’s theory of perverse coherent sheaves for flop contractions. In Bridgeland’s work, the relative dimension 1 assumption guaranteed that f∗-acyclic coherent sheaves form a 1-tilting torsion class, whose associated heart is derived equivalent to D(Y ). We generalize this theorem to relative dimension 2. Contents Introduction 1 1. 1-tilting torsion classes 3 2. n-Tilting Theorem 7 3. 2-tilting torsion classes 9 4. Effaceable functors 14 5. n-coherent categories 17 6. n-tilting torsion classes for n> 2 18 7. Perverse coherent sheaves 28 8. Comparison between n-abelian and n + 1-quasi-abelian categories 32 Appendix A. Maximal Quillen exact structure 33 Appendix B. Freyd categories and coherent functors 34 Appendix C. t-structures 37 References 39 arXiv:1602.08253v3 [math.RT] 28 Dec 2019 Introduction In [6, 3.3.1] Beilinson, Bernstein and Deligne introduced the notion of a t- structure obtained by tilting the natural one on D(A) (derived category of an abelian category A) with respect to a torsion pair (X , Y). -
Self-Supervised Learning
Self-Supervised Learning Andrew Zisserman Slides from: Carl Doersch, Ishan Misra, Andrew Owens, Carl Vondrick, Richard Zhang The ImageNet Challenge Story … 1000 categories • Training: 1000 images for each category • Testing: 100k images The ImageNet Challenge Story … strong supervision The ImageNet Challenge Story … outcomes Strong supervision: • Features from networks trained on ImageNet can be used for other visual tasks, e.g. detection, segmentation, action recognition, fine grained visual classification • To some extent, any visual task can be solved now by: 1. Construct a large-scale dataset labelled for that task 2. Specify a training loss and neural network architecture 3. Train the network and deploy • Are there alternatives to strong supervision for training? Self-Supervised learning …. Why Self-Supervision? 1. Expense of producing a new dataset for each new task 2. Some areas are supervision-starved, e.g. medical data, where it is hard to obtain annotation 3. Untapped/availability of vast numbers of unlabelled images/videos – Facebook: one billion images uploaded per day – 300 hours of video are uploaded to YouTube every minute 4. How infants may learn … Self-Supervised Learning The Scientist in the Crib: What Early Learning Tells Us About the Mind by Alison Gopnik, Andrew N. Meltzoff and Patricia K. Kuhl The Development of Embodied Cognition: Six Lessons from Babies by Linda Smith and Michael Gasser What is Self-Supervision? • A form of unsupervised learning where the data provides the supervision • In general, withhold some part of the data, and task the network with predicting it • The task defines a proxy loss, and the network is forced to learn what we really care about, e.g. -
Diagrammatics in Categorification and Compositionality
Diagrammatics in Categorification and Compositionality by Dmitry Vagner Department of Mathematics Duke University Date: Approved: Ezra Miller, Supervisor Lenhard Ng Sayan Mukherjee Paul Bendich Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Mathematics in the Graduate School of Duke University 2019 ABSTRACT Diagrammatics in Categorification and Compositionality by Dmitry Vagner Department of Mathematics Duke University Date: Approved: Ezra Miller, Supervisor Lenhard Ng Sayan Mukherjee Paul Bendich An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Mathematics in the Graduate School of Duke University 2019 Copyright c 2019 by Dmitry Vagner All rights reserved Abstract In the present work, I explore the theme of diagrammatics and their capacity to shed insight on two trends|categorification and compositionality|in and around contemporary category theory. The work begins with an introduction of these meta- phenomena in the context of elementary sets and maps. Towards generalizing their study to more complicated domains, we provide a self-contained treatment|from a pedagogically novel perspective that introduces almost all concepts via diagrammatic language|of the categorical machinery with which we may express the broader no- tions found in the sequel. The work then branches into two seemingly unrelated disciplines: dynamical systems and knot theory. In particular, the former research defines what it means to compose dynamical systems in a manner analogous to how one composes simple maps. The latter work concerns the categorification of the slN link invariant. In particular, we use a virtual filtration to give a more diagrammatic reconstruction of Khovanov-Rozansky homology via a smooth TQFT. -
Reinforcement Learning in Supervised Problem Domains
Technische Universität München Fakultät für Informatik Lehrstuhl VI – Echtzeitsysteme und Robotik reinforcement learning in supervised problem domains Thomas F. Rückstieß Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Daniel Cremers Prüfer der Dissertation 1. Univ.-Prof. Dr. Patrick van der Smagt 2. Univ.-Prof. Dr. Hans Jürgen Schmidhuber Die Dissertation wurde am 30. 06. 2015 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 18. 09. 2015 angenommen. Thomas Rückstieß: Reinforcement Learning in Supervised Problem Domains © 2015 email: [email protected] ABSTRACT Despite continuous advances in computing technology, today’s brute for- ce data processing approaches may not provide the necessary advantage to win the race against the ever-growing amount of data that can be wit- nessed over the last decades. In this thesis, we discuss novel methods and algorithms that are capable of directing attention to relevant details and analysing it in sequence to overcome the processing bottleneck and to keep up with this data explosion. In the first of three parts, a novel exploration technique for Policy Gradi- ent Reinforcement Learning is presented which replaces traditional ad- ditive random exploration with state-dependent exploration, exploring on a higher, more strategic level. We will show how this new exploration method converges faster and finds better global solutions than random exploration can. The second part of this thesis will introduce the concept of “data con- sumption” and discuss means to minimise it in supervised learning tasks by deriving classification as a sequential decision process and ma- king it accessible to Reinforcement Learning methods. -
2019 AMS Prize Announcements
FROM THE AMS SECRETARY 2019 Leroy P. Steele Prizes The 2019 Leroy P. Steele Prizes were presented at the 125th Annual Meeting of the AMS in Baltimore, Maryland, in January 2019. The Steele Prizes were awarded to HARUZO HIDA for Seminal Contribution to Research, to PHILIppE FLAJOLET and ROBERT SEDGEWICK for Mathematical Exposition, and to JEFF CHEEGER for Lifetime Achievement. Haruzo Hida Philippe Flajolet Robert Sedgewick Jeff Cheeger Citation for Seminal Contribution to Research: Hamadera (presently, Sakai West-ward), Japan, he received Haruzo Hida an MA (1977) and Doctor of Science (1980) from Kyoto The 2019 Leroy P. Steele Prize for Seminal Contribution to University. He did not have a thesis advisor. He held po- Research is awarded to Haruzo Hida of the University of sitions at Hokkaido University (Japan) from 1977–1987 California, Los Angeles, for his highly original paper “Ga- up to an associate professorship. He visited the Institute for Advanced Study for two years (1979–1981), though he lois representations into GL2(Zp[[X ]]) attached to ordinary cusp forms,” published in 1986 in Inventiones Mathematicae. did not have a doctoral degree in the first year there, and In this paper, Hida made the fundamental discovery the Institut des Hautes Études Scientifiques and Université that ordinary cusp forms occur in p-adic analytic families. de Paris Sud from 1984–1986. Since 1987, he has held a J.-P. Serre had observed this for Eisenstein series, but there full professorship at UCLA (and was promoted to Distin- the situation is completely explicit. The methods and per- guished Professor in 1998). -
Infinite Variational Autoencoder for Semi-Supervised Learning
Infinite Variational Autoencoder for Semi-Supervised Learning M. Ehsan Abbasnejad Anthony Dick Anton van den Hengel The University of Adelaide {ehsan.abbasnejad, anthony.dick, anton.vandenhengel}@adelaide.edu.au Abstract with whatever labelled data is available to train a discrimi- native model for classification. This paper presents an infinite variational autoencoder We demonstrate that our infinite VAE outperforms both (VAE) whose capacity adapts to suit the input data. This the classical VAE and standard classification methods, par- is achieved using a mixture model where the mixing coef- ticularly when the number of available labelled samples is ficients are modeled by a Dirichlet process, allowing us to small. This is because the infinite VAE is able to more ac- integrate over the coefficients when performing inference. curately capture the distribution of the unlabelled data. It Critically, this then allows us to automatically vary the therefore provides a generative model that allows the dis- number of autoencoders in the mixture based on the data. criminative model, which is trained based on its output, to Experiments show the flexibility of our method, particularly be more effectively learnt using a small number of samples. for semi-supervised learning, where only a small number of The main contribution of this paper is twofold: (1) we training samples are available. provide a Bayesian non-parametric model for combining autoencoders, in particular variational autoencoders. This bridges the gap between non-parametric Bayesian meth- 1. Introduction ods and the deep neural networks; (2) we provide a semi- supervised learning approach that utilizes the infinite mix- The Variational Autoencoder (VAE) [18] is a newly in- ture of autoencoders learned by our model for prediction troduced tool for unsupervised learning of a distribution x x with from a small number of labeled examples.