Multi-Layer Optimizations for End-To-End Data Analytics

Total Pages: 16

File Type: PDF, Size: 1020 KB

Multi-layer Optimizations for End-to-End Data Analytics

Amir Shaikhha (University of Oxford, United Kingdom), Maximilian Schleich (University of Oxford, United Kingdom), Alexandru Ghita (University of Oxford, United Kingdom), Dan Olteanu (University of Oxford, United Kingdom)

arXiv:2001.03541v1 [cs.PL] 10 Jan 2020

Abstract. We consider the problem of training machine learning models over multi-relational data. The mainstream approach is to first construct the training dataset using a feature extraction query over the input database and then use a statistical software package of choice to train the model. In this paper we introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language, which captures a subset of Python commonly used in Jupyter notebooks for rapid prototyping of machine learning applications. The program is subject to several layers of IFAQ optimizations, such as algebraic transformations, loop transformations, schema specialization, data layout optimizations, and finally compilation into efficient low-level C++ code specialized for the given workload and data. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and TensorFlow by several orders of magnitude for linear regression and regression tree models over several relational datasets.

CCS Concepts: • Computing methodologies → Supervised learning by regression; • Information systems → Database management system engines; • Software and its engineering → Domain specific languages.

Keywords: In-Database Machine Learning, Multi-Query Optimization, Query Compilation

1 Introduction

The mainstream approach in supervised machine learning over relational data is to specify the construction of the data matrix followed by the learning task in scripting languages such as Python, MATLAB, Julia, or R, using software environments such as Jupyter notebook. These environments call libraries for query processing, e.g., Pandas [34] or SparkSQL [4], or database systems, e.g., PostgreSQL [37]. The materialized training dataset then becomes the input to a statistical package, e.g., mlpack [14], scikit-learn [41], TensorFlow [1], or PyTorch [40], that learns the model. There are clear advantages to this approach: it is easy to use, allows for quick prototyping, and does not require substantial programming knowledge. Although it uses libraries that provide efficient implementations for their functions (usually by binding to efficient low-level C code), it misses optimization opportunities for the end-to-end relational learning pipeline. In particular, such libraries fail to exploit the relational structure of the underlying data, which is removed by the materialization of the training dataset. The effect is that the efficiency on one machine is often severely limited, with common deployments having to rely on expensive and energy-consuming clusters of machines to perform the learning task.
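To make this baseline concrete, here is a minimal Python sketch of such a pipeline. It is not an example from the paper; the file names and column names are hypothetical, and it merely stands in for the kind of notebook code the paper targets:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Step 1: feature extraction query. The join materializes the training
    # dataset and discards the relational structure of the inputs.
    orders = pd.read_csv("orders.csv")  # hypothetical: order_id, store_id, price
    stores = pd.read_csv("stores.csv")  # hypothetical: store_id, size
    train = orders.merge(stores, on="store_id")

    # Step 2: learning task. The flat data matrix is handed to a statistical
    # package that knows nothing about where the data came from.
    model = LinearRegression().fit(train[["size"]], train["price"])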
The thesis of this paper is that this performance limitation can be overcome by systematically optimizing the end-to-end relational learning pipeline as a whole. We introduce Iterative Functional Aggregate Queries, or IFAQ for short, a framework that is designed to uniformly process and automatically optimize the various phases of the relational learning pipeline. IFAQ takes as input a program that fully specifies both the data matrix construction and the model training in a dynamically-typed language called D-IFAQ. This language captures a fragment of scripting languages such as Python that is used for rapid prototyping of machine learning models. This is possible thanks to the iterative and collection-oriented constructs provided by D-IFAQ.

An IFAQ program is optimized by multiple compilation layers, whose optimizations are inspired by techniques developed by the data management, programming languages, and high-performance computing communities. IFAQ performs automatic memoization, which identifies code fragments that can be expressed as batches of aggregate queries. This optimization further enables non-trivial loop-invariant code motion opportunities. The program is then compiled down to a statically-typed language, called S-IFAQ, which benefits from further optimization stages, including loop transformations such as loop fusion and code motion. IFAQ investigates optimization opportunities at the interface between the data matrix construction and learning steps and interleaves the code for both steps by pushing the data-intensive computation from the second step past the joins of the first step, inspired by recent query processing techniques [2, 38] (a toy sketch of this pushdown follows the overview below). The outcome is highly optimized code with no separation between query processing and machine learning. Finally, IFAQ compiles the optimized program into low-level C++ code that further exploits data-layout optimizations. IFAQ can outperform equivalent solutions by orders of magnitude.

The contributions of this paper are as follows:

• Section 2 introduces the IFAQ framework, which comprises languages and compiler optimizations that are designed for efficient end-to-end relational learning pipelines.
• As proof of concept, Section 3 demonstrates IFAQ for two popular models: linear regression and regression trees.
• Section 4 shows how to systematically optimize an IFAQ program in several stages.
• Section 5 benchmarks IFAQ, mlpack, TensorFlow, and scikit-learn. It shows that IFAQ can train linear regression and regression tree models over two real datasets orders of magnitude faster than its competitors. In particular, IFAQ learns the models faster than it takes the competitors to materialize the training dataset. Section 5 further shows the performance impact of individual optimization layers.

2 Overview

Figure 1 depicts the IFAQ workflow. Data scientists use software environments such as Jupyter Notebook to specify relational learning programs. In our setting, such programs are written in a dynamically-typed language called D-IFAQ and subject to high-level optimizations (Section 4.1). Given the schema information of the dataset, the optimized program is translated into a statically-typed language called S-IFAQ (Section 4.2). If there are type errors, they are reported to the user. IFAQ performs several optimizations inspired by database query processing techniques (Section 4.3). Finally, it synthesizes appropriate data layouts, resulting in efficient low-level C/C++ code (Section 4.4).

Figure 1. A relational learning task is expressed in D-IFAQ, transformed into an optimized S-IFAQ expression, and compiled to efficient C++ code. [Pipeline diagram: Relational Learning Task → D-IFAQ → S-IFAQ → C/C++]
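As promised above, the following toy Python sketch conveys the idea of pushing the learning step's data-intensive aggregates past the join of the feature extraction step. It is only an illustration of the general technique on made-up relations and a one-parameter least-squares model, not the paper's actual rewrite rules:

    from collections import defaultdict

    # Toy relations; schemas and values are made up for illustration.
    orders = [(1, 10.0), (1, 12.0), (2, 7.0)]  # (store_id, price)
    stores = {1: 100.0, 2: 50.0}               # store_id -> size

    # Aggregate the many-side per join key instead of materializing the join.
    cnt, sum_y = defaultdict(int), defaultdict(float)
    for store_id, price in orders:
        cnt[store_id] += 1
        sum_y[store_id] += price

    # Combine with the (much smaller) one-side to obtain the sums that a
    # least-squares fit of price ~ theta * size needs over the joined data.
    sxx = sum(cnt[k] * stores[k] ** 2 for k in cnt)
    sxy = sum(sum_y[k] * stores[k] for k in cnt)
    theta = sxy / sxx  # same result as fitting over the full join, never built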
2.1 IFAQ Core Language

The grammar of the IFAQ core language is given in Figure 2. This functional language supports the following data types.

Figure 2. Grammar of the IFAQ core language (overbars denote sequences; the rendering below is reconstructed from the paper's notation):

    p ::= e | x̄ ← ē  while(e) { x̄ ← ē }  x
    e ::= e + e | e * e | -e | uop(e) | e bop e | c
        | Σ_{x∈e} e | λ_{x∈e} e | {{ē → ē}} | [[ē]] | dom(e) | e(e)
        | {x̄ = ē} | <x̄ = ē> | e.x | e[e]
        | let x = e in e | x | if e then e else e
    c ::= 'id' | "id" | n | r | false | true
    T ::= S | {x̄ : T̄} | <x̄ : T̄> | Map[T, T] | Set[T]
    S ::= B | C
    C ::= string | enum | bool | Field
    B ::= ℤ | ℝ | ℝⁿ_C

The first category consists of numeric types as well as categorical types (e.g., boolean values, string values, and other custom enum types). Furthermore, categorical types can alternatively be one-hot encoded, represented by the type ℝⁿ_{T1}. This type denotes an array of n real numbers, each corresponding to a particular value in the domain of T1. In the one-hot encoding of a value of type T1, only the i-th entry is 1 and the rest are zero; a general value of type ℝⁿ_{T1}, however, can take arbitrary real numbers at each position.

The second category consists of record types, which are similar to structs in C; record values contain various fields of possibly different data types. Partial records, where some fields have no values, are referred to as variants.

The final category consists of collection data types such as (ordered) sets and dictionaries. Database relations are represented as dictionaries (mapping elements to their multiplicities). In D-IFAQ, the elements of a collection can have different types; in S-IFAQ, however, the elements of a collection must have the same data type. In order to distinguish variables of collection type from other variables, we denote them as x̄ and x, respectively.

The top-level program consists of several initialization expressions, followed by a loop. This enables IFAQ to express iterative algorithms such as gradient descent optimization. The rest of the constructs of the language are as follows: (i) various binary and unary operations on both scalar and collection data structures;¹ (ii) let binding, for assigning the value of an expression to a variable; (iii) conditional expressions; (iv) Σ_{x∈e1} e2, the summation operator, which iterates over the elements of a collection (e1) and performs a stateful computation using a particular addition operator (e.g., numerical addition, set union, bag union);² (v) λ_{x∈e1} e2, which constructs a dictionary whose key domain is e1 and whose value for each key x is given by e2; (vi) constructing sets ([[ē]]) and dictionaries ({{ē → ē}}) from a list of elements; (vii) dom(e), which returns the domain of a dictionary; (viii) e0(e1), which retrieves the value associated with a given key in a dictionary; (ix) constructing records ({x̄ = ē}) and variants (<x̄ = ē>); and (x) statically (e.x) and dynamically (e[e]) accessing the field of a record or a variant. A small executable model of the summation and dictionary-construction constructs is sketched below.

¹ More specifically, ring-based operations.
² This operator can be any monoid-based operator; thus, even computing the minimum of two numbers is an addition operator.
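As a rough executable model of these constructs, the following Python sketch mimics the summation operator, the dictionary-construction operator, and the top-level shape of initialization expressions followed by a loop. It is a sketch under stated assumptions, not the IFAQ implementation; the helper names, dataset, and step size are all made up:

    # Relations are dictionaries mapping tuples to their multiplicities.
    R = {(100.0, 10.0): 2, (50.0, 7.0): 1}  # (x, y) -> multiplicity

    def sigma(coll, f):
        # Sum_{t in coll} f(t): fold over a bag using numeric addition;
        # in IFAQ the addition operator can be any monoid operation.
        return sum(mult * f(t) for t, mult in coll.items())

    def lam(domain, f):
        # Lambda_{x in domain} f(x): a dictionary keyed by the domain.
        return {x: f(x) for x in domain}

    # lam can build one-hot-style indicator dictionaries over a domain.
    indicator = lam({"a", "b", "c"}, lambda k: 1.0 if k == "b" else 0.0)

    # Top-level program shape: initialization expressions, then a loop;
    # here, gradient descent for the one-parameter model y ~ theta * x.
    theta, alpha = 0.0, 1e-5
    for _ in range(100):
        grad = sigma(R, lambda t: (theta * t[0] - t[1]) * t[0])
        theta = theta - alpha * grad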
Recommended publications
• mlpack: or, How I Learned To Stop Worrying and Love C++ (Ryan R. Curtin)
mlpack: or, How I Learned To Stop Worrying and Love C++. Ryan R. Curtin, March 21, 2018. Slides motivating mlpack around five recurring themes: speed, programming languages, machine learning, C++, and mlpack itself being fast. The talk builds up a chart, labeled "not a scientific or particularly accurate representation", that places languages (ASM, C, C++, Go, Java, JavaScript, Scala, Python, Ruby, C#, Tcl, PHP, Lua, VB, INTERCAL) along axes from low-level and fast to high-level, easy, and portable.
  • Data Mining – Intro
Data Warehouse & Data Mining, UNIT 3 syllabus. Classification: introduction; decision trees; tree induction algorithms (split algorithm based on information theory, split algorithm based on the Gini index); the naïve Bayes method; estimating the predictive accuracy of a classification method; classification software and software for association rule mining; case study: KDD insurance risk assessment.

What is data mining? Data mining is (1) the efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets, and (2) the analysis step of the "knowledge discovery in databases" (KDD) process: the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.

Examples of large datasets: government (IRS, NGA, …); large corporations (Walmart: 20M transactions per day; Mobil: 100 TB geological databases; AT&T: 300M calls per day); credit card companies; scientific (NASA EOS project: 50 GB per hour; environmental datasets).

The KDD process is commonly defined with the stages (1) selection, (2) pre-processing, (3) transformation, (4) data mining, and (5) interpretation/evaluation. Data mining methods: 1. decision tree classifiers (used for modeling and classification); 2. association rules (used to find associations between sets of attributes); 3. sequential patterns (used to find temporal associations in time series); 4. hierarchical …
• AI Matters 4(4): Annotated Table of Contents
AI Matters, Volume 4, Issue 4 (2018): annotated table of contents. Edition DOI: 10.1145/3284751.
• Welcome to AI Matters 4(4) (Amy McGovern and Iolanda Leite, co-editors): http://doi.acm.org/10.1145/3299758.3299759
• AI Policy Matters (Larry Medsker): http://doi.acm.org/10.1145/3299758.3299765
• Call for Proposals: Artificial Intelligence Activities Fund (Sven Koenig, Sanmay Das, Rosemary Paradis, Michael Rovatsos, and Nicholas Mattei): http://doi.acm.org/10.1145/3299758.3299760
• Epochs of an AI Cosmology (Cameron Hughes and Tracey Hughes): http://doi.acm.org/10.1145/3299758.3299868
• AI Profiles: An Interview with Iolanda Leite (Marion Neumann): http://doi.acm.org/10.1145/3299758.3299761
• Artificial Intelligence and Jobs of the Future: Adaptability Is Key for Human Evolution (Sudarshan Sreeram): http://doi.acm.org/10.1145/3299758.3300060
• Events (Michael Rovatsos): http://doi.acm.org/10.1145/3299758.3299762
• Conference Reports (Michael Rovatsos): http://doi.acm.org/10.1145/3299758.3299763
• AI Education Matters: A Modular Approach to AI Ethics Education
Links: SIGAI website http://sigai.acm.org/; newsletter http://sigai.acm.org/aimatters/; blog http://sigai.acm.org/ai-matters/; Twitter http://twitter.com/acm_sigai/. Join SIGAI: students $11, others $25; see http://sigai.acm.org/. The mailing list is open to all; also consider joining ACM.
• RcppMLPACK: R Integration with MLPACK Using Rcpp
RcppMLPACK: R integration with MLPACK using Rcpp. Qiang Kou ([email protected]), April 20, 2020.

1 MLPACK. MLPACK [1] is a C++ machine learning library with emphasis on scalability, speed, and ease of use. It outperforms competing machine learning libraries by large margins; for detailed benchmarking results, please refer to Ref. [1].

1.1 Input/output using Armadillo. MLPACK uses Armadillo for input and output, which provides vector, matrix, and cube types (integer, floating point, and complex numbers are supported) [2]. By providing MATLAB-like syntax and various matrix decompositions through integration with LAPACK or other high-performance libraries (such as Intel MKL, AMD ACML, or OpenBLAS), Armadillo keeps a good balance between speed and ease of use. Armadillo is widely used in C++ libraries like MLPACK. One point needs attention, however: Armadillo matrices in MLPACK are stored in a column-major format for speed, meaning observations are stored as columns and dimensions as rows, so an additional transpose may be needed when using MLPACK.

1.2 Simple API. Although MLPACK relies on template techniques in C++, it provides an intuitive and simple API, with default arguments for the standard usage of methods. The MLPACK paper [1] provides a good example: standard k-means clustering in Euclidean space can be initialized like below:

    KMeans<> k;  // Listing 1: k-means with all default arguments

If we want to use Manhattan distance, a different cluster initialization policy, and allow empty clusters, the object can be initialized with additional arguments:

    KMeans<ManhattanDistance, KMeansPlusPlusInitialization, AllowEmptyClusters> k;
    // Listing 2: k-means with additional arguments

1.3 Available methods in MLPACK. Commonly used machine learning methods are all implemented in MLPACK.
  • [Front Matter]
Future Technologies Conference (FTC) 2017. 29-30 November 2017, Vancouver, Canada.

About the conference: The IEEE technically sponsored Future Technologies Conference (FTC) 2017 is the second research conference in the series, part of the SAI conferences held since 2013. The conference series has featured keynote talks, special sessions, poster presentations, tutorials, workshops, and contributed papers each year. The goal of the conference is to be a world pre-eminent forum for reporting technological breakthroughs in the areas of computing, electronics, AI, robotics, security, and communications.

FTC 2017 is held at the Pan Pacific Hotel Vancouver. The Pan Pacific luxury Vancouver hotel in British Columbia, Canada is situated on the downtown waterfront of this vibrant metropolis, with some of the city's top business venues and tourist attractions just minutes away, including FlyOver Canada, Gastown, the Vancouver Convention Centre, the cruise ship terminal, and popular shopping and entertainment districts. The hotel has great meeting rooms with new designs; we chose it for the conference because it has plenty of space, serves great food, and is 100% accessible. Venue: Pan Pacific Hotel Vancouver, Suite 300-999 Canada Place, Vancouver, British Columbia V6C 3B5, Canada. Tel: +1 604-662-8111.

Preface: FTC 2017 is a recognized event and provides a valuable platform for individuals to present their research findings, display their work in progress, and discuss conceptual advances in areas of computer science and engineering, electrical engineering, and IT-related disciplines.
• Deep Learning and Big Data in Healthcare: A Double Review for Critical Beginners
Applied Sciences (review): Deep Learning and Big Data in Healthcare: A Double Review for Critical Beginners. Luis Bote-Curiel (1), Sergio Muñoz-Romero (1,2,*), Alicia Guerrero-Curieses (1), and José Luis Rojo-Álvarez (1,2). (1) Department of Signal Theory and Communications, Rey Juan Carlos University, 28943 Fuenlabrada, Spain; (2) Center for Computational Simulation, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Spain. (*) Correspondence: [email protected]. Received: 14 April 2019; Accepted: 3 June 2019; Published: 6 June 2019.

Abstract: In the last few years, there has been a growing expectation created about the analysis of large amounts of data often available in organizations, which has been both scrutinized by the academic world and successfully exploited by industry. Nowadays, two of the most common terms heard in scientific circles are Big Data and Deep Learning. In this double review, we aim to shed some light on the current state of these different, yet somehow related, branches of Data Science, in order to understand the current state and future evolution within the healthcare area. We start by giving a simple description of the technical elements of Big Data technologies, as well as an overview of the elements of Deep Learning techniques, according to their usual description in scientific literature. Then, we pay attention to the application fields that can be said to have delivered relevant real-world success stories, with emphasis on examples from large technology companies and financial institutions, among others.
  • Survey on Recent Machine Learning Tools, Platforms and Interface
International Journal of Information and Computing Science, ISSN 0972-1347. Survey on Recent Machine Learning Tools, Platforms and Interface. 1. Mr. S. Karthi, M.E., AP/IT, St. Joseph College of Engineering, Chennai, [email protected]. 2. Mrs. R. Raja Ramya, M.E., (Ph.D.), AP/IT, Saveetha Engineering College, Chennai, ramyanitha@gmail.com. 3. Mrs. N. Kanagavalli, M.E., (Ph.D.), AP/IT, Saveetha Engineering College, Chennai, [email protected].

Abstract: Machine learning and deep learning are recent booming research areas. Tools are a big part of machine learning, and choosing the right tool can be as important as working with the best algorithms. New researchers are often not aware of the tools and frameworks that support machine learning. There are many open-source tools and frameworks available for machine learning and deep learning; in this paper, various open-source machine learning tools and frameworks are explained.

Keywords: open source tools, frameworks, platforms.

Introduction: A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications.

Why use tools? Machine learning tools make applied machine learning faster, easier, and more fun. Faster: the time from idea to result is greatly shortened (the alternative is implementing each capability yourself). Easier: open-source tools are easy for an individual to learn; based on the input, we form the output in the easiest way. This requires research and deeper exercise in order …
• MLPACK: A Scalable C++ Machine Learning Library (arXiv:1210.6293v1 [cs.MS], 23 Oct 2012)
Journal of Machine Learning Research 1 (2012) 1-4. Submitted 9/12; Published x/12. MLPACK: A Scalable C++ Machine Learning Library. Ryan R. Curtin, James R. Cline, N. P. Slagle, William B. March, Parikshit Ram, Nishant A. Mehta, Alexander G. Gray. College of Computing, Georgia Institute of Technology, Atlanta, GA 30332.

Abstract: MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011, offering both a simple, consistent API accessible to novice users and high performance and flexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org.

1. Introduction and Goals: Though several machine learning libraries are freely available online, few, if any, offer efficient algorithms to the average user. For instance, the popular Weka toolkit (Hall et al., 2009) emphasizes ease of use but scales poorly; the distributed Apache Mahout library offers scalability at a cost of higher overhead (such as clusters and powerful servers often unavailable to the average user). Also, few libraries offer breadth; for instance, libsvm (Chang and Lin, 2011) and the Tilburg Memory-Based Learner (TiMBL) are highly scalable and accessible, yet each offers only a single method.
  • Annual Report
ACM Annual Report, 2019 fiscal year. ACM, the Association for Computing Machinery, is an international scientific and educational organization dedicated to advancing the arts, sciences, and applications of information technology.

Letter from the President: It's been quite an eventful year for ACM. While this annual exercise allows us a moment to celebrate some of the many successes and achievements the Association has realized over the past year, it is also an opportunity to focus on new and innovative ways to ensure ACM remains a vibrant global resource for the computing community. ACM's mission hinges on creating a community that encompasses all who work in the computing and technology arena. This year, ACM established a new Diversity and Inclusion Council to identify ways to create environments that are welcoming to new perspectives and will attract an even broader membership from around the world.

[…] and challenges posed by evolving technology. Education has always been at the foundation of ACM, as reflected in two recent curriculum efforts. First, the ACM Task Force on Data Science issued "Computing Competencies for Undergraduate Data Science Curricula." The guidelines lay out the computing-specific competencies that should be included when other academic departments offer programs in data science at the undergraduate level. Second, building on the success of our recent guidelines for 4-year cybersecurity curricula, the ACM Committee for Computing Education in Community Colleges created a related curriculum targeted at two-year programs, "Cybersecurity Curricular Guidance for Associate-Degree Programs."

The following pages offer a sampling of the many ACM events and accomplishments that occurred over the past fiscal year, none of which would have been …
  • ML Cheatsheet Documentation
ML Cheatsheet Documentation. Team, Oct 02, 2018. Brief visual explanations of machine learning concepts, with diagrams, code examples, and links to resources for learning more. Warning: the document is under early-stage development; if you find errors, please raise an issue or contribute a better definition.

Contents: Basics (Linear Regression, Gradient Descent, Logistic Regression, Glossary); Math (Calculus, Linear Algebra, Probability, Statistics, Notation); Neural Networks (Concepts, Forwardpropagation, Backpropagation, Activation Functions, Layers, Loss Functions, Optimizers, Regularization, Architectures); Algorithms (Classification, Clustering, Regression, Reinforcement Learning); Resources (Datasets, Libraries, Papers, Other Content, Contribute).

Chapter 1, Linear Regression: introduction; simple regression (making predictions, cost function, gradient descent, training, model evaluation, summary); multivariable regression (growing complexity, normalization, making predictions, initializing weights, cost function, gradient descent, simplifying with matrices, bias term, model evaluation).

1.1 Introduction: Linear regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It is used to predict values within a continuous range (e.g., sales, price) rather than trying to classify them into categories (e.g., cat, dog). There are two main types. Simple regression uses the traditional slope-intercept form, where m and b are the variables the algorithm will try to "learn" to produce the most accurate predictions; a small sketch in this spirit follows.
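To make that concrete, here is a tiny Python sketch in the chapter's spirit (a reconstruction, not code copied from the cheatsheet): a slope-intercept prediction function and the mean-squared-error cost that gradient descent would minimize.

    def predict(x, m, b):
        # Slope-intercept form: y = m*x + b
        return m * x + b

    def mse_cost(xs, ys, m, b):
        # Mean squared error over the training points
        n = len(xs)
        return sum((predict(x, m, b) - y) ** 2 for x, y in zip(xs, ys)) / n

    # m and b are the two values the algorithm "learns": gradient descent
    # repeatedly nudges them in the direction that lowers this cost.
    print(mse_cost([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], m=2.0, b=0.0))  # 0.0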
• SIGLOG: Special Interest Group on Logic and Computation: A Proposal
SIGLOG: Special Interest Group on Logic and Computation. A Proposal. Natarajan Shankar, Computer Science Laboratory, SRI International, Menlo Park, CA. March 21, 2014.

Executive summary: Logic is, and will continue to be, a central topic in computing. ACM has a core constituency with an interest in Logic and Computation (L&C), witnessed by (1) the many Turing Awards for work centrally in L&C, (2) the ACM journal Transactions on Computational Logic, and (3) several long-running conferences like LICS, CADE, CAV, ICLP, RTA, CSL, TACAS, and MFPS, and super-conferences like FLoC and ETAPS. SIGLOG explores the connections between logic and computing, covering theory, semantics, analysis, and synthesis. SIGLOG delivers value to its membership through the coordination of conferences, journals, newsletters, awards, and educational programs, and enjoys significant synergies with several existing SIGs.

Early foundations: Logicians like Alonzo Church, Kurt Gödel, Alan Turing, John von Neumann, and Stephen Kleene played a pioneering role in laying the foundation of computing. In the last 65 years, logic has become the calculus of computing, underpinning the foundations of many diverse sub-fields. Turing awardees for logic-related work include John McCarthy, Edsger Dijkstra, Dana Scott, Michael Rabin, Tony Hoare, Steve Cook, Robin Milner, Amir Pnueli, Ed Clarke, Allen Emerson, Joseph Sifakis, and Leslie Lamport.

Interaction with other SIGs: Logic interacts in a significant way with AI (SIGAI), theory of computation (SIGACT), databases (SIGMOD) and knowledge bases (SIGKDD), programming languages (SIGPLAN), software engineering (SIGSOFT), computational biology (SIGBio), symbolic computing (SIGSAM), the semantic web (SIGWEB), and hardware design (SIGDA and SIGARCH).
  • ML Cheatsheet Documentation
ML Cheatsheet Documentation. Team, Sep 02, 2021. A later edition of the cheatsheet listed above, with the same structure (Basics, Math, Neural Networks, Algorithms, Resources) and the same warning and Linear Regression introduction.