Array Languages Make Neural Networks Fast


Array Languages Make Neural Networks Fast

Artjoms Šinkarovs, Hans-Nikolai Vießmann, Sven-Bodo Scholz
Heriot-Watt University, Edinburgh, Scotland, UK
[email protected], [email protected], [email protected]

Abstract

Modern machine learning frameworks are complex: they are typically organised in multiple layers, each of which is written in a different language, and they depend on a number of external libraries, but at their core they mainly consist of tensor operations. As array-oriented languages provide perfect abstractions to implement tensor operations, we consider a minimalistic machine learning framework that is shallowly embedded in an array-oriented language and we study its productivity and performance. We do this by implementing a state-of-the-art Convolutional Neural Network (CNN) and comparing it against implementations in TensorFlow and PyTorch — two state-of-the-art industrial-strength frameworks. It turns out that our implementation is 2 and 3 times faster than TensorFlow and PyTorch, respectively, even after fine-tuning those frameworks to our hardware — a 64-core GPU-accelerated machine. The size of all three CNN specifications is the same, about 150 lines of code. Our mini framework is 150 lines of highly reusable hardware-agnostic code that does not depend on external libraries. The compiler for the host array language automatically generates parallel code for a chosen architecture. The key to such a balance between performance and portability lies in the design of the array language; in particular, in the ability to express rank-polymorphic operations concisely, yet still optimise across them. This design builds on very few assumptions, and it is readily transferable to other contexts, offering a clean approach to high-performance machine learning.

1 Introduction

With the increasing success of machine learning in various domains, scientists attempt to solve more and more complex problems using neural networks and deep learning. Increased complexity in the context of deep learning typically means more layers of neurons and larger training sets, all of which results in the necessity to process larger amounts of data. As a result, modern networks require advanced and powerful hardware — modern machine learning applications are envisioned to run on massively parallel high-throughput systems that may be equipped with GPUs, TPUs or even custom-built hardware.

Programming such complex systems is very challenging, specifically in an architecture-agnostic way. Therefore, there is a big demand for a system that abstracts away architectural details, letting users focus on the machine learning algorithms. TensorFlow and PyTorch solve exactly that problem — they provide a convenient level of abstraction, offering a number of building blocks that machine learning scientists can use to specify their problems. The productivity of such a solution is quite high, as these frameworks are embedded into high-level languages such as Python or C++.

However, turning a framework-based specification into efficient code remains challenging. There is a huge semantic gap between the specification and the hardware, yet frameworks such as TensorFlow and PyTorch introduce many levels of abstraction that one needs to deal with. Typically there is a Python front-end and a core library in C++ that depends on numerous external libraries for linear algebra and tensor operations, as well as libraries for GPUs and other specialised hardware. Such complexity makes it challenging to deliver excellent performance: optimisations across multiple layers of abstraction, as well as across multiple external libraries, inherently come with overheads.
The key question we are investigating is: can we identify a single layer of abstraction where, on the one hand, we can express the core building blocks and generate efficient parallel code, and that, on the other hand, is high-level enough to be used as a front-end?

Based on the observation that neural networks can be concisely expressed as computations on high-ranked tensors, we look into using a shape-polymorphic array language à la APL [22] as the central layer of abstraction. While APL itself seems suitable in terms of expressiveness [45], the interpreter-based implementation of its operators, not surprisingly, does not readily provide parallel performance anywhere near that of TensorFlow or PyTorch.

However, over the last 50 years we have seen a good deal of research into the compilation of array languages into efficient parallel code [3, 5, 19, 35, 37]. These languages leverage whole-program optimisations and offer decent levels of parallel performance. They also offer high program portability, as inputs are typically hardware-agnostic and all decisions on optimisations and code generation are taken by the compiler. A user can influence these decisions by passing options, but no code modifications are required.

For the purposes of this paper we use SaC [35], a functional array language, as our implementation vehicle. We focus on a simple yet frequently benchmarked CNN for recognising handwritten characters. First we implement the building blocks that are required to define the chosen CNN in native SaC, and then we use these building blocks to define the network. We compare the resulting code size and performance against TensorFlow and PyTorch. We observe that the overall problem can be expressed concisely (300 lines of native SaC code) and that, on a GPU-accelerated 64-core machine, our solution performs two and three times faster than the TensorFlow- and PyTorch-based implementations. The key aspect of such good performance is first-class support for multi-dimensional arrays in a functional setting, followed by a number of well-known code-generation techniques used by the chosen compiler. This example suggests that, at least for this particular domain, the trade-off between conciseness, performance and development time is quite satisfying.

The individual contributions of the paper are:

• we make a case for using array languages to host a machine-learning framework,
• we provide a concise implementation of the CNN for hand-written image recognition in SaC without using any domain-specific libraries, and
• we present a performance evaluation of the CNN in SaC against multiple variants of the PyTorch- and TensorFlow-based versions of the algorithm on a high-performance cluster node.

The rest of the paper is organised as follows. In Section 2 we briefly introduce machine learning algorithms and state-of-the-art frameworks. In Sections 2 and 3 we introduce the notion of functional arrays and describe our implementation of the CNN. Section 4 presents the performance and productivity evaluation. Section 5 reviews related work, and we conclude in Section 6.

2 Background

In the last decade machine learning has attracted a lot of attention, as it offers solutions to several practical problems that mainly have to do with the automatic recognition of complex patterns: objects in images or videos, automatic text translation, recommendation systems, speech recognition, etc. The underlying task is, for a given sample of inputs X and outputs Y, to find a function f that maps X to Y, i.e. Y = f X, in the best possible way, according to some cost function. After f is found for the existing sample, we would like to make new predictions for new inputs.

Linear Regression. The simplest example of a machine learning algorithm is linear regression [11, 24]. It is probably one of the most well-understood algorithms in the area, yet it demonstrates fundamental principles that will also be used in CNNs. Given a set of $n$ statistical units $\{y_i; x_{i1}, \ldots, x_{im}\}$ for $i \in \{1, \ldots, n\}$, we assume that the relationship between the $y$s and the $x$s is linear, so that each $y_i$ can be computed as $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_m x_{im} + \epsilon_i$. This can be written in matrix form as $y = X\beta + \epsilon$ where

$$
y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_m \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}
$$

There exists a large number of methods to estimate or infer the parameters $\beta$ and $\epsilon$ such that our model function "best" fits the data. For example, one commonly used method is linear least squares [24]. We assume that $\epsilon = 0$ and that the cost function we want to minimise is $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ where $\hat{y} = X\beta$. The attractiveness of this method lies in the existence of a closed-form solution for the parameter vector $\beta$, given by the formula $\beta = (X^\top X)^{-1} X^\top y$.

Note two important aspects. First, instead of searching through all the functions from X to Y, we restrict the general shape of that function and introduce a set of parameters (the $\beta$s in our case). The search for a function reduces to a search for the parameters. Secondly, computationally, most of the involved operations can be reduced to linear algebra operations. This means that we will need a representation for vectors, matrices and tensors, as well as common operations on them, when implementing machine learning algorithms.
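To make the closed-form solution concrete, here is a minimal sketch in Python with NumPy (our illustration only; the paper's own code is written in SaC, and the synthetic data and variable names here are ours):

import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])  # leading column of ones models the intercept beta_0
beta_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)          # y = X beta + eps

# Closed-form least squares, beta = (X^T X)^{-1} X^T y,
# computed by solving the normal equations instead of forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat       # predictions; new inputs are handled the same way
print(beta_hat)            # close to beta_true

Solving the normal equations with np.linalg.solve rather than computing the explicit inverse is the usual, numerically safer way to evaluate this formula.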
Neural Networks. Continuing on from linear regression, we can consider the function $f : X \to Y$ that we want to learn as a composition of functions $g_i$ that can be further decomposed into smaller functions. Overall, such a composition forms a graph (or network) connecting the inputs X with the outputs Y. A typical function composition takes the form $f\;x = A\,(\sum_i w_i\,(g_i\;x))$ where $A$ is an activation function (usually …
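This compositional view is easy to demonstrate in code. The sketch below (again ours, not the paper's SaC code) builds two layers of the stated form, with a sigmoid as the activation function A and a weighted sum over the inputs, and composes them into a tiny network:

import numpy as np

def sigmoid(z):
    # one common choice of activation function A
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b):
    # x -> A(W @ x + b): an instance of f x = A(sum_i w_i (g_i x))
    return lambda x: sigmoid(W @ x + b)

rng = np.random.default_rng(0)
g1 = layer(rng.normal(size=(4, 8)), np.zeros(4))   # 8 inputs -> 4 hidden units
g2 = layer(rng.normal(size=(1, 4)), np.zeros(1))   # 4 hidden -> 1 output

f = lambda x: g2(g1(x))        # the network is the composition of its layers
print(f(rng.normal(size=8)))

Training such a network means searching for the weights, exactly as the β parameters were searched for in linear regression, only now the cost function is minimised iteratively rather than in closed form.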
Recommended publications
  • Javascript and the DOM
Javascript and the DOM (slides from "Introduzione alla programmazione web" (Introduction to Web Programming), Marco Ronchetti, 2020, Università di Trento; translated from Italian)

The web architecture with a smart browser: the web programmer also writes programs which run on the browser. In which language? Javascript! (Diagram: a client browser and a server side (httpd, CGI programs, SQL queries against a DB) communicating over the Internet via HTTP GET with parameters.) Evolution step 3: execute code also on the client! (How?)

Part 1 - Adding dynamic behaviour to HTML

Example 1: onmouseover, onmouseout

<!DOCTYPE html>
<html>
<head>
  <title>Dynamic behaviour</title>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
  <div onmouseover="this.style.color = 'red'"
       onmouseout="this.style.color = 'green'">
    I can change my colour!</div>
</body>
</html>

The dynamic behaviour is on the client side! (The file can be loaded locally.)

Example 2: onmouseover, onmouseout

<body>
  <div onmouseover="this.style.background='orange';
                    this.style.color='blue';"
       onmouseout="this.innerText='and my text and position too!';
                   this.style.position='absolute';
                   this.style.left='100px';
                   this.style.top='150px';
                   this.style.borderStyle='ridge';
                   this.style.borderColor='blue';
                   this.style.fontSize='24pt';">
    I can change my colour...</div>
</body>

JavaScript is event-based. These event objects inherit the properties of the UiEvent: the FocusEvent, the InputEvent, the KeyboardEvent, the MouseEvent, the TouchEvent, the WheelEvent. See https://www.w3schools.com/jsref/obj_uievent.asp

Test and gym (online editor for HTML head/body, CSS and JavaScript): https://www.jdoodle.com/html-css-javascript-online-editor/

Part 2 - Introduction to the language

JavaScript history: JavaScript was born as Mocha, then "LiveScript", in the mid-90s.
  • An Array-Oriented Language with Static Rank Polymorphism
An array-oriented language with static rank polymorphism. Justin Slepak, Olin Shivers, and Panagiotis Manolios, Northeastern University. {jrslepak, shivers, [email protected]}

Abstract. The array-computational model pioneered by Iverson's languages APL and J offers a simple and expressive solution to the "von Neumann bottleneck." It includes a form of rank, or dimensional, polymorphism, which renders much of a program's control structure implicit by lifting base operators to higher-dimensional array structures. We present the first formal semantics for this model, along with the first static type system that captures the full power of the core language. The formal dynamic semantics of our core language, Remora, illuminates several of the murkier corners of the model. This allows us to resolve some of the model's ad hoc elements in more general, regular ways. Among these, we can generalise the model from SIMD to MIMD computations, by extending the semantics to permit functions to be lifted to higher-dimensional arrays in the same way as their arguments. Our static semantics, a dependent type system of carefully restricted power, is capable of describing array computations whose dimensions cannot be determined statically. The type-checking problem is decidable and the type system is accompanied by the usual soundness theorems. Our type system's principal contribution is that it serves to extract the implicit control structure that provides so much of the language's expressive power, making this structure explicitly apparent at compile time.

1 The Promise of Rank Polymorphism. Behind every interesting programming language is an interesting model of computation.
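For readers unfamiliar with the model, a rough analogue of rank-polymorphic lifting (our own sketch; Remora's lifting rules are more general) is NumPy broadcasting, where an operator defined on scalars is implicitly lifted over higher-ranked array arguments:

import numpy as np

v = np.array([1, 2, 3])            # rank 1, shape (3,)
m = np.arange(6).reshape(2, 3)     # rank 2, shape (2, 3)

# '+' is a scalar operator, lifted over higher-dimensional structure:
print(v + 10)   # lifted over a vector:       [11 12 13]
print(m + v)    # lifted over each row of m:  [[1 3 5] [4 6 8]]

The point of Remora's type system is to make precisely this implicit control structure visible statically.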
  • R from a Programmer's Perspective Accompanying Manual for an R Course Held by M
R from a programmer's perspective. Accompanying manual for an R course held by M. Göker at the DSMZ, 11/05/2012 & 25/05/2012. Slightly improved version, 10/09/2012. This document is distributed under the CC BY 3.0 license. See http://creativecommons.org/licenses/by/3.0 for details.

Introduction. The purpose of this course is to cover aspects of R programming that are either unlikely to be covered elsewhere or likely to be surprising for programmers who have worked with other languages. The course thus tries not to be comprehensive but rather complementary to other sources of information. Also, the material had to be compiled in a short time and perhaps suffers from important omissions. For the same reason, potential participants should not expect a fully fleshed-out presentation but a combination of a text-only document (this one) with example code comprising the solutions of the exercises. The topics covered include R's general features as a programming language, a recapitulation of R's type system, advanced coding of functions, error handling, the use of attributes in R, object-oriented programming in the S3 system, and constructing R packages (in this order). The expected audience comprises R users whose own code largely consists of self-written functions, as well as programmers who are fluent in other languages and have some experience with R. Interactive users of R without programming experience elsewhere are unlikely to benefit from this course because quite a few programming skills cannot be covered here but have to be presupposed.
  • MSD FATFS Users Guide
Freescale MSD FATFS Users Guide. Document Number: MSDFATFSUG, Rev. 0, 02/2011.

How to Reach Us: Home Page: www.freescale.com; E-mail: [email protected]. USA/Europe or Locations Not Listed: Freescale Semiconductor Technical Information Center, CH370, 1300 N. Alma School Road, Chandler, Arizona 85224; +1-800-521-6274 or +1-480-768-2130; [email protected]. Europe, Middle East, and Africa: Freescale Halbleiter Deutschland GmbH, Technical Information Center, Schatzbogen 7, 81829 Muenchen, Germany; +44 1296 380 456 (English), +46 8 52200080 (English), +49 89 92103 559 (German), +33 1 69 35 48 48 (French); [email protected]. Japan: Freescale Semiconductor Japan Ltd., Headquarters, ARCO Tower 15F, 1-8-1, Shimo-Meguro, Meguro-ku,

Information in this document is provided solely to enable system and software implementers to use Freescale Semiconductor products. There are no express or implied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document. Freescale Semiconductor reserves the right to make changes without further notice to any products herein. Freescale Semiconductor makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Freescale Semiconductor assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. "Typical" parameters that may be provided in Freescale Semiconductor data sheets and/or specifications can and do vary in different applications and actual performance may vary over time. All operating parameters, including "Typicals", must be validated for each customer application by customer's technical experts.
  • OSLC Architecture Management Specification 3.0 Project Specification Draft 01 17 September 2020
OSLC Architecture Management Specification 3.0. Project Specification Draft 01, 17 September 2020. (Standards Track Work Product)

This stage: https://docs.oasis-open-projects.org/oslc-op/am/v3.0/psd01/architecture-management-spec.html (Authoritative); https://docs.oasis-open-projects.org/oslc-op/am/v3.0/psd01/architecture-management-spec.pdf
Previous stage: N/A
Latest stage: https://docs.oasis-open-projects.org/oslc-op/am/v3.0/architecture-management-spec.html (Authoritative); https://docs.oasis-open-projects.org/oslc-op/am/v3.0/architecture-management-spec.pdf
Latest version: https://open-services.net/spec/am/latest
Latest editor's draft: https://open-services.net/spec/am/latest-draft
Open Project: OASIS Open Services for Lifecycle Collaboration (OSLC) OP
Project Chairs: Jim Amsden ([email protected]), IBM; Andrii Berezovskyi ([email protected]), KTH
Editor: Jim Amsden ([email protected]), IBM
Additional components: This specification is one component of a Work Product that also includes:
• OSLC Architecture Management Version 3.0. Part 1: Specification (this document). architecture-management-spec.html
• OSLC Architecture Management Version 3.0. Part 2: Vocabulary. architecture-management-vocab.html
• OSLC Architecture Management Version 3.0. Part 3: Constraints. architecture-management-shapes.html
• OSLC Architecture Management Version 3.0. Machine Readable Vocabulary Terms. architecture-management-vocab.ttl
• OSLC Architecture Management Version 3.0. Machine Readable Constraints. architecture-management-shapes.ttl
Related work: This specification is related to: OSLC Architecture Management Specification Version 2.0.
Copyright © OASIS Open 2020. All Rights Reserved.
  • Parallel Functional Programming with APL
Parallel Functional Programming with APL (Lecture 1). Martin Elsman, Department of Computer Science, University of Copenhagen (DIKU). December 19, 2016.

Outline: (1) Course outline. (2) Introduction to APL: what APL is; APL implementations and material; APL scalar operations; APL (1-dimensional) vector computations; declaring APL functions (dfns); APL multi-dimensional arrays; iterations; declaring APL operators; function trains; examples; reading.

Course outline. Teachers: Martin Elsman (ME), Ken Friis Larsen (KFL), Andrzej Filinski (AF), and Troels Henriksen (TH). Location: lectures in Small Aud, Universitetsparken 1 (UP1); labs in Old Library, UP1. Course description: see http://kurser.ku.dk/course/ndak14009u/2016-2017. Schedule:

Week        47              48             49                50                51                   1–3
Mon 13–15   Intro, Futhark  Futhark (ME)   Parallel          SNESL (KFL)       APL (ME)             Project
            (ME)                           Haskell (AF)
Mon 15–17   Lab             Lab            Lab               Lab                                    Project
Wed 13–15   Futhark (ME)    Parallel       SNESL (KFL)       Invited Lecture   APL (ME) /           Project
                            Haskell (AF)                     (John Reppy)      Projects

Introduction to APL. What is APL? APL—An Ancient Array Programming Language—But Still Used! Pioneered by Ken E. Iverson in the 1960s. E. Dijkstra: "APL is a mistake, carried through to perfection." There are quite a few APL programmers around (e.g., HIPERFIT partners). Very concise notation for expressing array operations. Has a large set of functional, essentially parallel, multi-dimensional, second-order array combinators.
  • Programming Language
Programming language

A programming language is a formal language, which comprises a set of instructions that produce various kinds of output. Programming languages are used in computer programming to implement algorithms. Most programming languages consist of instructions for computers. There are programmable machines that use a set of specific instructions, rather than general programming languages. Early ones preceded the invention of the digital computer, the first probably being the automatic flute player described in the 9th century by the brothers Musa in Baghdad, during the Islamic Golden Age.[1] Since the early 1800s, programs have been used to direct the behavior of machines such as Jacquard looms, music boxes and player pianos.[2] The programs for these machines (such as a player piano's scrolls) did not produce different behavior in response to different inputs or conditions.

(Figure caption: The source code for a simple computer program written in the C programming language. When compiled and run, it will give the output "Hello, world!".)

Thousands of different programming languages have been created, and more are being created every year. Many programming languages are written in an imperative form (i.e., as a sequence of operations to perform) while other languages use the declarative form (i.e., the desired result is specified, not how to achieve it). The description of a programming language is usually split into the two components of syntax (form) and semantics (meaning). Some languages are defined by a specification document (for example, the C programming language is specified by an ISO Standard) while other languages (such as Perl) have a dominant implementation that is treated as a reference.
  • RAL-TR-1998-060 Co-Array Fortran for Parallel Programming
RAL-TR-1998-060: Co-Array Fortran for parallel programming, by R. W. Numrich and J. K. Reid.

Abstract. Co-Array Fortran, formerly known as F--, is a small extension of Fortran 95 for parallel processing. A Co-Array Fortran program is interpreted as if it were replicated a number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is termed an image. The array syntax of Fortran 95 is extended with additional trailing subscripts in square brackets to give a clear and straightforward representation of any access to data that is spread across images. References without square brackets are to local data, so code that can run independently is uncluttered. Only where there are square brackets, or where there is a procedure call and the procedure contains square brackets, is communication between images involved. There are intrinsic procedures to synchronize images, return the number of images, and return the index of the current image. We introduce the extension; give examples to illustrate how clear, powerful, and flexible it can be; and provide a technical definition.

Categories and subject descriptors: D.3 [PROGRAMMING LANGUAGES]. General terms: parallel programming. Additional key words and phrases: Fortran. Department for Computation and Information, Rutherford Appleton Laboratory, Oxon OX11 0QX, UK. August 1998. Available by anonymous ftp from matisa.cc.rl.ac.uk in directory pub/reports in the file nrRAL98060.ps.gz. Silicon Graphics, Inc., 655 Lone Oak Drive, Eagan, MN 55121, USA. Email:
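The square-bracket syntax itself is Fortran-specific, but the underlying SPMD model (a replicated program whose copies, the images, each own their data and communicate explicitly) can be sketched in Python with mpi4py. The analogy is ours, and it assumes mpi4py is installed and the script is launched under an MPI runner:

from mpi4py import MPI

comm = MPI.COMM_WORLD
me = comm.Get_rank()      # index of the current image (0-based in MPI)
num = comm.Get_size()     # number of images

x = (me + 1) ** 2         # each image holds its own copy of x

# Communication is explicit, much as a square-bracketed reference such as
# x[1] names another image's copy of x in Co-Array Fortran; here every
# image fetches the copy held by image 0.
x0 = comm.bcast(x, root=0)

comm.Barrier()            # synchronise all images
print(f"image {me} of {num}: local x = {x}, image 0's x = {x0}")

Run with, for example, mpiexec -n 4 python images.py.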
  • Programming the Capabilities of the PC Have Changed Greatly Since the Introduction of Electronic Computers
Introduction to Programming Language

Topic objective: At the end of this topic the student will be able to understand: the history of computer programming; C++.

Definition/Overview: A personal computer (PC) is any general-purpose computer whose original sales price, size, and capabilities make it useful for individuals, and which is intended to be operated directly by an end user, with no intervening computer operator. Today a PC may be a desktop computer, a laptop computer or a tablet computer. The most common operating systems are Microsoft Windows, Mac OS X and Linux, while the most common microprocessors are x86-compatible CPUs, ARM architecture CPUs and PowerPC CPUs. Software applications for personal computers include word processing, spreadsheets, databases, games, and a myriad of personal productivity and special-purpose software. Modern personal computers often have high-speed or dial-up connections to the Internet, allowing access to the World Wide Web and a wide range of other resources.

Key points:

1. History of computer programming. The capabilities of the PC have changed greatly since the introduction of electronic computers. By the early 1970s, people in academic or research institutions had the opportunity for single-person use of a computer system in interactive mode for extended durations, although these systems would still have been too expensive to be owned by a single person. The introduction of the microprocessor, a single chip with all the circuitry that formerly occupied large cabinets, led to the proliferation of personal computers after about 1975. Early personal computers - generally called microcomputers - were often sold in electronic kit form and in limited volumes, and were of interest mostly to hobbyists and technicians.
  • Towards the Second Edition of ISO 24707 Common Logic
Towards the Second Edition of ISO 24707 Common Logic. Michael Grüninger (with Tara Athan and Fabian Neuhaus). Miniseries on Ontologies, Rules, and Logic Programming for Reasoning and Applications, January 9, 2014.

What is Common Logic? Common Logic (published as "ISO/IEC 24707:2007 - Information technology - Common Logic: a framework for a family of logic-based languages") is a language based on first-order logic, but extending it in several ways that ease the formulation of complex ontologies that are definable in first-order logic.

Semantics. An interpretation I consists of: a set UR_I, the universe of reference; a set UD_I, the universe of discourse, such that UD_I ≠ ∅ and UD_I ⊆ UR_I; a mapping int_I : V → UR_I; and a mapping rel_I from UR_I to subsets of UD_I*.

How is Common Logic being used? Open ontology repositories: COLORE (Common Logic Ontology Repository), colore.oor.net, stl.mie.utoronto.ca/colore/ontologies.html; OntoHub, www.ontohub.org, https://github.com/ontohub/ontohub. Ontology-based standards: Process Specification Language (ISO 18629), Date-Time Vocabulary (OMG), Foundational UML (OMG), Semantic Web Services Framework (W3C), OntoIOp (ISO WD 17347).
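As a toy instance of the definition above (our own example, not from the slides), take a vocabulary V = {alice, bob, likes} and let

$$UR_I = \{a, b, \ell\}, \qquad UD_I = \{a, b\} \subseteq UR_I,$$
$$int_I(\mathit{alice}) = a, \quad int_I(\mathit{bob}) = b, \quad int_I(\mathit{likes}) = \ell, \qquad rel_I(\ell) = \{\langle a, b \rangle\}.$$

Then the atomic sentence likes(alice, bob) holds in I, since the pair ⟨int_I(alice), int_I(bob)⟩ = ⟨a, b⟩ belongs to rel_I(int_I(likes)); note that ℓ denotes a name's value in the universe of reference while lying outside the universe of discourse.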
  • Tangent: Automatic Differentiation Using Source-Code Transformation for Dynamically Typed Array Programming
    Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming Bart van Merriënboer Dan Moldovan Alexander B Wiltschko MILA, Google Brain Google Brain Google Brain [email protected] [email protected] [email protected] Abstract The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. In this work, we explore techniques from the field of automatic differentiation (AD) that can give researchers expressive power, performance and strong usability. These include source-code transformation (SCT), flexible gradient surgery, efficient in-place array operations, and higher-order derivatives. We implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT. 1 Introduction Many applications in machine learning rely on gradient-based optimization, or at least the efficient calculation of derivatives of models expressed as computer programs. Researchers have a wide variety of tools from which they can choose, particularly if they are using the Python language [21, 16, 24, 2, 1]. These tools can generally be characterized as trading off research or production use cases, and can be divided along these lines by whether they implement automatic differentiation using operator overloading (OO) or SCT. SCT affords more opportunities for whole-program optimization, while OO makes it easier to support convenient syntax in Python, like data-dependent control flow, or advanced features such as custom partial derivatives. We show here that it is possible to offer the programming flexibility usually thought to be exclusive to OO-based tools in an SCT framework.
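The contrast with operator overloading is easiest to see on a toy function. The sketch below is ours, hand-written to show the kind of code an SCT tool derives from a function's source ahead of time (it is not Tangent's actual output or API):

# original model code
def f(x):
    y = x * x
    return y + 3.0 * x

# the kind of derivative function an SCT tool generates from f's source:
# plain Python, produced before execution, rather than a trace recorded
# at runtime as operator-overloading tools do
def df(x):
    dy = 2.0 * x       # derivative of y = x * x
    return dy + 3.0    # derivative of y + 3.0 * x

print(f(2.0), df(2.0))   # 10.0 7.0

Because the derivative is itself ordinary source code, it can be read, optimised as a whole program, and edited by hand, which is the flexibility for gradient surgery that the abstract refers to.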
  • Compendium of Technical White Papers
Compendium of Technical White Papers from Kx. Contents:

Machine Learning
1. Machine Learning in kdb+: kNN classification and pattern recognition with q (p. 2)
2. An Introduction to Neural Networks with kdb+ (p. 16)

Development Insight
3. Compression in kdb+ (p. 36)
4. kdb+ and WebSockets (p. 52)
5. C API for kdb+ (p. 76)
6. Efficient Use of Adverbs (p. 112)

Optimization Techniques
7. Multi-threading in kdb+: Performance Optimizations and Use Cases (p. 134)
8. kdb+tick Profiling for Throughput Optimization (p. 152)
9. Columnar Database and Query Optimization (p. 166)

Solutions
10. Multi-Partitioned kdb+ Databases: An Equity Options Case Study (p. 192)
11. Surveillance Technologies to Effectively Monitor Algo and High Frequency Trading (p. …)