Comparing the costs of abstraction for DL frameworks

Maksim Levental∗ and Elena Orlova
University of Chicago

arXiv:2012.07163v1 [cs.LG] 13 Dec 2020

ABSTRACT
High level abstractions for implementing, training, and testing Deep Learning (DL) models abound. Such frameworks function primarily by abstracting away the implementation details of arbitrary neural architectures, thereby enabling researchers and engineers to focus on design. In principle, such frameworks could be "zero-cost abstractions"; in practice, they incur translation and indirection overheads. We study at which points exactly in the engineering life-cycle of a DL model the highest costs are paid and whether they can be mitigated. We train, test, and evaluate a representative DL model using PyTorch, LibTorch, TorchScript, and cuDNN on representative datasets, comparing accuracy, execution time and memory efficiency.

ACM Reference Format:
Maksim Levental and Elena Orlova. 2020. Comparing the costs of abstraction for DL frameworks. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Deep Learning (DL) frameworks represent neural network models as dataflow and computation graphs (where nodes correspond to functional units and edges correspond to composition). In recent years, there has been a proliferation of DL frameworks [1–4] implemented as domain-specific languages (DSLs) embedded in "high-level" languages¹ such as Python, Java, and C#. These DSLs serve as abstractions that aim to map the DL graphs to hardware pipelines. That is to say, they hide (or encapsulate) details of DL models that are judged to be either irrelevant or too onerous to consider (see section (2.1) for a more comprehensive discussion on abstraction in computer science). By virtue of these design decisions the frameworks trade off ease-of-use for execution performance; quoting the architects of PyTorch:

  To be useful, PyTorch needs to deliver compelling performance, although not at the expense of simplicity and ease of use. Trading 10% of speed for a significantly simpler to use model is acceptable; 100% is not. [1]

Trading off ergonomics for performance is manifestly reasonable², especially during the early phases of the DL engineering/research process (i.e. during the hypothesis generation and experimentation phases). Ultimately, though, one needs to put the DL model into production, and it is at this phase of the DL engineering process that every percentage point of execution performance becomes critical. Alternatively, there are many areas of academic DL where the research community strives to incrementally improve performance [6–8]. For example, in the area of super-resolution a high-priority goal is to be able to "super-resolve" in real time [9]. Similarly, in natural language processing, where enormous language models are becoming the norm [10], memory efficiency of DL models is of the utmost concern. In such instances it is natural to wonder whether ease-of-use trade-offs that sacrifice execution performance, or memory efficiency, are worthwhile and whether their costs can be mitigated.

Thus, our intent here is to investigate the costs of some of the abstractions employed by framework developers. In particular we focus on the PyTorch ecosystem (chosen for its popularity amongst academic researchers) deployed to Graphics Processing Units (GPUs). To that end, we implement a popular and fairly representative³ DL model at four levels of abstraction: conventional PyTorch, LibTorch, cuDNN, and TorchScript. We argue, in the forthcoming, that these four implementations span considerable breadth in the abstraction spectrum. Furthermore, we train, test, and evaluate each of the implementations on four object detection datasets and tabulate performance and accuracy metrics.

The rest of this article is organized as follows: section (2) discusses abstraction and quickly reviews the germane background material on GPUs and DL frameworks, section (3) describes the implementations and our profiling methodology, section (4) presents our results and a comparative discussion thereof, section (5) discusses broad lessons learned, section (6) concludes and proposes future work, and section (7) speculates wildly about the future of DL systems more generally.

∗Both authors contributed equally to this work.
¹For the purposes of this article, we take "high-level" to mean garbage collected and agnostic with respect to hardware from the perspective of the user.
²"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." [5]
³In the sense that the functional units constituting the model are widely used in various other models.

2 BACKGROUND

2.1 Abstraction
What is abstraction? In fact, there are several closely related notions of abstraction. First there is the philosophical notion of abstraction; Locke defines abstraction as follows (bolding ours):

  The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three: ...The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction ... [11]

Then there is mathematical abstraction; Russell defines abstraction as follows:

  This principle [of abstraction] asserts that, whenever a relation, of which there are instances, has the two properties of being symmetrical and transitive, then the relation in question is not primitive, but is analyzable into sameness of relation to some other term; and that this common relation is such that there is only one term at most to which a given term can be so related, though many terms may be so related to a given term. [12]

Intriguing as these notions of abstraction may be, they are distinctly different from the notion of abstraction in computer science; in particular with respect to mathematical abstraction (bolding ours):

  ...the primary product of mathematics is inference structures, while the primary product of computer science is interaction patterns. This is a crucial difference, and it shapes their use of formalism and the kind of abstraction used in the two disciplines.
  ...computer science is distinguished from mathematics in the use of a kind of abstraction that computer scientists call information hiding. The complexity of behaviour of modern computing devices makes the task of programming them impossible without abstraction tools that hide, but do not neglect, details that are essential in a lower-level processing context but inessential in a [particular] software design and programming context. [13]

This understanding of abstraction is widely agreed upon; notably Abelson, Sussman, and Sussman in their much revered Structure and Interpretation of Computer Programs:

  We are not at that moment concerned with how the procedure computes its result, only with the fact that it computes the square. The details of how the square is computed can be suppressed, to be considered at a later time. Indeed, as far as the good-enough? procedure is concerned, square is not quite a procedure but rather an abstraction of a procedure, a so-called procedural abstraction. At this level of abstraction, any procedure that computes the square is equally good. [14]

Thus, abstraction is the modulation of concern for details in accordance with the needs of the user, and levels of abstraction are graded by the degree of the elision (bolding ours):

  To specify nontrivial computational processes in machine language is a practical impossibility for humans, and so programming languages with higher levels of abstraction are necessary.
  ...At a higher level of [abstraction], a subroutine, function, or procedure is an abstraction of a segment of memory that hides the details of how the segment represents a piece of a program that is passed certain parameter values and returns a value as a result.
  ...A garbage collector is an abstraction of a special process that collects garbage and makes it once again available to the original program, hiding from that program the details of how this is done.
  ...This use of code libraries is an example of procedural abstraction, or the ability to execute code through the calling of named procedures that accept explicitly described parameters and return certain guaranteed results. It is an example of abstraction because the details of how the procedure performs its computation are hidden from the procedure's caller; since the caller only makes use of the procedure for its results, there is no need for it to know the internals. [13]

Taking information and concern encapsulation as our operational definition of abstraction, in what sense shall we measure the costs of the abstractions employed by DL frameworks? An immediate candidate measure of cost is the asymptotic time (or space) complexity of various operations and data structures that comprise the abstraction. We claim that, with rare exception⁴, asymptotic complexity is a poorly suited measure of the complexity or cost of abstractions in the sense that we here deal with. If the abstraction is truly abstract then it bears no resemblance to the realization (recall Locke's definition of abstraction), and if the abstraction is reified then the analysis becomes completely impractical (owing to the numerous components and operations). Even if such analysis were practicable, the result would most likely be uninteresting and inconsequential for actual DL frameworks and their users. It is well known that the constant factors in the complexity and the particularities of hardware systems themselves more closely govern performance than the order terms. For example, Quicksort, an O(n²) sorting routine, outperforms even many Θ(n log n) sorting routines because it is more cache efficient [16].

Another way to reason about the cost of abstractions is according to the "zero-overhead" principle as articulated by Bjarne Stroustrup:

  In general, C++ implementations obey the zero-overhead principle: What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better. [17]

Therefore we make the expedient and practical assumption that what is more interesting and valuable to the DL community than asymptotics is, in fact, an empirical study of the resource efficiency of the abstractions; namely execution time, memory usage, and GPU utilization.

⁴One result does come to mind: Pippenger [15] produces a program that runs in O(n) on an impure implementation (i.e. with side-effects) of LISP but which runs in Θ(n log n) on a pure LISP.

2.2 GPUs
We briefly review GPUs⁵ in order that the performance criteria we measure in section (3) are legible.

A GPU consists of many simple processors, called streaming multiprocessors (SMs), which are comprised of many compute cores that run at relatively low clock speeds⁶. Each compute core in an SM can execute one floating-point or integer operation per clock cycle. See fig. (1) for a diagram of NVIDIA's Fermi architecture, where each SM consists of 32 cores, 16 load/store (LD/ST) units, four special-function units (SFUs) which compute transcendental functions (such as sin, cos, exp), a relatively large register file⁷, and thread control logic (to be discussed in the proceeding). Each SM has access to local memory, several cache levels, and global memory. In the Fermi architecture (and subsequent architectures) local memory is configurable in software; a fraction of it can be apportioned as either local memory or L1 cache (for workloads that query global memory in excess of local memory). One final feature worth mentioning, though irrelevant for us here, is the L2 cache's atomic read-modify-write facilities; this enables sharing data across groups of threads more efficiently than is possible on conventional CPUs⁸.

⁵A more comprehensive introduction to GPUs themselves and CUDA programming is available in [18].
Such an architecture, particularly suited to maximizing throughput, necessitates a programming model distinct from that of a conventional, general purpose processor architecture. A unit of computation deployed to a GPU is called a kernel; kernels can be defined using NVIDIA's Compute Unified Device Architecture (CUDA) extensions to C, C++, and FORTRAN⁹. Compiled kernels are executed by many threads in parallel, with each thread starting at the same instruction; NVIDIA describes this addition to Flynn's taxonomy [19] as Single Instruction Multiple Thread (SIMT)¹⁰. The large register file enables very fast thread context switching (∼25 microseconds on the Fermi architecture [21]), performed by a centralized hardware thread scheduler. Multiple threads are grouped into blocks (SMs are single tenant with respect to blocks) and blocks are grouped into grids (grids execute a single kernel). All threads in a block, by virtue of running on the same SM, coordinate (execute in arbitrary order, concurrently, or sequentially) and share memory. Thread blocks are partitioned into warps of 32 threads; it is these warps that are dispatched by the warp scheduler (see fig. (1b)), and starting with the Fermi architecture two warps can be executed concurrently on the same SM in order to increase utilization¹¹.

We present an example CUDA program in fig. (2) to illustrate some of the artifacts of the CUDA threading model. The premise of the program is performing an element-wise sum of two 32×48 entry matrices. Note that all of the data weighs in at 3 × 32 × 48 × 4 = 18 kilobytes (well within the bounds of shared memory on any one SM). The actual work of summing is partitioned across a grid of six thread blocks, each containing 16 × 16 threads. Such a partitioning means each thread can be logically responsible for exactly one sum and therefore the kernel is quite simple (see fig. (2a)). Within the context of a kernel, each thread is uniquely identified by its multi-index in the thread hierarchy (threadIdx and blockIdx). Hence, to carry out the sum, the kernel maps this multi-index to the physical address of the data¹². This (grid, block, thread)-to-data mapping is, in effect, the mechanism that implements the SIMT architecture. Note that, since each block is allocated to exactly one SM, this sum will take (16 × 16) ÷ 16 = 16 clock cycles on the Fermi architecture; better throughput could be achieved by increasing the number of blocks (and therefore the number of SMs assigned work).

Figure 1: NVIDIA Fermi architecture [22]. (a) Eight (of 16) SMs in the Fermi architecture (the remaining 8 are symmetrically placed around the L2 cache). (b) An individual Fermi SM.

⁶For example, individual NVIDIA GTX-1080 Ti cores run at ∼1500MHz.
⁷For example, Intel's Haswell architecture supports 168 integer and 168 floating-point registers.
⁸On a CPU, atomic test-and-set instructions manage a semaphore, which itself manages access to memory (therefore incurring a cost of at least two clock cycles).
⁹In fact, CUDA compiles down to a virtual machine assembly code (by way of nvcc) for a virtual machine called the Parallel Thread Execution (PTX) virtual machine. So, in effect, it is compilers all the way down.
¹⁰The key difference between SIMD and SIMT is that while in SIMD all vector elements in a vector instruction execute synchronously, threads in SIMT can diverge; branches are handled by predicated instructions [20].
¹¹That is, one warp can occupy the compute cores while the other occupies the SFUs or Load/Store units.
¹²In CUDA C/C++ data is laid out in row-major order but this is not fixed (in CUDA FORTRAN the data is laid out in column-major order).

__global__ void matrix_sum(
    float *A,
    float *B,
    float *C,
    int rows,
    int cols
) {
    // blockDim is short for block dimension
    // blockDim.x == blockDim.y == 16 threads
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < cols && y < rows) {
        int ij = x + y * cols; // row-major order
        C[ij] = A[ij] + B[ij];
    }
}

int main() {
    const int rows = 32, cols = 48;
    float A[rows][cols], B[rows][cols], C[rows][cols];
    float *d_A, *d_B, *d_C; // device buffers

    // initialization, cudaMalloc of d_A, d_B, d_C, and
    // cudaMemcpy host-to-device elided
    // ...

    // dim3 is a 3d integer vector; dimensions omitted in the
    // constructor (e.g. z here) are set to 1
    dim3 numBlocks(3, 2);
    dim3 numThreads(16, 16);
    matrix_sum<<<numBlocks, numThreads>>>(d_A, d_B, d_C, rows, cols);
}

Figure 2: Canonical CUDA "hello world" kernel (matrix addition). (a) CUDA code to be compiled by nvcc; note the differences from standard C (__global__ and the matrix_sum<<<...>>> launch syntax). (b) Mapping from thread and block to matrix element [18].

2.3 Graph compilers and Tensors
DL frameworks primarily function as graph compilers and tensor abstractions¹³. They typically also include some "quality of life" utilities useful for the training of DL models (e.g. optimizers and data loaders). PyTorch's Tensor abstraction is responsible for a great deal of the complexity and implementation overhead of the framework. Due to the framework's broad support for hardware and data types, dynamic dispatch¹⁴ is employed to resolve methods on Tensors (see fig. (3)). This dynamic dispatch produces deep call stacks for every single operation on a Tensor (see fig. (4))¹⁵; it remains to be seen whether the context switching between function contexts incurs any appreciable execution time penalty.

DL graph compilers are distinct from other dataflow compilers (such as VHDL and Verilog¹⁶); in addition to keeping account of how the data streams through the compute graph, they also keep account of how the gradients of the data stream through the graph (i.e. the gradient-flow). This is called automatic differentiation (often shortened to autodiff). In principle autodiff is implemented by using the rules of Newton's calculus to calculate the derivatives of primitive functions and the chain rule to calculate derivatives of compositions of primitive functions. There are two types of autodiff: forward mode (or forward accumulation) and reverse mode (or reverse accumulation)¹⁷. Reverse mode autodiff enables the framework to effectively calculate the gradients of the parameters of a neural network with respect to some relevant loss or objective function. Note that such gradients can be back-propagated through the neural network in order to adjust the parameters of the neural network such that it minimizes the loss¹⁸ or maximizes the objective.
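The forward-/reverse-mode distinction can be made concrete with PyTorch's autograd (the reverse-mode machinery referred to above); the following is a minimal sketch, not drawn from the benchmarked implementations, mirroring the composition y = f(g(h(x))) of footnote 17:

import torch

# y = f(g(h(x))) with h(x) = x**2, g(u) = sin(u), f(v) = 3*v
x = torch.tensor(2.0, requires_grad=True)
h = x ** 2
g = torch.sin(h)
y = 3 * g

# Reverse-mode accumulation: walk the recorded graph outside-in.
y.backward()

# Chain rule by hand: dy/dx = 3 * cos(x**2) * 2x
manual = 3 * torch.cos(x.detach() ** 2) * 2 * x.detach()
print(x.grad, manual)  # the two values agree

Every intermediate tensor here carries a record of the operation that produced it, which is exactly the gradient-flow bookkeeping described above.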
Dataflow graphs (and their corresponding gradient-flow graphs) can be specified either statically, with fan-in and fan-out for all functions predetermined, or dynamically, where compositions of functions are determined "on-the-run". There are advantages and disadvantages to both specification strategies. Static specifications tightly constrain¹⁹ the intricacy of the dataflow graph but, obversely, can be leveraged to improve performance and scalability [24, 25]. TensorFlow (prior to v2.0) is an example of a DL framework that compiles statically specified graphs. Conversely, dynamic specifications can be very expressive and user friendly, including such conveniences as runtime debugging, but are much more difficult to optimize. PyTorch is an example of a DL framework that supports dynamic specification. Both PyTorch and TensorFlow also support just-in-time (JIT) compilation strategies (TorchScript and XLA, respectively); such JIT compilers strike a balance between fluency and scalability. In this work we investigate TorchScript (see section (3)).
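For concreteness, a dynamically specified PyTorch model can be lowered to TorchScript and serialized so that it can later be executed in inference mode without the Python runtime; a minimal sketch (the module and file name are illustrative, not those used in our experiments):

import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
model.eval()

# torch.jit.script compiles the module to TorchScript IR;
# torch.jit.trace(model, example_input) is the tracing alternative.
scripted = torch.jit.script(model)
scripted.save("model.pt")  # loadable from C++ via torch::jit::load
out = scripted(torch.randn(1, 3, 32, 32))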

¹³A tensor in this context is a data structure similar to a multidimensional array that supports some useful operations (e.g. slicing, flattening, index permutation). Most DL frameworks also abstract memory layout on hardware behind this abstraction.
¹⁴Kernels live in shared-object libraries (e.g. libcaffe2.so, libcaffe2_gpu.so) and therefore call sites of virtual functions (indirection) are resolved at runtime.
¹⁵Every function call corresponds to a stack frame allocation and register allocations. In addition, indirection to far away call sites leads to poor instruction cache efficiency [23].
¹⁶Verilog and Very High Speed Integrated Circuit Hardware Description Language (VHSIC-HDL or VHDL) are specification languages for specifying circuits on field programmable gate arrays.
¹⁷Briefly, for a composition of functions y = f(g(h(x))), forward mode evaluates the derivative y′(x), as given by the chain rule, inside-out while reverse mode evaluates the derivative outside-in. For those familiar with functional programming, these operations correspond to foldl and foldr on the sequence of functions with ∂x as the operator.
¹⁸In which case, it is, in fact, the negatives of the gradients that are back-propagated.
¹⁹For example, branches and loops are cumbersome to specify statically.

[Figure 3 diagram: torch.conv2d(x, y) is first dispatched dynamically on the backend (CPU, CUDA, FPGA, XLA, Vulkan, MKLDNN, QuantizedCUDA, SparseCUDA, ...), then statically on the data type (e.g. SparseCUDAByteType::conv2d, CUDADoubleType::conv2d, CUDAHalfType::conv2d), before ultimately reaching cudnnConvolutionForward.]

Figure 3: How the torch.conv2d operation on tensors x, y is implemented in PyTorch.
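The dispatch pictured in fig. (3) can be observed from Python: the very same call is routed to different backend kernels depending on the device and dtype of its operands. A minimal sketch (the shapes are arbitrary and the half-precision path assumes a CUDA device is present):

import torch
import torch.nn.functional as F

x_cpu = torch.randn(1, 3, 8, 8, dtype=torch.double)
w_cpu = torch.randn(4, 3, 3, 3, dtype=torch.double)
y_cpu = F.conv2d(x_cpu, w_cpu)  # resolved to a CPU double-precision kernel

if torch.cuda.is_available():
    x_gpu = x_cpu.cuda().half()
    w_gpu = w_cpu.cuda().half()
    # the identical Python call now resolves, at runtime, to a CUDA
    # half-precision kernel and ultimately cudnnConvolutionForward
    y_gpu = F.conv2d(x_gpu, w_gpu)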

[Figure 4 diagram: call-graph nodes include torch.conv2d (PyTorch), Conv2dImpl::forward and functional::detail::conv2d (LibTorch), at::conv2d and autograd::VariableType::_convolution, Conv2d::forward (cuDNN), and the common sink cudnnConvolutionForward; edge labels indicate 61, 30, and 14 calls along the various paths, with TorchScript sharing part of the PyTorch path.]

Figure 4: Call graphs representing the number of calls between Conv2d.forward at each level of abstraction and the ultimate execution of the convolution cudnnConvolutionForward on the GPU. Markers distinguish calls where the implementations diverge from calls where two or more implementations coincide. Note that program setup calls are omitted. These were produced by building each implementation with debug symbols and using gdb to set a breakpoint at cudnnConvolutionForward. Complete stack traces are available on GitHub at main/tex/stack_traces.

It warrants mention that, in addition to vertically integrated DL frameworks (i.e. specification language and hardware compiler), recently there has been work on intermediate bytecode representations for dataflow graphs that arbitrary compiler "frontends" can target. The Multi-Level Intermediate Representation (MLIR) [26] project has goals that include supporting dataflow graphs, optimization passes on those graphs, and hardware-specific optimizations²⁰. Stripe [27] is a polyhedral compiler²¹ that aims to support general machine learning kernels, which are distinguished by their high parallelism with limited mutual dependence between iterations. Tensor Comprehensions [29] is an intermediate specification language (rather than an intermediate bytecode representation) and corresponding polyhedral compiler; the syntax bears close resemblance to Einstein summation notation and the compiler supports operator fusion and specialization for particular data shapes. Finally, Tensor Virtual Machine (TVM) [30] is an optimizing graph compiler that automates optimization using a learning-based cost modeling method that enables it to efficiently explore the space of low-level code optimizations.

²⁰Interestingly enough, the project is headed by Chris Lattner who, in developing LLVM, pioneered the same ideas in general purpose programming languages.
²¹A polyhedral compiler models complex programs (usually deeply nested loops) as polyhedra and then performs transformations on those polyhedra in order to produce equivalent but optimized programs [28].

3 METHODS
As discussed in the preceding, the translation from a high-level neural network specification to native hardware involves a diverse set of choices at design time and translation time. Any such choice made implicitly by the DL framework abstracts away intermediary details at the level of abstraction at which the choice is made. For example, in PyTorch, by default, convolutions include not only learnable filters but also a learnable bias. Naturally this increases the number of parameters for which gradients need to be kept account of and updated. At the next level of abstraction (translation from Python specification to C++ objects) another implicit choice is made in choosing tensor dimension ordering²². Finally, at the level of abstraction just prior to compilation into CUDA kernels, a convolution strategy is chosen²³ according to heuristics. Each of these choices potentially incurs a runtime execution and memory cost, depending on whether the heuristics according to which the choice was made apply to the user's DL model.

With this as subtext, we describe the intent and design of our experiments. We implement the popular object detection deep neural network ResNet-50 [31] at four levels of abstraction (PyTorch, TorchScript²⁴, LibTorch, and cuDNN) in order to investigate the differences amongst them. We measure accuracy, execution time, GPU utilization, and memory efficiency of each implementation on four image datasets (MNIST, CIFAR10, STL10, PASCAL). The source for the implementations is available on GitHub²⁵. The datasets were chosen because they span the spectrum of image complexity (from small single-channel images to large multi-channel images). The reasons for choosing ResNet-50 (see fig. (5)) are twofold. Firstly, it serves as a benchmark architecture in the research community. Secondly, it includes functional units included in many other network architectures (residual units, convolutions of various sizes, batch normalizations, ReLU activations, and pooling layers) and is therefore representative of typical neural network compute workloads. The reason for staying within the same ecosystem is that, in theory, we fix as many of the dimensions of functionality orthogonal to our concerns as possible.

²²PyTorch uses some heuristics to order tensor dimensions as either NCHW or NHWC.
²³Winograd convolution, general matrix multiply (GEMM), or FFT convolution.
²⁴TorchScript models are serializations of PyTorch models but can run in inference mode in C++, i.e. sans Python runtime.
²⁵https://github.com/makslevental/pytorch_abstraction_comparison

Figure 5: ResNet-50 network architecture [31]. Note that each convolution is followed by a batch normalization unit and each "stage" is followed by a ReLU (residual connections omitted).

We employ two test platforms (see table (1)); training is performed for 100 epochs with batch size 128 (except on PASCAL, where we use batch size 16) on the "Training platform". Distinct models were trained in parallel in order to expedite the overall training process but neither data parallelism nor intra-model parallelism were employed. In addition we perform scaling experiments (in resolution and batch size) on the PASCAL dataset, for batch_size, resolution = 8, 16, ..., 1024. For this purpose we employ the "Resolutions platform", which has enough GPU RAM to accommodate large batch sizes and large image resolutions. For both sets of experiments, each run is repeated 10 times and results are averaged to reduce variance in the measurements. Precise execution time measurements are collected using the CUDA cudaEventCreate, cudaEventRecord, and cudaEventElapsedTime APIs. Precise memory and GPU utilization measurements are collected using the NVIDIA Management Library C API. Both sets of APIs were appropriately wrapped for use in Python.
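The exact Python wrappers live in the repository linked above; purely as an illustration of the same measurement mechanism, the sketch below uses torch.cuda.Event (which wraps cudaEventCreate/cudaEventRecord/cudaEventElapsedTime) and the pynvml bindings to the NVIDIA Management Library, assuming pynvml is installed (these are not the wrappers used for the reported numbers):

import torch
import pynvml

def timed_forward(model, batch):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    out = model(batch)
    end.record()
    torch.cuda.synchronize()             # wait for both recorded events
    return out, start.elapsed_time(end)  # elapsed time in milliseconds

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .used, .total (bytes)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu, .memory (%)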
4 RESULTS
The PyTorch implementation compares favorably with both the LibTorch and the cuDNN implementations (see fig. (6)) in terms of accuracy. On MNIST and CIFAR10 all three implementations perform reasonably well; LibTorch and PyTorch attain maximum accuracy at around the same time while cuDNN lags behind. On the more complex STL10 and PASCAL datasets (see fig. (14) in the appendix) the cuDNN implementation dramatically underperformed PyTorch and LibTorch. The cause of the difference between the cuDNN implementation and the others is unclear.

[Figure 6 panels: MNIST training, CIFAR10 training, MNIST evaluation, CIFAR10 evaluation; accuracy plotted against time (training) and epoch (evaluation) for the cudnn, libtorch, and pytorch implementations.]

Figure 6: Comparison of accuracy for PyTorch, LibTorch, and cuDNN implementations during training and evaluation. Solid line corresponds to mean while shaded regions correspond to min and max. Time is measured in units of epoch × average epoch time.

In attempting to resolve the poor performance of the cuDNN implementation it was discovered that PyTorch (and LibTorch as well) initializes weights in convolutional, linear, and batch normalization layers. This is not documented and not configurable. For convolutional and linear layers Kaiming uniform initialization [32] is used and for batch normalization layers (γ = 1, β = 0) initialization is used. This presents a serious problem for us because it is known that ResNets with Kaiming initializations lead to exploding gradients [33]. Nonetheless, we implemented Kaiming initialization for the cuDNN implementation but it did not resolve the under-performance issues. Indeed, the network vacillated between vanishing gradients and exploding gradients depending on various settings of the hyper-parameters in the Kaiming initialization. Note that TorchScript training and evaluation accuracy is not measured/reported because TorchScript implementations cannot (as of yet) be trained, only evaluated.

The undocumented initialization leads us to believe that most likely there are several other heuristic optimizations implemented by PyTorch (and LibTorch). While such optimizations generally do improve performance (to wit: here on STL10 and PASCAL), this prompts the question of whether or not this is a "moral" cost of abstraction (since the optimizations might hurt performance for other models [33]).
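For reference, the undocumented defaults can be made explicit (and hence configurable) from user code. The sketch below re-initializes a model with Kaiming uniform initialization [32] for convolutional/linear layers and (γ = 1, β = 0) for batch normalization layers; the gain argument a = sqrt(5) and the simplified bias handling are assumptions about PyTorch's internal reset_parameters, not documented behavior:

import math
import torch.nn as nn

def reinitialize(model: nn.Module) -> None:
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            # Kaiming uniform initialization [32]; a = sqrt(5) mirrors what
            # PyTorch's reset_parameters appears to use internally (assumption).
            nn.init.kaiming_uniform_(m.weight, a=math.sqrt(5))
            if m.bias is not None:
                # PyTorch's own default draws the bias uniformly based on
                # fan-in; zeroing it here is a deliberate simplification.
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.ones_(m.weight)   # gamma = 1
            nn.init.zeros_(m.bias)    # beta = 0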

Table 1: Test platforms

Training platform
  CPU: AMD Ryzen 2970WX 24-Core @ 4.2 GHz
  GPU: 4 × GeForce GTX 1080 Ti
  HD: Crucial MX500 2TB 3D NAND SATA
  RAM: 64GB
  Software: PyTorch-1.7.0, CUDA-11.1, NVIDIA-455.23.05, g++-10.1.0

Resolutions platform
  CPU: Intel Xeon Gold 6230 CPU @ 2.10GHz
  GPU: Tesla V100-PCIE-32GB
  HD: HPE 800GB SAS 12G Mixed Use SFF
  RAM: 384GB
  Software: PyTorch-1.7.0, CUDA-11.1, NVIDIA-450.80.02

In terms of execution time and memory usage PyTorch compares unfavorably with each of the other implementations. We measure execution time, memory usage, and GPU utilization during evaluation on PASCAL for various batch sizes and resolutions. For example, for fixed batch size and various resolutions, and for fixed resolution and various batch sizes (see fig. (7)), we see that PyTorch is almost an order of magnitude slower than all other implementations. This execution time difference persists across resolutions and batch sizes but narrows as either increases (see figs. (8), (9), (10), (11), (12) and (13) in the appendix). With respect to memory usage PyTorch and LibTorch are approximately the same across resolutions and batch sizes, while cuDNN and TorchScript are more memory efficient, especially below resolution 2⁶ and batch size 2⁷.

We use NVIDIA's Visual Profiler²⁶ to investigate fixed batch_size = 32 further. One critical way in which the PyTorch implementation differs from the others is in host-to-device per-batch copy size: PyTorch copies 25.166MB while the other implementations copy 12.583MB. Consequently PyTorch requires ∼15.17ms to copy the batch while all other implementations require ∼7.55ms²⁷. Another significant discrepancy is the choice of block size and grid size (regarding thread distribution across GPU SMs). For the ReLU kernel, which is the second most often executed kernel (∼12% of total execution time), the PyTorch implementation allocates a grid of size (1024, 2, 1) while the LibTorch and cuDNN implementations allocate grids of size (512, 2, 1). Consequently, on average, each PyTorch ReLU invocation consumes ∼690.6 microseconds while each of the others consumes ∼372.6 microseconds. It is unclear exactly why the larger grid leads to a slowdown, but one hypothesis is that distributing work across more SMs leads to more stalls on cache misses²⁸.

²⁶https://developer.nvidia.com/nvidia-visual-profiler
²⁷In fact, pinning memory (copying from memory that is not paged) halves copy time again.
²⁸SM level statistics are not presented in the NVIDIA profiler.
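Footnote 27's observation about pinned (page-locked) memory can be reproduced directly from PyTorch; a minimal sketch of the two host-to-device copy paths (tensor shapes are arbitrary):

import torch

pageable = torch.randn(128, 3, 224, 224)                 # ordinary host memory
pinned = torch.randn(128, 3, 224, 224, pin_memory=True)  # page-locked host memory

# Pageable copies are staged through an intermediate pinned buffer by the
# driver; pinned copies go straight to the device and may be asynchronous.
a = pageable.to("cuda")
b = pinned.to("cuda", non_blocking=True)
torch.cuda.synchronize()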
5 DISCUSSION
Overall the central challenges for this work revolved around the cuDNN implementation. Chief among them was familiarizing ourselves with the SIMT compute model and the CUDA/cuDNN APIs. It is important to contextualize these challenges appropriately: how much of the challenge is akin to the initial learning curve associated with any new software toolset (e.g. PyTorch) and how much is enduring? Certainly memory leaks, and debugging them, given that C++ is not a memory managed language, will persist, but the difficulties associated with the idiosyncrasies of the APIs most likely will not. Assuming that the majority of the challenge decreases with time, are these lower levels of abstraction worth the investment of effort and time?

The lackluster accuracy results of cuDNN seemingly do not bode well for the hypothesis that one can, in a straightforward way, trade performance for implementation effort. The cuDNN performance indicates there are serious bugs with the implementation. Alternatively, the accuracy results suggest that there are optimizations present in the PyTorch and LibTorch implementations that are obscured from the user (such as the Kaiming initialization mentioned in section (4)). The former case, when juxtaposed with the execution time and memory usage results, suggests that the cuDNN implementation could be as accurate as the PyTorch and LibTorch implementations, with much lower execution time and memory usage (assuming the bugs can be rectified, a reasonable assumption). In the latter case, we face a sort of existential crisis: how many results in DL research, attributed to architectural innovations, in fact hinge on the implementation details of the frameworks those architectures are themselves implemented against?

It bears repetition: even broadly useful heuristics are a cost of abstraction if they cannot be adjusted. Case in point, Kaiming initialization is not always net positive with respect to performance:

  Standard initialization methods (Glorot & Bengio, 2010; He et al., 2015; Xiao et al., 2018) attempt to set the initial parameters of the network such that the activations neither vanish nor explode. Unfortunately, it has been observed that without normalization techniques such as BatchNorm they do not account properly for the effect of residual connections and this causes exploding gradients. [33]

In addition, batch normalization layers being initialized to (γ = 1, β = 0) also does not uniformly improve performance:

  For BN layers, the learnable scaling coefficient γ is initialized to be 1, except for each residual block's last BN where γ is initialized to be 0. Setting γ = 0 in the last BN of each residual block causes the forward/backward signal initially to propagate through the identity shortcut of ResNets, which we found to ease optimization at the start of training. [34]

In theory, a sufficiently flexible DL compiler could rescue us from the purgatory we find ourselves in; a sufficiently powerful compiler would implement the necessary DL abstractions in a robust way but also have enough flexibility to enable users to implement custom extensions of those abstractions. One promising project that has as its goal such a high-level compiler is the "Extensible Programming" [35] project. Besard et al. expose interfaces to alter the compilation process for Julia-lang²⁹. The project instruments the Julia compiler itself and enables users to build high-level hardware abstractions in the source language itself³⁰. They have had initial success writing high-level GPU code that performs comparably with CUDA C³¹.

²⁹https://julialang.org/
³⁰This is possible because Julia is JITed using LLVM and is homoiconic, i.e. it supports a macro system that can manipulate the JITed LLVM bytecode.
³¹https://github.com/JuliaGPU/CUDAnative.jl
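Both initialization heuristics quoted above can in fact be toggled from user code, which is precisely the kind of adjustability being argued for; as a minimal sketch (assuming torchvision is installed), the zero-γ trick of [34] is exposed as a constructor flag:

import torchvision

# zero_init_residual=True sets gamma = 0 in each residual block's last BN
# layer, so each block initially behaves like an identity shortcut [34];
# the default (False) leaves gamma = 1 everywhere.
model = torchvision.models.resnet50(zero_init_residual=True)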

[Figure 7 panels: average sample time and average used memory, for batch size = 32 across resolutions and for resolution = 64 across batch sizes; curves for the cudnn, libtorch, pytorch, and torchscript implementations (time in ms, memory in MB, log scales).]

Figure 7: Comparison of execution time and memory efficiency for PyTorch, LibTorch, and cuDNN implementations on PASCAL for fixed batch size (32) and various resolutions, and fixed resolution (64) and various batch sizes.

6 CONCLUSION AND FUTURE WORK
In this work we have implemented ResNet-50 in PyTorch, LibTorch, TorchScript, and cuDNN. We then trained³² and evaluated each implementation on the MNIST, CIFAR10, STL10, and PASCAL VOC datasets. Despite difficulties with the cuDNN implementation, we show that PyTorch underperforms lower level abstractions across various batch sizes and resolutions (see figs. (8), (9), (10), (11), (12) and (13) in the appendix). The ultimate causes for these differences in performance are hypothesized to be the larger buffers and larger grid allocations used by PyTorch, and consequently longer host-to-device copy times.

³²With the exception of TorchScript, since currently it cannot be trained.

Future work will focus on further narrowing down the causes of the memory and sample time inefficiencies of PyTorch; in particular we hope to more closely investigate the execution paths of the PyTorch and LibTorch implementations in order to discover what additional heuristic choices are made (relative to the cuDNN implementation). A long term research goal is to design a DL framework that uses code generation to statically generate C++ code corresponding to neural network graphs. Such a framework would obviate the need for dynamic dispatch at the object level.

7 SPECULATION
We speculate about DL systems along three dimensions: hardware, software, and use cases/techniques. Firstly, we project that with the end of Dennard scaling [36] general purpose processors will give way to Application Specific Integrated Circuits (ASICs). Architectures like Cerebras' CS-1, SambaNova's Cardinal SN10, Google's TPU, and even Apple's Neural Engine (packaged with their M1) demonstrate that chip designers recognize the need for ML/DL purpose-built hardware. These purpose-built chips are better suited for the concurrency and memory access patterns unique to ML/DL workloads. With the impending cambrian explosion of different architectures, there will be a great need for compilers that act as intermediaries between high-level neural network representations and their hardware implementations. Compiler infrastructures such as MLIR, PlaidML, and Intel's oneAPI [37] will become critical for software developers. As ML/DL ASICs become more ubiquitous and more performant, ML/DL powered software will become commensurately more ubiquitous; already mobile phones employ ML/DL for text autocorrection [38], image compression/super-resolution [39], and facial fingerprinting [20]. We project that most user-interaction driven tasks (e.g. setting alarms, coordinating appointments/meetings, planning daily routines) will have ML/DL solutions in the coming years. Further, more platforms that have been up until today analog or simple computers (e.g. kitchen appliances, light power tools) will become "smart". This distribution of lower power edge devices will necessitate more effective (and private) federated learning methods [40].

8 ACKNOWLEDGEMENTS
We would like to thank Rick Stevens and Ian Foster for their constructive criticism and feedback on the project and the paper itself.

REFERENCES
[1] Paszke, A. et al., PyTorch: An imperative style, high-performance deep learning library (2019).
[2] Abadi, M. et al., TensorFlow: Large-scale machine learning on heterogeneous distributed systems (2016).
[3] Chen, T. et al., MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems (2015).
[4] Seide, F. and Agarwal, A., Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16 (Association for Computing Machinery, New York, NY, USA, 2016), p. 2135.
[5] Knuth, D.E., Commun. ACM 17, 667–673 (1974).
[6] Abdelhamed, A. et al., NTIRE 2020 challenge on real image denoising: Dataset, methods and results (2020).
[7] Hall, D. et al., IEEE Winter Conference on Applications of Computer Vision (WACV) (2020).
[8] Russakovsky, O. et al., International Journal of Computer Vision (IJCV) 115, 211 (2015).
[9] Shi, W. et al., 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 1874–1883.
[10] Brown, T.B. et al., Language models are few-shot learners (2020).
[11] Locke, J., An Essay Concerning Human Understanding (Oxford University Press, 1689).
[12] Russell, B., Principles of Mathematics (Routledge, 1937).
[13] Colburn, T. and Shute, G., Minds and Machines 17, 169 (2007).
[14] Abelson, H., Sussman, G. and Sussman, J., Structure and Interpretation of Computer Programs, Electrical engineering and computer science series (MIT Press, 1996).
[15] Pippenger, N., ACM Trans. Program. Lang. Syst. 19, 223–238 (1997).
[16] Skiena, S.S., The Algorithm Design Manual (Springer Publishing Company, Incorporated, 2008), second edn.
[17] Stroustrup, B., Proceedings of the 21st European Conference on Programming Languages and Systems, ESOP'12 (Springer-Verlag, Berlin, Heidelberg, 2012), pp. 1–25.
[18] Sanders, J. and Kandrot, E., CUDA by Example: An Introduction to General-Purpose GPU Programming (Addison-Wesley Professional, 2010), first edn.
[19] Flynn, M.J., IEEE Transactions on Computers C-21, 948 (1972).
[20] NVIDIA, CUDA toolkit documentation, https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#control-flow-instructions (2020). [Online; accessed 3-December-2020].
[21] Glaskowsky, P. (2009).
[22] Wittenbrink, C.M., Kilgariff, E. and Prabhu, A., IEEE Micro 31, 50 (2011).
[23] Panchenko, M. et al., Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2019 (IEEE Press, 2019), pp. 2–14.
[24] Le, T.D. et al., TFLMS: Large model support in TensorFlow by graph rewriting (2019).
[25] Pradelle, B. et al., ESPT/VPA@SC (2017).
[26] Lattner, C. et al., MLIR: A compiler infrastructure for the end of Moore's law (2020).
[27] Zerrell, T. and Bruestle, J., Stripe: Tensor compilation via the nested polyhedral model (2019).
[28] Griebl, M., Lengauer, C. and Wetzel, S., In IEEE PACT (IEEE Computer Society Press, 1998), pp. 106–111.
[29] Vasilache, N. et al., Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions (2018).
[30] Chen, T. et al., Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI'18 (USENIX Association, USA, 2018), pp. 579–594.
[31] He, K. et al., Deep residual learning for image recognition (2015).
[32] He, K. et al., Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification (2015).
[33] Zhang, H., Dauphin, Y.N. and Ma, T., Fixup initialization: Residual learning without normalization (2019).
[34] Goyal, P. et al., Accurate, large minibatch SGD: Training ImageNet in 1 hour (2018).
[35] Besard, T., Foket, C. and De Sutter, B., IEEE Transactions on Parallel and Distributed Systems 30, 827–841 (2019).
[36] Cardoso, J.M.P., Coutinho, J.G.F. and Diniz, P.C., Embedded Computing for High Performance (Morgan Kaufmann, Boston, 2017), pp. 17–56.
[37] Intel, Driving a new era of accelerated computing, https://software.intel.com/content/www/us/en/develop/tools/oneapi.html (2020). [Online; accessed 3-December-2020].
[38] Ghosh, S. and Kristensson, P.O., CoRR abs/1709.06429 (2017).
[39] Romano, Y., Isidoro, J. and Milanfar, P., CoRR abs/1606.01299 (2016).
[40] Augenstein, S. et al. (2019).

Appendices

A EXTRA PLOTS

Figure 8: Average GPU utilization during training on PASCAL, as a function of resolution and batch size, for the cuDNN, LibTorch, and PyTorch implementations.

Figure 9: Average sample time in training on PASCAL, as a function of resolution and batch size (time in ms).

Figure 10: Average memory used in training on PASCAL, as a function of resolution and batch size (memory in MB).

Figure 11: Average GPU utilization during evaluation on PASCAL, as a function of resolution and batch size, for the cuDNN, LibTorch, PyTorch, and TorchScript implementations.

Figure 12: Average sample time in evaluation on PASCAL, as a function of resolution and batch size (time in ms).

Figure 13: Average memory used in evaluation on PASCAL, as a function of resolution and batch size (memory in MB), for the cuDNN, LibTorch, PyTorch, and TorchScript implementations.

[Figure 14 panels: PASCAL training, STL10 training, PASCAL evaluation, STL10 evaluation; accuracy plotted against time (training) and epoch (evaluation) for the cudnn, libtorch, and pytorch implementations.]

Figure 14: Comparison of accuracy for PyTorch, LibTorch, and cuDNN implementations during training and evaluation. Solid line corresponds to mean while shaded regions correspond to min and max. Time is measured in units of epoch × average epoch time.