Accelerating Legacy String Kernels Via Bounded Automata Learning


Kevin Angstadt (Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA) [email protected]
Jean-Baptiste Jeannin (Aerospace Engineering, University of Michigan, Ann Arbor, MI, USA) [email protected]
Westley Weimer (Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA) [email protected]

CCS Concepts: • Computer systems organization → Reconfigurable computing; • Theory of computation → Regular languages; • Software and its engineering → Software design techniques.

Keywords: automata learning, automata processing, legacy programs

ACM Reference Format: Kevin Angstadt, Jean-Baptiste Jeannin, and Westley Weimer. 2020. Accelerating Legacy String Kernels via Bounded Automata Learning. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '20), March 16–20, 2020, Lausanne, Switzerland. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3373376.3378503

Abstract

The adoption of hardware accelerators, such as FPGAs, into general-purpose computation pipelines continues to rise, but programming models for these devices lag far behind their CPU counterparts. Legacy programs must often be rewritten at very low levels of abstraction, requiring intimate knowledge of the target accelerator architecture. While techniques such as high-level synthesis can help port some legacy software, many programs perform poorly without manual, architecture-specific optimization.

We propose an approach that combines dynamic and static analyses to learn a model of functional behavior for off-the-shelf legacy code and synthesize a hardware description from this model. We develop a framework that transforms Boolean string kernels into hardware descriptions using techniques from both learning theory and software verification.
These include Angluin-style state machine learning algorithms, bounded software model checking with incremental loop unrolling, and string decision procedures. Our prototype implementation can correctly learn functionality for kernels that recognize regular languages and provides a near approximation otherwise. We evaluate our prototype tool on a benchmark suite of real-world, legacy string functions mined from GitHub repositories and demonstrate that we are able to learn fully-equivalent hardware designs in 72% of cases and close approximations in another 11%. Finally, we identify and discuss challenges and opportunities for more general adoption of our proposed framework to a wider class of function types.

1 Introduction

The confluence of several factors, including the significant increase in data collection [34], demands for real-time analyses by business leaders [28], and the lessening impact of both Dennard Scaling and Moore's Law [73], has led to the increased use of hardware accelerators. Such devices range from Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) to more esoteric accelerators, such as Google's Tensor Processing Unit or Micron's D480 Automata Processor. These accelerators trade off general computing capability for increased performance on specific workloads; however, successful adoption requires additional architectural knowledge for effective programming and configuration. Support for acceleration of legacy programs that limits the need for architectural knowledge or manual optimization would only aid the adoption of these devices.
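As a concrete illustration of the kind of "Boolean string kernel" the abstract describes (a function that consumes a string and returns true or false), consider a membership test for a regular language together with the equivalent DFA that a learning-based synthesizer aims to recover. The kernel, the DFA encoding, and the bounded agreement check below are hypothetical examples for intuition, not code from the paper:

```python
from itertools import product

def legacy_kernel(s: str) -> bool:
    """A legacy-style Boolean string kernel: does the input
    contain the substring "ab"? (This recognizes a regular language.)"""
    for i in range(len(s) - 1):
        if s[i] == 'a' and s[i + 1] == 'b':
            return True
    return False

# An equivalent 3-state DFA over the alphabet {'a', 'b'}:
# state 0 = no progress, 1 = just saw 'a', 2 = saw "ab" (accepting sink).
DFA = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 2, (2, 'b'): 2,
}
ACCEPTING = {2}

def dfa_accepts(s: str) -> bool:
    state = 0
    for ch in s:
        state = DFA[(state, ch)]
    return state in ACCEPTING

# Bounded check: the kernel and the DFA agree on every string of
# length at most 6 over the alphabet.
for n in range(7):
    for w in product('ab', repeat=n):
        s = ''.join(w)
        assert legacy_kernel(s) == dfa_accepts(s)
```

A DFA in this tabular form maps directly onto spatial automata-processing hardware, which is what makes state machines an attractive intermediate target for acceleration.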
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ASPLOS '20, March 16–20, 2020, Lausanne, Switzerland. © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7102-5/20/03...$15.00. https://doi.org/10.1145/3373376.3378503

While present in industry for prototyping and application-specific deployments for quite some time, reconfigurable architectures, such as FPGAs, are now becoming commonplace in everyday computing. In fact, FPGAs are in use in Microsoft datacenters and are also widely available through Amazon's cloud infrastructure [2, 25, 49].

Adopting hardware accelerators into existing application workflows requires porting code to these new programming models. Unfortunately, porting legacy code remains difficult. The primary programming model for FPGAs remains Hardware Description Languages (HDLs) such as Verilog and VHDL. HDLs have a level of abstraction akin to assembly-level development on traditional CPU architectures. While these hardware solutions provide high throughputs, programming can be challenging: programs written for these accelerators are tedious to develop and difficult to write correctly.

Higher levels of abstraction for programming FPGAs have been achieved with high-level synthesis (HLS) [61], languages such as OpenCL [79], and frameworks such as Xilinx's SDAccel [95]. While HLS may allow existing code to compile for FPGAs, it still requires low-level knowledge of the underlying architecture to allow for efficient implementation and execution of applications [85, 99]. In fact, optimization techniques for OpenCL programs have been demonstrated to have detrimental impacts on performance when software is ported to a new architecture [10]. Therefore, there is a need for a new programming model that supports porting of legacy code, admits performant execution on hardware accelerators, and does not rely on developer architectural knowledge.

We focus our efforts on automata computations, a class of kernels with applications ranging from high-energy physics [93] to network security [71] and bioinformatics [68, 69, 83], for which hardware acceleration is both strongly desired and capable of providing significant performance gains [8, 33, 35, 36, 39, 80, 88, 91, 96].

In this paper, we present AutomataSynth, a new approach for executing code (including legacy programs and automata computations) on FPGAs and other hardware accelerators. Unlike HLS, which statically analyzes a program to produce a hardware design, AutomataSynth both dynamically observes and statically analyzes program behavior to synthesize a functionally-equivalent hardware design. Our approach is based on several key insights. First, state machines provide an abstraction that has successfully accelerated applications across many domains [68–71, 82, 83, 90, 92, 93, 98] and admit efficient implementations in hardware [33, 88, 96], but typically require rewriting applications. Second, there is an entire body of work on query-based learning of state machines (e.g., see Angluin for a classic survey of computational learning theory [7]), but these algorithms commonly rely on unrealistic oracle assumptions. Third, we observe that the combination of software model checking (e.g., [15, 16]) and recent advances in string decision procedures [...] as finite automata [33, 88, 96]. We demonstrate how software model checking, using a novel combination of bounded model checking with incremental loop unrolling augmented with string decision procedures, can answer oracle queries required by Angluin-style learning algorithms, resulting in a framework to iteratively infer automata corresponding to legacy kernels.

We evaluate our prototype implementation of AutomataSynth on a benchmark suite of string kernels mined from public repositories on GitHub. Our evaluation demonstrates that our approach is viable for small functions and exposes new opportunities for improving current-generation tools. We identify four key challenges associated with using state-of-the-art methods to compile legacy kernels to FPGAs and suggest paths forward for addressing current limitations.

In summary, this paper makes the following contributions:

• AutomataSynth, a framework for accelerating legacy string kernels by learning equivalent state machines. We extend an Angluin-style learning algorithm to use a combination of iterative bounded software model checking and string decision procedures to answer oracle queries.

• A proof that AutomataSynth terminates and is correct (i.e., relatively complete with respect to the underlying model checker) for kernels that recognize regular languages. The proof leverages the minimality of machines learned by L* and the Pumping Lemma for regular languages.

• An empirical evaluation of AutomataSynth on 18 indicative kernels mined from public GitHub repositories. We learn 13 exactly-equivalent models and 2 near approximations.

2 Background and Related Work

AutomataSynth employs a novel combination of tools and techniques from multiple computing disciplines. To the best of our knowledge, this is the first work to combine model learning algorithms, software verification, and architectures for high-performance automata processing. We position our work in the context of related efforts from each area and provide background helpful for understanding our approach.
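The role the model checker plays in answering L*-style equivalence queries can be sketched abstractly. In the actual system, queries are discharged with bounded software model checking and string decision procedures over the kernel's source code; the stand-in below (all names are hypothetical) approximates that oracle by incrementally increasing a length bound and testing exhaustively, returning a counterexample string whenever the hypothesis machine and the legacy kernel disagree:

```python
from itertools import product
from typing import Callable, Optional

def bounded_equivalence_oracle(
    kernel: Callable[[str], bool],
    hypothesis: Callable[[str], bool],
    alphabet: str,
    max_bound: int,
) -> Optional[str]:
    """Approximate an equivalence query: search for a string on which
    the learned machine and the legacy kernel disagree, raising the
    exploration bound incrementally (mirroring incremental loop
    unrolling in bounded model checking). Returns a counterexample,
    or None if the two agree on all strings up to max_bound."""
    for bound in range(max_bound + 1):       # incremental "unrolling"
        for w in product(alphabet, repeat=bound):
            s = ''.join(w)
            if kernel(s) != hypothesis(s):
                return s                     # counterexample refines the learner
    return None                              # "equivalent" up to the bound

# Usage: a kernel accepting even-length strings, and a wrong hypothesis.
kernel = lambda s: len(s) % 2 == 0
bad_hypothesis = lambda s: True              # accepts everything
print(bounded_equivalence_oracle(kernel, bad_hypothesis, "ab", 4))  # prints "a"
```

When the oracle returns None, the learner can only claim equivalence relative to the bound, which matches the paper's observation that kernels outside the regular languages are learned as near approximations.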