Accelerating Legacy String Kernels Via Bounded Automata Learning

Accelerating Legacy String Kernels via Bounded Automata Learning Kevin Angstadt Jean-Baptiste Jeannin Westley Weimer University of Michigan University of Michigan University of Michigan Computer Science and Engineering Aerospace Engineering Computer Science and Engineering Ann Arbor, MI, USA Ann Arbor, MI, USA Ann Arbor, MI, USA [email protected] [email protected] [email protected] Abstract CCS Concepts. • Computer systems organization → Re- The adoption of hardware accelerators, such as FPGAs, into configurable computing; • Theory of computation → Reg- general-purpose computation pipelines continues to rise, ular languages; • Software and its engineering → Soft- but programming models for these devices lag far behind ware design techniques. their CPU counterparts. Legacy programs must often be Keywords. automata learning, automata processing, legacy rewritten at very low levels of abstraction, requiring inti- programs mate knowledge of the target accelerator architecture. While techniques such as high-level synthesis can help port some ACM Reference Format: Kevin Angstadt, Jean-Baptiste Jeannin, and Westley Weimer. 2020. legacy software, many programs perform poorly without Accelerating Legacy String Kernels via Bounded Automata Learn- manual, architecture-specific optimization. ing. In Proceedings of the Twenty-Fifth International Conference on We propose an approach that combines dynamic and static Architectural Support for Programming Languages and Operating analyses to learn a model of functional behavior for off-the- Systems (ASPLOS ’20), March 16–20, 2020, Lausanne, Switzerland. shelf legacy code and synthesize a hardware description from ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3373376. this model. We develop a framework that transforms Boolean 3378503 string kernels into hardware descriptions using techniques from both learning theory and software verification. These include Angluin-style state machine learning algorithms, 1 Introduction bounded software model checking with incremental loop The confluence of several factors, including the significant unrolling, and string decision procedures. Our prototype increase in data collection [34], demands for real-time anal- implementation can correctly learn functionality for ker- yses by business leaders [28], and the lessening impact of nels that recognize regular languages and provides a near both Dennard Scaling and Moore’s Law [73], have led to the approximation otherwise. We evaluate our prototype tool increased use of hardware accelerators. Such devices range on a benchmark suite of real-world, legacy string functions from Field-Programmable Gate Arrays (FPGAs) and Graph- mined from GitHub repositories and demonstrate that we ics Processing Units (GPUs) to more esoteric accelerators, are able to learn fully-equivalent hardware designs in 72% of such as Google’s Tensor Processing Unit or Micron’s D480 cases and close approximations in another 11%. Finally, we Automata Processor. These accelerators trade off general identify and discuss challenges and opportunities for more computing capability for increased performance on specific general adoption of our proposed framework to a wider class workloads; however, successful adoption requires additional of function types. architectural knowledge for effective programming and con- figuration. Support for acceleration of legacy programs that limits the need for architectural knowledge or manual optimization would only aid the adoption of these devices. Permission to make digital or hard copies of all or part of this work for While present in industry for prototyping and application- personal or classroom use is granted without fee provided that copies specific deployments for quite some time, reconfigurable are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights architectures, such as FPGAs, are now becoming common- for components of this work owned by others than the author(s) must place in everyday computing. In fact, FPGAs are in use in be honored. Abstracting with credit is permitted. To copy otherwise, or Microsoft datacenters and are also widely available through republish, to post on servers or to redistribute to lists, requires prior specific Amazon’s cloud infrastructure [2, 25, 49]. permission and/or a fee. Request permissions from [email protected]. Adopting hardware accelerators into existing application ASPLOS ’20, March 16–20, 2020, Lausanne, Switzerland workflows requires porting code to these new programming © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. models. Unfortunately, porting legacy code remains difficult. ACM ISBN 978-1-4503-7102-5/20/03...$15.00 The primary programming model for FPGAs remains Hard- https://doi.org/10.1145/3373376.3378503 ware Description Languages (HDLs) such as Verilog and ASPLOS ’20, March 16–20, 2020, Lausanne, Switzerland Kevin Angstadt, Jean-Baptiste Jeannin, and Westley Weimer VHDL. HDLs have a level of abstraction akin to assembly- as finite automata [33, 88, 96]. We demonstrate how soft- level development on traditional CPU architectures. While ware model checking, using a novel combination of bounded these hardware solutions provide high throughputs, pro- model checking with incremental loop unrolling augmented gramming can be challenging—programs written for these with string decision procedures, can answer oracle queries accelerators are tedious to develop and difficult to write cor- required by Angluin-style learning algorithms, resulting in rectly. a framework to iteratively infer automata corresponding to Higher levels of abstraction for programming FPGAs have legacy kernels. been achieved with high-level synthesis (HLS) [61], lan- We evaluate our prototype implementation of Automata- guages such as OpenCL [79], and frameworks such as Xil- Synth on a benchmark suite of string kernels mined from inx’s SDAccel [95]. While HLS may allow existing code to public repositories on GitHub. Our evaluation demonstrates compile for FPGAs, it still requires low-level knowledge that our approach is viable for small functions and exposes of the underlying architecture to allow for efficient imple- new opportunities for improving current-generation tools. mentation and execution of applications [85, 99]. In fact, We identify four key challenges associated with using state- optimization techniques for OpenCL programs have been of-the-art methods to compile legacy kernels to FPGAs and demonstrated to have detrimental impacts on performance suggest paths forward for addressing current limitations. when software is ported to a new architecture [10]. There- In summary, this paper makes the following contributions: fore, there is a need for a new programming model that sup- • AutomataSynth, a framework for accelerating legacy ports porting of legacy code, admits performant execution string kernels by learning equivalent state machines. on hardware accelerators, and does not rely on developer We extend an Angluin-style learning algorithm to use architectural knowledge. We focus our efforts on automata a combination of iterative bounded software model computations, a class of kernels with applications ranging checking and string decision procedures to answer from high-energy physics [93] to network security [71] and oracle queries. bioinformatics [68, 69, 83], for which hardware acceleration • A proof that AutomataSynth terminates and is cor- is both strongly desired and also capable of providing signif- rect (i.e., relatively complete with respect to the un- icant performance gains [8, 33, 35, 36, 39, 80, 88, 91, 96]. 1 derlying model checker) for kernels that recognize In this paper, we present AutomataSynth, a new ap- regular languages. The proof leverages the minimality proach for executing code (including legacy programs and au- of machines learned by L* and the Pumping Lemma tomata computations) on FPGAs and other hardware acceler- for regular languages. ators. Unlike HLS, which statically analyzes a program to pro- • An empirical evaluation of AutomataSynth on 18 duce a hardware design, AutomataSynth both dynamically indicative kernels mined from public GitHub reposito- observes and statically analyzes program behavior to synthe- ries. We learn 13 exactly-equivalent models and 2 near size a functionally-equivalent hardware design. Our approach approximations. is based on several key insights. First, state machines provide an abstraction that has successfully accelerated applications across many domains [68–71, 82, 83, 90, 92, 93, 98] and admit 2 Background and Related Work efficient implementations in hardware [33, 88, 96], but typi- AutomataSynth employs a novel combination of tools and cally require rewriting applications. Second, there is an entire techniques from multiple computing disciplines. To the best body of work on query-based learning of state machines (e.g., of our knowledge, this is the first work to combine model see Angluin for a classic survey of computational learning learning algorithms, software verification, and architectures theory [7]), but these algorithms commonly rely on unreal- for high-performance automata processing. We position our istic oracle assumptions. Third, we observe that the combi- work in the context of related efforts from each area and nation of software model checking (e.g., [15, 16]) and recent provide background helpful for understanding our approach. advances in string decision procedures

Accelerating Legacy String Kernels Via Bounded Automata Learning

Computational Learning Theory: New Models and Algorithms

Inductive Inference, Dfas, and Computational Complexity

Manuel Blum's CV

A Complete Bibliography of Publications in ACM Computing Surveys