Debugging Support for Pattern-Matching Languages and Accelerators

Matthew Casias, University of Virginia, Department of Computer Science, Charlottesville, VA, USA, [email protected]
Kevin Angstadt, University of Michigan, Computer Science and Engineering, Ann Arbor, MI, USA, [email protected]
Tommy Tracy II, University of Virginia, Department of Computer Science, Charlottesville, VA, USA, [email protected]
Kevin Skadron, University of Virginia, Department of Computer Science, Charlottesville, VA, USA, [email protected]
Westley Weimer, University of Michigan, Computer Science and Engineering, Ann Arbor, MI, USA, [email protected]

Abstract

Programs written for hardware accelerators can often be difficult to debug. Without adequate tool support, program maintenance tasks such as fault localization and debugging can be particularly challenging. In this work, we focus on supporting hardware that is specialized for finite automata processing, a computational paradigm that has accelerated pattern-matching applications across a diverse set of problem domains. While commodity hardware enables high-throughput data analysis, direct interactive debugging (e.g., single-stepping) is not currently supported.

We propose a debugging approach for existing commodity hardware that supports step-through debugging and variable inspection of user-written automata processing programs. We focus on programs written in RAPID, a domain-specific language for pattern-matching applications. We develop a prototype of our approach for both Xilinx FPGAs and Micron’s Automata Processor that supports simultaneous high-speed processing of data and interactive debugging without requiring modifications to the underlying hardware. Our empirical evaluation demonstrates low clock overheads for our approach across thirteen applications in the ANMLZoo automata processing benchmark suite on FPGAs. Additionally, we evaluate our technique through a human study involving over 60 participants and 20 buggy segments of code. Our generated debugging information increases fault localization accuracy by 22%, or 10 percentage points, in a statistically significant manner (p = 0.013).

CCS Concepts • Software and its engineering → Domain specific languages; Software testing and debugging; • Computer systems organization → Reconfigurable computing.

Keywords automata processing, debugging, human study

ACM Reference Format:
Matthew Casias, Kevin Angstadt, Tommy Tracy II, Kevin Skadron, and Westley Weimer. 2019. Debugging Support for Pattern-Matching Languages and Accelerators. In Proceedings of 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS’19). ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3297858.3304066

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ASPLOS’19, April 13–17, 2019, Providence, RI, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6240-5/19/04...$15.00
https://doi.org/10.1145/3297858.3304066

1 Introduction

The amount of data being produced by companies and consumers continues to grow,¹ and business leaders are becoming increasingly interested in analyzing and using this collected information.² To keep up with data processing needs, companies and researchers are turning to specialized hardware for increased performance. Accelerators, such as GPUs, FPGAs, and Micron’s D480 Automata Processor (AP) [10], trade off general computing capabilities for increased performance on very specific workloads; however, these devices require additional architectural knowledge to effectively program and configure. Despite this added complexity, researchers have successfully used specialized hardware to accelerate data analysis across many domains, including: natural language processing [66], network security [33], graph analytics [32], high-energy physics [53], bioinformatics [30, 31, 45], pseudo-random number generation and simulation [48], data-mining [51, 52], and machine learning [44].

¹ https://web.archive.org/web/20170203000215/http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode
² https://www.dnvgl.com/assurance/viewpoint/viewpoint-surveys/big-data.html

Numerous programming models have been introduced to ease the burden on hardware accelerator users, such as OpenCL [41], Stanford’s Legion programming system [6], and Xilinx’s SDAccel framework.³ Recently, the RAPID language was proposed to improve the programming of automata processing engines [3]. These processors accelerate the identification of a collection of byte sequences (or patterns) in a stream of data by supporting many comparisons in parallel. RAPID is a C-like language that includes a combined imperative and declarative model for pattern-matching problems, providing intuitive representations for patterns in use cases where regular expressions become cumbersome or exhaustive enumerations. The language provides parallel control structures, admitting concurrent searches for multiple criteria against the data stream. RAPID programs are compiled into finite automata, supporting efficient execution using both automata processing engines, such as Micron’s Automata Processor (AP) or Subramaniyan et al.’s Cache Automaton [42], and also general-purpose accelerators, such as Field-Programmable Gate Arrays (FPGAs) and GPUs. However, RAPID abstracts away from the low-level automata or circuit paradigms used by the hardware, thus allowing developers to work with code in a semantically-familiar form.

This focus on new domain-specific languages (DSLs) and accelerators introduces challenges from a software maintenance standpoint. Developers may wish to port existing code to these new languages or rewrite algorithms to be better-suited for these new accelerators, tasks which can introduce new faults [63, 65]. For automata processing applications, these faults can be particularly difficult to localize. Developers may not observe abnormal behavior until processing large quantities of data (i.e., testing samples may not exhibit high coverage of corner cases). Extracting a smaller input for analysis from the large data set can be challenging or costly, since many pattern-matching algorithms perform a sliding-window comparison where the relevant piece of data is not known a priori. It is therefore desirable to support high-throughput data processing with the ability to interrupt accelerated program execution and transfer control to a debugging environment.

Although debugging support for CPUs is mature and fully-featured (including standard tools [40], successful technology transfer [5] and annual conferences [17]), throughput of automata processing applications on CPUs is typically orders […]

[…] performed at the signal level using logic analyzers or scan chains [4, 21, 43, 55], exposing low-level state to software. The AP also provides no explicit debugging support, but does expose low-level state through APIs.

We propose an approach for building an interactive, source-level debugger using low-level signal inspection on hardware accelerators. Our debugging system includes support for breakpoints and data inspection. We demonstrate prototype implementations for both the AP and Xilinx FPGAs; no modifications to the underlying accelerators are needed. While we focus our presentation on one indicative DSL, the techniques we present for exposing state from low-level accelerators to provide debugging support lay out a general path for providing such capabilities for other accelerators and languages. Our approach leverages four key insights:

• A combined hardware accelerator and CPU-software simulation system design allows for both high-speed data processing as well as interactive debugging.
• Micron’s AP contains context-switching hardware resources, which are often left unused, for processing multiple input streams in parallel. Additionally, FPGA manufacturers provide logic analyzer APIs to inspect the values of signals during data processing. We repurpose these hardware features to transfer control from the execution context on the accelerator to an interactive debugger on the host system.
• Runtime state for automata processing applications is compact, consisting only of the set of active states. We lift this state to the semantics of the source-level program through a series of mappings generated at compile time. The mapping from source-level expressions to architecture-level automata states is traceable within the RAPID compiler; our approach is applicable to any high-level programming language for which such a mapping from expressions to hardware resources may be inferred.
• Setting breakpoints on expressions in a program is not directly supported by the automata processing paradigm. Instead, we set and trigger breakpoints on input data, pausing execution after processing N bytes. We can leverage these pauses to provide the abstraction of more traditional breakpoints set on lines of code.
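The last two insights (runtime state as a set of active states lifted through compile-time mappings, and breakpoints triggered after N input bytes) can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the authors' implementation or the RAPID compiler's output: the names `Automaton`, `step`, `run_until`, `lift`, and `debug_map`, the choice of an automaton whose states each match a set of byte values, and the "line N" labels are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Automaton:
    """Toy NFA (hypothetical model): each state matches a set of byte values."""
    symbols: Dict[int, Set[int]]   # state id -> byte values the state accepts
    edges: Dict[int, Set[int]]     # state id -> successor state ids
    start: Set[int]                # states enabled on every byte (sliding window)
    reporting: Set[int]            # states that signal a pattern match


def step(nfa: Automaton, active: Set[int], byte: int) -> Set[int]:
    """Consume one byte: enabled states that accept the byte become active."""
    enabled = set(nfa.start)
    for state in active:
        enabled |= nfa.edges.get(state, set())
    return {s for s in enabled if byte in nfa.symbols[s]}


def run_until(nfa: Automaton, data: bytes,
              break_offset: int) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Process input and pause after break_offset bytes (an input breakpoint).

    Returns the active-state set at the pause plus (offset, state) reports.
    """
    active: Set[int] = set()
    reports: List[Tuple[int, int]] = []
    for offset, byte in enumerate(data[:break_offset]):
        active = step(nfa, active, byte)
        reports.extend((offset, s) for s in sorted(active & nfa.reporting))
    return active, reports


def lift(active: Set[int], debug_map: Dict[int, str]) -> List[str]:
    """Map active automaton states back to source-level locations."""
    return sorted({debug_map[s] for s in active if s in debug_map})


# Hypothetical automaton for the pattern "abc" and its compile-time mapping.
nfa = Automaton(
    symbols={0: {ord("a")}, 1: {ord("b")}, 2: {ord("c")}},
    edges={0: {1}, 1: {2}},
    start={0},
    reporting={2},
)
debug_map = {0: "line 3", 1: "line 4", 2: "line 5"}

active, reports = run_until(nfa, b"xxabcab", break_offset=4)
print(lift(active, debug_map), reports)
```

In this sketch, pausing after four bytes of `b"xxabcab"` leaves only the state for `b` active, which the hypothetical mapping lifts to "line 4"; pausing one byte later reaches the reporting state. A debugger built this way needs to capture only the active-state set at the pause, which is what makes the state compact enough to hand from an accelerator to a host-side simulator.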
