
WHITE PAPER
FPGA Inline Acceleration

FPGA Inline Acceleration for Streaming Analytics

Authors

Meha Kainth
Intel® Corporation
Programmable Solutions Group
San Jose, USA
[email protected]

Dan Pritsker
Intel Corporation
Programmable Solutions Group
San Diego, USA
[email protected]

Hong Shan Neoh
Intel Corporation
Programmable Solutions Group
San Jose, USA
[email protected]

Abstract

This paper presents a method to implement FPGA inline acceleration for streaming analytics. The accelerator, which is implemented on the FPGA fabric using the OpenCL™ approach with streaming pipes, processes data packets directly from the network via a 10 Gbps Ethernet (10GbE) interface and applies inline processing on the streaming data to monitor the signature waveforms in real time. The FPGA inline accelerator, which acts as a co-processor to the CPU in streaming analytics platforms, provides a scalable solution that can handle data as it grows in volume, velocity, and complexity.

The FPGA inline accelerator enables a hardware parallel platform that can handle analytics workloads of real-time data very efficiently. The capability to ingest data while performing inline processing for data conditioning can provide real-time insights from the streaming data. This approach offers a way to extract information that resides in the data stream and to generate rapid real-time decisions before the data becomes stale. In addition, the FPGA inline accelerator has multiple key advantages, including a low-latency interface, data locality, modularity in input, and deterministic response regardless of data rate or data format. This paper provides benchmark results comparing the performance of inline acceleration implemented in an FPGA against a system without a hardware accelerator, in terms of latency and sustainable data rate.
Results from the benchmark show that the FPGA inline accelerator has 22X lower end-to-end latency while maintaining a data rate of 9 Gbps without dropped packets.†

Table of Contents

Abstract
Introduction
Streaming Analytics Using FPGAs
High-Level Design Using OpenCL™
Host Pipes
Related Work
Inline FPGA Acceleration
Hardware Implementation and Methodology
Traffic Generator
FPGA Accelerator
Implementation Results
Latency
Sustainable Data Rate
Summary
References

Introduction

Streaming analytics is the ability to continuously analyze real-time data to extract meaningful information from it on the fly. The analysis can be in the form of mathematical calculations, statistical analysis, image processing, and packet inspection [1]. Streaming analytics connects to various external data sources and sends certain data to downstream applications. This enables applications to integrate certain data into the application flow, or to update an external database with processed information [2]. The data can originate from the Internet of Things (IoT), mobile phones and mobile devices such as iPads, market data, social media, sensors, Web clickstream, and financial transactions [3]. Streaming analytics enables analysis of data as soon as it becomes available, making it possible to analyze risks before they occur.

In a streaming analytics environment, data is processed before it is stored in a database, as opposed to traditional data analytics technologies that use batch processing techniques and store data for a certain period before doing the analysis. This technology supports much faster decision making than is possible with batch processing. Furthermore, conventional approaches to streaming analytics involve downstream software tools running on a CPU that inspect and analyze data before forwarding it further downstream to systems and applications for consumption by end users. However, the ability of these tools to perform real-time analysis and generate alerts is limited by the performance of today's solutions used to Extract, Transform, and Load (ETL) data into downstream systems, due to the latency they add between data collection and data analysis [4].

Mitigating these issues requires placing the beginning of the analytic pipeline as close to the point of ingress as possible and leveraging hardware acceleration for initial data analysis [4]. In this paper, we focus on our efforts to design and implement an FPGA-based hardware accelerator at the point of ingress such that it can process data at inline rate. The FPGA-based accelerator uses an Intel Arria® 10 FPGA and the Intel FPGA SDK for OpenCL with a Host Pipes application programming interface (API) to stream data into a kernel directly from a streaming I/O interface such as 10GbE. Our application applies a transformation to the streaming data to monitor the signature waveforms in real time.

The paper first discusses the advantages of using FPGAs as a hardware accelerator for streaming analytics applications. Intel FPGAs support the OpenCL framework, which is an effective tool flow for heterogeneous computing. This framework supports OpenCL pipe semantics and 'Host Pipes', an efficient streaming interface integrated in the Intel FPGA OpenCL flow for low-latency streaming applications. Section III presents the related work. Section IV discusses the key advantages and data flow for using an FPGA for inline acceleration. We discuss our design implementation and methodology in Section V. Section VI provides results for improvements in latency and sustainable data rates with an FPGA as compared to a setup without a hardware accelerator. Section VII summarizes the paper.

Streaming Analytics Using FPGAs

FPGAs have high throughput and higher performance per watt than general-purpose processors. Due to this, FPGAs have become an attractive and effective means of accelerating high-performance computing and data-centric applications as well as handling streaming analytics applications [5-8]. Some of these applications include information filtering [5] and social media, packet processing in network routers and firewalls [6], network intrusion prevention [9] and threat detection systems, video compression [10], low-latency market data feed arbitration for financial trading [11], photonic device simulation for scientific computation [12], and database analytics [13].

Streaming analytics requires high-performance and low-latency processing of data streams. A general-purpose CPU may not be sufficient for real-time analytics. Intel FPGAs accelerate and aid in the compute and connectivity required to collect and process massive quantities of information by controlling the datapath. In addition to being used as an accelerator, FPGAs can also directly receive data and process it inline before it goes through the CPU host system. This frees the processor to manage other system events and provides higher real-time system performance [14]. FPGAs' flexibility enables them to deliver deterministic low latency and high bandwidth. Some of the other factors that make FPGAs suitable for streaming analytics are:

• Lower latency: FPGAs can be connected closer to the streaming media, which means the complete data is processed with no extra transfer and load, eliminating the need for flow control
• Data locality: The processing element is close to the abundant internal memory bandwidth of an FPGA
• Energy efficiency
• Flexibility in handling various data rates and granularities due to built-in I/O interface and protocols

Intel FPGAs provide state-of-the-art solutions to enable designers to use FPGAs for hardware acceleration of streaming applications. The core components to enable this are a streaming interface and an OpenCL tool flow for FPGA.

High-Level Design Using OpenCL

FPGAs were traditionally programmed using hardware description languages (HDL) that are synthesizable, such as Verilog and VHDL. These languages include complex constructs for describing parallel simulations and timing delays that require specialized skills. Recent improvements in tool flows have enabled development with the OpenCL heterogeneous parallel programming framework for Intel FPGAs. The OpenCL standard naturally matches the highly parallel nature of FPGAs [15]. OpenCL allows the programmer to explicitly specify and control the thread-level parallelism while developing in a C-like programming language. This makes it a good match for FPGA development, which offers a very high level of parallelism.

Unlike CPUs and graphics processing units (GPUs), where parallel threads can be executed on different cores, FPGAs offer a different approach. Kernel functions can be transformed into dedicated and deeply pipelined hardware circuits that are inherently multithreaded using the concept of pipeline parallelism. Each of these pipelines can be replicated many times to provide even more parallelism than is possible with a single pipeline. The Intel FPGA SDK for OpenCL compiler translates an OpenCL kernel to hardware by creating a circuit that implements each operation. These circuits are wired together to mimic the flow of data in the kernel. A CPU host program has access to standard OpenCL APIs for transferring data to the FPGA, invoking the kernel on the FPGA, and returning the resulting data. The function to be accelerated is referred to as an OpenCL kernel.
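The pipeline parallelism described in this section can be illustrated with a simple cycle-count model. The sketch below is not taken from the paper or from the Intel FPGA SDK for OpenCL; the function names and the parameters `depth` and `replicas` are hypothetical, and the model assumes an idealized pipeline that accepts one new data item per clock cycle with no stalls.

```python
def sequential_cycles(num_items: int, depth: int) -> int:
    """Cycles needed when each item must pass through all `depth` stages
    before the next item is allowed to start (no pipelining)."""
    return num_items * depth


def pipelined_cycles(num_items: int, depth: int, replicas: int = 1) -> int:
    """Cycles needed with `replicas` identical pipelines of `depth` stages.

    Assumption: each pipeline accepts one new item per cycle. The first
    result appears after `depth` cycles (fill latency); after that, each
    pipeline completes one item per cycle.
    """
    if num_items == 0:
        return 0
    # Items are split as evenly as possible; the busiest pipeline sets the time.
    items_per_pipeline = (num_items + replicas - 1) // replicas
    return depth + items_per_pipeline - 1


if __name__ == "__main__":
    n, d = 1000, 20
    print("sequential:", sequential_cycles(n, d))
    print("one pipeline:", pipelined_cycles(n, d))
    print("four replicas:", pipelined_cycles(n, d, replicas=4))
```

Under these assumptions, 1,000 items and a depth of 20 stages take 20,000 cycles without pipelining, 1,019 cycles in a single pipeline, and 269 cycles across four replicated pipelines. Real designs can deviate from this model, for example when memory or I/O stalls prevent a new item from entering the pipeline every cycle.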
The FPGA's reconfigurability allows loading and unloading of different dedicated acceleration kernels that were designed for a particular type of workload [15].

Host Pipes

The OpenCL programming

Related Work