Messaging Framework for Hardware Accelerators

HGum: Messaging Framework for Hardware Accelerators Sizhuo Zhang Hari Angepat Derek Chiou MIT CSAIL Microsoft Microsoft [email protected] [email protected] [email protected] Abstract—Software messaging frameworks help avoid errors sender composes a data structure that represents the message, and reduce engineering efforts in building distributed systems and then calls the serialization (SER) function to encode the by (1) providing an interface definition language (IDL) to specify data structure into binary data, which is sent to the receiver via precisely the structure of the message (i.e., the message schema), and (2) automatically generating the serialization and deserial- network. The receiver uses the deserialization (DES) function ization functions that transform user data structures into binary to decode the binary data and re-construct the data structure. data for sending across the network and vice versa. Similarly, a Sender Message Schema Receiver hardware-accelerated system, which consists of host software and in IDL User Program User Program multiple FPGAs, could also benefit from a messaging framework Data Structure Code Data Structure to handle messages both between software and FPGA and also Serialization (SER) Generator Deserialization (DES) between different FPGAs. The key challenge for a hardware Network Interface Network Interface messaging framework is that it must be able to support large …… …… messages with complex schema while meeting critical constraints Physical IO Physical IO such as clock frequency, area, and throughput. In this paper, we present HGum, a messaging framework Fig. 1. Messaging framework architecture for hardware accelerators that meets all the above require- The network interface in the messaging framework abstracts ments. HGum is able to generate high-performance and low- away the details of the network and permits the use of cost hardware logic by employing a novel design that algo- different network protocols. More importantly, the framework rithmically parses the message schema to perform serialization and deserialization. Our evaluation of HGum shows that it not automatically generates the data structures and SER/DES func- only significantly reduces engineering efforts but also generates tions based on the message schema, which is unambiguously hardware with comparable quality to manual implementation. specified by the user in an interface definition language (IDL). These features of the messaging framework address all the I. INTRODUCTION previously mentioned problems. Messaging frameworks are very useful in building distributed Recently, FPGAs have been used in accelerating many software systems. In such systems, processes on different services. For example, Microsoft has used an 8-FPGA pipeline servers communicate by sending messages over the network. to accelerate Bing search [6]. In this system, messages are Although libraries exist to handle networking (e.g., Ethernet), exchanged not only between a software host and an FPGA, there is still a need to encode a message into the binary but also between FPGAs. The messages can be large and format (e.g., a blob of bytes) that is sent to the network, and complex data structures. For instance, a request message from decode the received binary data (from network) to re-construct the software host to the FPGA pipeline can be 64-KB large, the message. Since different processes may be developed by and consist of multiple levels of nested arrays. To reduce the different teams, the schema (i.e., the structure) of the message complexity and errors introduced by hand coding, we propose HGum, a messaging framework for hardware accelerators. arXiv:1801.06541v1 [cs.DC] 19 Jan 2018 and the encoded binary format must be documented so that all teams can generate a consistent encoding and decoding of Figure 2 shows the architecture of HGum, which is similar the message. Such documents are generally written in English to that of a software messaging framework. The difference rather than as a formal specification. is that HGum allows hardware to exchange messages with Hand implementations of encode/decode functions are both software and other hardware. HGum faces the following tedious and error-prone. Whenever the message schema is challenges not present for software messaging frameworks: updated, the functions must be re-implemented. Besides, 1) Limited hardware resources: Since the message can be very documents of the schema can be interpreted differently by two large, it cannot be buffered on chip entirely, nor can the port teams which may use different programming languages. Such be the width of the entire message. Therefore the SER/DES inconsistency may cause the data structure (in each language) logic must process data in a streaming fashion. that represents the message to not match the defined schema. 2) Performance and area constraints: The SER/DES logic Various messaging frameworks [1]–[5] have been developed should not be the bottleneck of the accelerator design. Thus to automatically generate codes to encode/decode messages. the generated SER/DES logic must have high throughput, These frameworks share the architecture shown in Figure 1. The run at high frequency, and occupy as little area as possible. Hardware (FPGA) Message Schema in IDL costs when transferring large messages. In contrast, HGum can User Logic transfer messages of arbitrarily large sizes. Data Structure: Token Stream Code A large number of the remaining hardware messaging Serialization/Deserialization (SER/DES) Generator Network Interface: FIFO (to stream data) frameworks only target systems with a host processor and Hardware …… or a single FPGA. Many of these frameworks [9]–[11] make Physical IO Software the communication between the host processor and FPGA Fig. 2. HGum framework architecture easier. Other instances [12]–[16] provide higher-level FPGA To address the first challenge, the network interface in HGum programming languages. All such frameworks only support is defined as a pair of FIFO interfaces to stream in and out communication between hardware and software, while HGum fixed-width data, and the “data structure” exchanged between also supports messaging from hardware to hardware. user logic and SER/DES logic is defined as a stream of tokens III. SPECIFICATION OF HGUM for incremental composition and decomposition of the message. A. Informal Example of SER/DES We refer to the fixed-width data at the network interface as phit, and its width is often set to match the bandwidth of physical Before giving the formal specifications, we use two examples links. Each token in the token stream can be viewed as a lowest- to illustrate the SER/DES functionality generated by HGum. level field of the message. It contains structural information Deserialization (DES): Figure 3 shows an example of deseri- about its location inside the message schema. Such information alizing a phit stream from the network to a stream of tokens is generated by the DES logic to enable user logic to easily that will be sent to the user logic. The message schema is make use of the tokens. The streaming network interface and the informally represented by a C++ structure. In this example, a token-stream “data structure” distinguish HGum from software phit is 32-bit wide. The first phit contains the first two message messaging frameworks, where the network interface is typically fields, a =0x1234 and b =0x5678, and the second phit contains a randomly accessible buffer and the data structure contains the last field c =0xdeadbeef. The DES logic outputs each field the whole message. as a token in the same order that the fields appear in the The second challenge is addressed by a novel design of message schema. Each token has a tag, which is informally the SER/DES logic, which parses the message schema in an shown as 0, 1 and 2 respectively in Figure 3. The tag indicates algorithmic way. This approach makes the performance and the field that the token corresponds to, and helps the user logic area of the SER/DES logic almost insensitive to the complexity consume the token. Section III-C1 will show how a user can of the message schema. Therefore, HGum provides strong specify the tag associated with each token in the HGum IDL. 0x5678 Tags guarantees on performance and area. Message schema 0xdeadbeef struct Msg { 2 1 0 To the best of our knowledge, HGum is the first messaging 0xdeadbeef uint16_t a; Network 0x5678 0x1234 framework for hardware to process large and complex messages 0x1234 uint16_t b; User interface uint32_t c; }; in a streaming fashion. logic Paper organization: Section II discusses other messaging Deserialization (DES) 32-bit phit stream Token stream frameworks for hardware. Section III specifies the interfaces and the workflow of HGum. Section IV elaborates the imple- Fig. 3. Simple example for deserialization (DES) mentation of HGum. Section V evaluates HGum for both basic Serialization (SER): Figure 4 shows an example of serializing message schema and complex schema used in real accelerators. a token stream from the user logic to a stream of phits. This is Section VI offers the conclusion. almost the reverse of the DES example in Figure 3 except that, in this case, tokens do not have tags. The SER logic does not II. RELATED WORK need tags to determine the field of each token in the schema, SER/DES functions in software messaging frameworks [1]– because the tokens are emitted by the user logic in the same [5] are store-and-forward. That is, deserialization will not start order as the fields appear in the message schema. before all the binary data is received from the network, and Message schema 0x5678 0xdeadbeef the serialized data will not be passed to the network until 0xdeadbeef 0x5678 0x1234 struct Msg { serialization is fully completed. In contrast, HGum processes User uint16_t a; Network logic uint16_t b; 0x1234 interface messages in the streaming fashion, i.e., the SER/DES logic uint32_t c; }; can function in parallel with the network transport layer.

Messaging Framework for Hardware Accelerators

Open Source Used in Influx1.8 Influx 1.9

Protolite: Highly Optimized Protocol Buffer Serializers

Tencentdb for Tcaplusdb Getting Started

Easybuild Documentation Release 20210907.0

Software License Agreement (EULA)

Bringing Probabilistic Programming to Scientific Simulators at Scale

Advances in Electron Microscopy with Deep Learning

Master Thesis

Curriculum Vitae

KNIME Deep Learning Integration Installation Guide

Towards a Fully Automated Extraction and Interpretation of Tabular Data Using Machine Learning

Datacenter Tax Cuts: Improving WSC Efficiency Through Protocol Buffer Acceleration