A Versatile UDP/IP Based PC ↔ FPGA Communication Platform
Total Page:16
File Type:pdf, Size:1020Kb
A Versatile UDP/IP based PC ↔ FPGA Communication Platform Nikolaos Alachiotis, Simon A. Berger, Alexandros Stamatakis The Exelixis Lab, Scientific Computing Group Heidelberg Institute for Theoretical Studies Heidelberg, Germany Emails: {Nikolaos.Alachiotis,Simon.Berger,Alexandros.Stamatakis}@h-its.org Abstract—We present a substantially improved version of Nonetheless, this novel and more flexible UDP/IP core still our popular UDP/IP core for simple and fast PC ↔ FPGA outperforms all competing UDP/IP core implementations in communication over Gigabit Ethernet. We provide a novel terms of speed and resource requirements. feature to automatically configure (previously hard-coded) internal settings on the FPGA. Thereby, we substantially reduce Since February 2010, the proof-of-concept implementa- the installation overhead when a FPGA shall communicate with tion has been downloaded over 6,000 times from OpenCores. several different PCs. The UDP/IP core is designed to occupy org (http://opencores.org/project,udp ip core) and from a minimum amount of hardware resources on the FPGA. On our software page (http://www.exelixis-lab.org/countIPv4. the PC side, this new automatic configuration protocol can be php). Note that both Xilinx [2] and Altera [3] provide used and invoked via a C software interface which provides convenient functions for setting up the connection to the FPGA PCI Express-based solutions for PC ↔ FPGA communi- device and sending/retrieving arrays of common C data types cation. However, the interest our initial core has gener- to/from the UDP/IP core on the FPGA. The initial UDP/IP core ated underlines the apparent lack of a standardized, open- version is available under the LGPL license at http://opencores. source Ethernet-based communication mechanism. Thus, in org/project,udp ip core while the improved version of the addition to the improved UDP/IP core architecture, we core, including the C software interface (also under LGPL), is available at http://opencores.org/project,pc fpga com. also present an implementation of a comprehensive open- source PC ↔ FPGA communication platform that relies Keywords-FPGA; UDP/IP; PC-FPGA communication; on Gigabit Ethernet. This platform deploys the versatile version of the UDP/IP core and implements a simplified I. INTRODUCTION communication protocol that facilitates usage and integration FPGAs are commonly deployed as accelerator devices or of the send/receive paradigm for common C data types for verifying architectures. Irrespective of their particular and arrays. Our platform hides the complexity inherent to use, a machinery for moving data to/from the device is establishing a PC ↔ FPGA connection and to synchronizing required. Recently, we presented a proof-of-concept imple- data transmissions in both directions. The versatile UDP/IP mentation of a highly efficient UDP/IP core architecture that core and the communication platform are available for is optimized for point-to-point communication [1]. We found download at http://opencores.org/project,pc fpga com. that combining the IPv4 and UDP protocols represents an The remainder of this paper is organized as follows: efficient solution for establishing a connection between a Section II provides a short overview of related work on PC and a FPGA, with respect to reconfigurable hardware UDP/IP and TCP/IP cores as well as PC ↔ FPGA com- utilization. We achieve low resource utilization because the munication platforms. In Section III, we describe the design architecture employs a simplified method for calculating the of the novel and more versatile UDP/IP core. The PC ↔ IPv4 Header Checksum based on a pre-calculated template. FPGA communication platform is introduced in Section IV. Here, we present an improved UDP/IP core implemen- Section V covers implementation and verification details as tation and a comprehensive Ethernet-based communication well as a performance evaluation of the UDP/IP core and the platform. A key property of the UDP/IP core is that all PC ↔ FPGA platform. Section V also includes application static header fields required for point-to-point communica- examples to demonstrate the functionality of the proposed tion are stored in a lookup table. In the proof-of-concept solution. We conclude in Section VI and address directions implementation, this required the user to manually initialize of future work. the lookup table before configuring the device. The improved architecture we present here entails additional control logic II. RELATED WORK that retrieves all static fields of an incoming initialization In [4], L¨ofgren et al. present three different (commer- packet from the PC. This data is then stored in the respective cially available) stand-alone UDP/IP cores and denote them lookup table. This versatility of the new UDP/IP core as minimum, medium, and advanced implementations. The comes at a cost: the additional logic occupies approximately medium and advanced IP cores offer additional protocols 56% more hardware area than the proof-of-concept core. which are not necessary for establishing a direct PC ↔ to from user interface FPGA communication. Thus, we do not compare these EMAC TRANSMITTER HLUT wraddr,wren,din heavy-weight cores with our implementation because such reg a comparison would be unfair. The minimum IP core offers wren,value the same functionality as our Gigabit-speed UDP/IP core but RST_U IPv4 Header Checksum rst HLUT requires 2.8 times more hardware resources while operating Register CONTROL at 10/100 Mbps only. CONFIG_U from In [5], K¨uhn et al. used UDP/IP to implement PC ↔ EMAC config FPGA communication over Gigabit Ethernet. To implement RECEIVER to user interface the communication mechanism, the authors deployed an operating system (Linux) that was running on an embedded Figure 1. Top-level design of the advanced UDP/IP core. PowerPC processor. While using a processor that handles the communication represents an elegant and straightforward approach, not all reconfigurable devices contain such hard- coded processors. When such a processor is not required for of the 802.3 MAC frame are the Destination Address, the conducting additional computations, deploying a soft-coded Source Address, and the Ethernet Type. The IPv4 header sec- processor can lead to excessive hardware utilization on the tion contains the following fields: Version, Header Length, device. Our novel communication platform essentially offers Differentiated Services, Total Length, Identification, Flags, the same functionality but does not require an embedded Fragment Offset, Time to Live, Protocol, Header Checksum, processor. Source Address, and Destination Address [11]. The UDP Dollas et al. [6] presented an architecture for an open header section consists of the Source Port, the Destination TCP/IP core that supports all necessary protocols (ARP, Port, the Length, and the Checksum [12]. When only a ICMP, UDP) typically required for real-world applications. direct point-to-point connection between a PC and a FPGA The architectural complexity that is induced by the large is considered, all of the above header fields are constant number of supported protocols would yield a comparison to with the exception of the IPv4 Total Length and Header our light-weight design unfair. Note that a full TCP/IP core is Checksum fields as well as the UDP Length field. The idea not necessary for direct PC ↔ FPGA communication since of storing constant fields of UDP point-to-point connections this can be implemented using less complex protocols and in lookup tables has been used before [4]. However, the a lower amount of resources. required hardware resources are further reduced in our design because the calculation of the IPv4 checksum field In [7], Lin et al. presented a PC ↔ FPGA based commu- nication platform for verification and fast prototyping. The can be implemented by a single subtraction. platform communicates via a PCI interface and is intended During a transmission of an Ethernet packet from the to facilitate verification of multimedia applications. There FPGA to the PC, all static/constant fields are retrieved exist similar platforms as the one presented by Lin et al. that directly from the HLUT. The three variable fields (Total target specific application types [8], [9]. Our PC ↔ FPGA Length, Header Checksum, Length) can be calculated at communication platform offers a mechanism for integrating low cost using two additions and one subtraction. The communication routines into any kind of application and initial architecture contained three hard-coded values that allows for transferring arrays of the basic IEEE-754 data were used as the first operand of the two adders and the types (e.g., characters, short integers, integers, floats, long subtracter, while the second operand in all three operations integers, and doubles). was the length of the user data. The hard-coded values were Recently, Lieber and Hutchings [10] presented an open- set to the minimum IPv4 Total Length and UDP Length source PC ↔ FPGA communication framework for Xilinx values, that is, the offsets to the respective header sections FPGA boards achieving a maximum throughput of 504 and the IPv4 Header Checksum field of an Ethernet packet Mbps. Our PC ↔ FPGA communication platform provides without any user data attached to it. Based on the amount similar functionality and can operate at Gigabit speed (1000 of data (number of bytes) transferred with the packet, the Mbps) while requiring 50% less reconfigurable resources. corresponding length fields are calculated by two additions. The subtraction is used to calculate the correct IPv4 Header III. VERSATILE UDP/IP CORE ARCHITECTURE Checksum as a function of the IPv4 Total Length field. A In the following, we briefly review the basic concept of more detailed description of this procedure and a proof that the initial UDP/IP core and describe the enhancements in the a single subtraction is sufficient can be found in [1]. new version. As already mentioned, all static header fields Figure 1 illustrates the versatile UDP/IP core architecture.