IEEE EMBEDDED SYSTEMS LETTERS, VOL. 4, NO. 3, SEPTEMBER 2012 69 Zebra: Building Efficient Network Message Parsers for Embedded Systems Julien Mercadal, Laurent Réveillère, Yérom-David Bromberg, Bertrand Le Gal, Tegawendé F. Bissyandé, and Jigar Solanki

Abstract—Supporting standard text-based protocols in em- the limited resources, particularly with regard to the compu- bedded systems is challenging because of the often limited tational power, memory, and energy, embedded systems often computational resources that embedded systems provide. To provide. Indeed, such FSMs may contain several hundred states overcome this issue, a promising approach is to build parsers directly in the hardware. Unfortunately, developing such parsers and several thousand complex transitions, making the size of is a daunting task for most developers as it is at the crossroads of corresponding parsers too large (several dozen kilobytes) for several areas of expertise, such as low-level network programming embedded systems. To simplify parser construction, automatic or hardware design. In this letter, we propose Zebra, a generative approaches including Gapa [1] and Zebu [2] have been pro- approach that drastically eases the development of hardware posed for generating an FSM implementation from a high-level parsers and their use in network applications. To validate our approach, we used Zebra to generate hardware parsers for widely specification of a protocol. However, to the best of our knowl- used protocols, including HTTP, SMTP, SIP, and RTSP. Our edge, existing automatic approaches do not address embedded experiments show that Zebra-based parsers are up to 11 times systems requirements and, in particular, have not explored the faster than software-based parsers. use of dedicated hardware to improve their performance. Thus, Index Terms—Circuits and systems, client-server systems, the resulting generated code is still CPU intensive. computers and information processing software, distributed Implementing an FSM using a dedicated hardware archi- computing, embedded software, field programmable gate arrays, tecture, as compared to a software-based implementation, middleware. improves performance. Indeed, a hardware parser can be de- signed specifically to execute multiple computations in parallel I. INTRODUCTION and in one clock cycle. Moreover, conditional jumps, whicharemassivelyusedinsoftwareimplementationofFSMs, are processed in one clock cycle without pipeline break penal- MBEDDED systems are increasingly required to interact ties. Finally, a hardware-based FSM requires a lower working both among themselves as well as with legacy infrastruc- E frequency to reach the same performance than its software tures to provide advanced services to end-users. This kind of counterpart, and thus consumes less energy. communication among heterogeneous entities requires a pro- Nonetheless, developing a network application that uses tocol to manage their interaction. Traditionally, because of their hardware parsers is challenging, requiring not only exper- highly constrained resources, nonstandard application-specific tise in hardware design and integration, but also substantial binary protocols where message parsing and message construc- knowledge of the protocols involved and an understanding of tion are simple have been used [18]. The use of nonstandard pro- low-level network programming. These issues are challenging tocols, however, complicates the interaction with other systems, on an individual basis and the need to address all of them at as it is required in many emerging applications. Thus, the use of once makes the development of hardware protocol message standard text-based protocols has been getting greater attention. parsers particularly difficult. For example, the SIP protocol is now being used in sensor net- In this letter, we propose a codesign based architecture and a works [8] and mobile ad hoc networks [9], [16]. generative approach for building and using hardware parsers in Standard text-based protocol message parsers are typically a network application. To this end, we present a domain-specific implemented in software as finite state machines (FSMs), language, called Zebra, for describing standard text-based pro- using a low-level language such as C to provide efficiency. tocol message formats and related processing constraints. Zebra However, developing such parsers is challenging because of is an extension of augmented Backus-Naur form (ABNF) [6], the variant of Backus-Naur form (BNF) used in request for com- Manuscript received April 20, 2012; accepted June 08, 2012. Date of publica- ments (RFCs), to specify the syntax of network protocol mes- tion July 24, 2012; date of current version September 14, 2012. This manuscript was recommended for publication by S. K. Shukla. sages, implying that the programmer can simply copy a network J. Mercadal, L. Réveillère, Y.-D. Bromberg, T. F. Bissyandé, and J. Solanki protocol message grammar from an RFC to begin developing a are with Laboratoire Bordelais de Recherche en Informatique (LaBRI), Univer- parser. It extends ABNF with annotations indicating which mes- sity of Bordeaux, Talence cedex F-33405, France (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; sage fragments should be stored in data structures, and other se- [email protected]) . mantic information. B. Le Gal is with Laboratoire de l’Integration du Materiau au Systeme AZebra specification is processed by a compiler that gener- (IMS), University of Bordeaux, Talence cedex F-33405, France (e-mail: [email protected]) . ates both the HDL source code of the hardware parser imple- Digital Object Identifier 10.1109/LES.2012.2208617 mented as an FSM and the associated C code to drive it. The

1943-0663/$31.00 © 2012 IEEE 70 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 4, NO. 3, SEPTEMBER 2012 application runs on top of middleware that hides low-level de- tails to developers and manages the generated hardware parsers. The contributions of this letter are as follows: • We designed and implemented a generative approach for building hardware parsers for embedded systems. Our ap- proach is based on a codesign architecture to provide hard- ware parsing capabilities to software applications. • We conducted a set of experiments on protocols such as HTTP, RTSP, SIP, and SMTP to assess our approach. Pre- liminary results show a speedup of message parsing from 3.9to11comparedtosoftware-basedparsers. The remainder of this letter is structured as follows. Section II presents the Zebra hardware platform designed to support the execution of message parsers, the middleware to manage the Fig. 1. Zebra approach. underlying hardware units, and the Zebra language to describe message formats and a compiler that produces the necessary HDL and C code. Section III presents the performance evalua- the Zebra language to describe message formats and the com- tion of Zebra-based parsers. Finally, Section IV reviews related piler that produces the necessary HDL and C code. research works, and Section V concludes the letter and discusses future work. A. Zebra Hardware Platform We combined the and the parsing units into II. ZEBRA APPROACH one chip (SoC) to 1) reduce power consumption; 2) simplify The most efficient way to implement an board layout; 3) preserve signal integrity; 4) avoid electromag- application is to develop a fully-customized architecture, netic interference and; and 5) allow very fast communication using programmable logic devices or even dedicated applica- links between them. We use field programmable tion-specific integrated circuits (ASICs). However, hardware (FPGA) devices for system integration since they are particu- design is a tedious and time consuming compared to larly suitable for embedded system prototyping [4]. However, traditional software development. To alleviate the burden in the proposed approach is not limited to FPGA devices and can hardware-based implementations, the codesign methodology easily be extended to ASIC targets. proposes to slice an application based on the performances it As illustrated in Fig. 1, the Zebra platform consists of a gen- requires. Parts of the application that require high performance eral-purpose microprocessor to execute the application logic, are implemented using dedicated hardware units. Less sensitive and a set of dedicated hardware units for message parsing. performance parts are implemented as software code running Our current implementation relies on a LEON3 soft CPU core, over a general-purpose microprocessor. Typically, the lowest which is an open-source implementation of the SPARC v8 part of a network application, known as the protocol-handling 32-bit architecture, allowing its instruction set to be extended. layer, consumes 25% of the total message processing time The use of such a soft CPU core, combined with the generation [5], [19]. This layer must thus be efficient and calls for a of generic HDL code, enables implementation of our system hardware-based implementation to reach the expected level of on any ASIC or FPGA target, without any changes. performance. To do so, we have developed the Zebra approach In Zebra, parsing units are implemented as , in- dedicated to building efficient network message parsers. Our terconnected with the microprocessor through a set of dedicated approach consists of a hardware platform to support parser links. In particular, a parsing unit has a specificdesignthatin- execution and a middleware to transparently integrate hard- cludes: a 32-bit data input interface for receiving data stream ware parsers into network applications. The main objective to parse from the microprocessor, a 32-bit data output inter- of Zebra is to minimize the need for developer intervention face to send back parsing results, a set of both dedicated inter- in the complex process of developing and using a hardware faces, and control signals for managing the parser. The 32-bit parser in a software application. Accordingly, Zebra provides data interfaces enable the transfer of up to 4 bytes per micro- a high-level specification language to describe text-based processor clock cycle. The instruction set of the microprocessor protocol message formats and related processing constraints. has been extended to provide commands and read/write opera- From this specification, a compiler automatically generates tions to each parsing unit. The number of parsing units that can both the HDL synthesizable specification to be plugged into be embedded depends on the size of the FPGA device and the the hardware platform and the associated C code tailored to the complexity of the protocol state machines. application needs. Fig. 1 illustrates our approach. We now describe, in more detail, the Zebra approach. First, B. Zebra Middleware we describe the hardware platform we designed to support the To process network messages from an input stream, an ap- execution of message parsers. We then present the middleware plication registers a callback function to the Zebra middleware. we developed to drive the underlying hardware-dedicated units This function describes the application logic based on values and interconnect them with the network application running on of specific message elements. The Zebra middleware manages top of a general-purpose microprocessor. Finally, we introduce registered applications by reading data on input streams as they MERCADAL et al.: ZEBRA: BUILDING EFFICIENT NETWORK MESSAGE PARSERS 71

In our experience with RFCs, the ABNF specification does not completely define the message structure. Indeed, further constraints are explained in the accompanying text. For ex- ample, the RFC of HTTP indicates that the length of the body of an HTTP message depends on the Content-Length field value. To express this constraint, the developer has only to annotate the variable-length field message-body with thenameofthefield, between curly brackets, that defines its size (i.e., clen).Such fields must be typed as integers. Finally, the Zebra compiler generates a hardware parser tai- Fig. 2. Excerpt of Zebra specification for HTTP. lored to the application needs and according to the provided annotations, and the associated C code needed to drive it. The are received and sending these data to the corresponding parsing hardware parser corresponds to an FSM whose transitions signal unit. The middleware then reads parsing results from the output the start or the end of message elements annotated in the Zebra interface of the parsing unit. When the parsing of a message el- specification. Thus, when such transitions are fired, the hard- ement is completed, the middleware executes ad hoc code to ware parser writes into its output interface the name of the mes- make the value accessible to the application. Note, the mid- sage element being parsed, the current position of the consumed dleware can perform other computations while waiting for the data, and if it is the start or the end. This information is then used parsing units to complete their work. To increase sharing of by the Zebra middleware to execute the corresponding gener- parsing units between several tasks, the middleware seamlessly ated C code, enabling it to extract and save the value of the saves and restores the parser state when required. This context parsed message element. on the hardware parsing units is very efficient and re- quires only about 9 microprocessor cycles. III. EVAL UATIO N The Zebra middleware has been implemented in C and suc- We conducted a set of experiments to assess our approach. cessfully cross-compiled for the LEON3 microprocessor. Addi- We used a Xilinx Virtex-5 board and a LEON3 microprocessor tionally, we modified the toolchain to support the extended configured at 50MHz. We wrote Zebra specifications for four instruction set that we introduced for controlling parsing units. of the most ubiquitous protocols on the Internet: HTTP, SMTP, SIP, and RTSP. For each of them, we used the Zebra compiler C. Zebra Language to automatically produce the corresponding VHDL and C code. The Zebra language is based on the ABNF notation used in The generated VHDL code was then synthesized in the FPGA RFCs to specify the syntax of protocol messages in order to ease device using the Xilinx ISE toolchain. The generated C code its adoption by network application developers. Once having was cross-compiled using a modified version of the SPARC created a basic Zebra specification, the developer can further an- toolchain and plugged into the Zebra middleware. notate it according to application-specific requirements. Fig. 2 In order to evaluate the processing time to parse an input mes- shows an excerpt of the Zebra specification for the HTTP pro- sage from either HTTP, SMTP, SIP or RTSP, we developed a tocol as defined in RFC 2616. logging application, one for each protocol, that logs messages Annotations define the message view available to the appli- received from the network. For each application, we imple- cation, by indicating the message elements that this view should mented two versions of its parsers: one fully implemented in include. These annotations drive the generation of the data struc- software and one based on Zebra. For the software-based ver- ture that contains the message elements. For example, three sion, we used the Ragel [17] tool to produce an optimized FSM message elements are annotated in Fig. 2. To make an element implementation in C. available, the programmer has only to annotate it with the “ ” We now present a microbenchmark for these applications symbol and the name of a fieldinthegenerateddatastructure using real messages. The datasets were collected over a 2-hour that should store the element’s value. For instance, in Fig. 2, period in our research laboratory. In our experiments, a client the Zebra programmer indicates that the application requires the application replays a real trace, extracting and sending each URI of the request line . Hence, the data structure repre- message of this trace. We instrumented the code of the log- senting the message will contain one string field: uri. ging applications to measure the parsing time for each received In addition to tagging message elements that will be avail- message. able to the application, annotations impose type constraints on Fig. 3 presents the results of our evaluation. We observe that these elements. This can be specified using the notation as fol- the Zebra-based parsers are between 3.9 and 11 times faster than lowed by the name of the desired type. For example, in Fig. 2, their fully software-based counterparts. the Content-Length field value is specified to repre- sent an unsigned integer of 32 bits (uint32). A type constraint IV. RELATED WORK enables representation of an element as a type other than string. Over the last decade, many approaches designed to elimi- The use of both kinds of annotations allows the generated data nate the task of having to handwrite network protocol mes- structure to be tailored to the requirements of the application sage parsers have emerged [1], [2], [10], [15]. These approaches logic. This simplifies the application logic’s access to the mes- typically propose a three-step process: 1) describing network sage elements. protocol messages in a high-level specification; 2) generating 72 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 4, NO. 3, SEPTEMBER 2012

a set of experiments on four commonly used protocols to as- sess our approach. Preliminary results, using microbenchmarks, show that Zebra-based parsers are up to 11 times faster than soft- ware-based parsers. We are currently investigating the dynamic reconfiguration capabilities of FPGA to update at runtime the protocols sup- ported by Zebra. We are also extending the Zebra middleware to provide advanced scheduling of available parsing units based on active clients to reduce misses when accessing received buffered messages stored in central memory.

REFERENCES [1]N.Borisov,D.J.Brumley,H.J.Wang, J. Dunagan, P. Joshi, and C. Guo, “A generic application-level protocol analyzer and its language,” in Proc.14thAnnu.Netw.Distrib. Syst. Secur. Symp., 2007, pp. 14:1–14:14. [2] L. Burgy, L. Réveillére, J. Lawall, and G. Muller, “Zebu: A lan- Fig. 3. Comparison of Zebra-based and soft-based parsers. guage-based approach for network protocol message processing,” IEEE Trans. Softw. Eng., vol. 37, pp. 575–591, 2011. [3]A.Canis,J.Choi,M.Aldham,V.Zhang,A.Kammoona,J.H.An- derson,S.Brown,andT.Czajkowski, “LegUp: High-level synthesis software parsers from this high-level specification; and 3) pro- for FPGA-based processor/accelerator systems,” in Proc. 19th ACM/ viding a framework to ease the development of applications on SIGDAInt..Symp.FieldProgram. Gate Arrays, New York, 2011, pp. 33–36. top of generated parsers. However, none of these approaches [4]R.CoferandB.F.Harding,Rapid System Prototyping With FPGAs: specifically targets highly constrained embedded systems. For Accelerating the Design Process, 1st ed. Amsterdam, The Nether- instance, sensor networks relying on dedicated hardware such as lands: Elsevier Science, 2005. [5] M. Cortes and J. R. Ensor, “Narnia: A virtual machine for multimedia ASIC or FPGA do not have enough energy, code, andmemory communication services,”inProc. 4th Int. Symp. Multimedia Softw. to support the aforementioned approaches. Eng., 2002, pp. 246–254. To overcome this issue, one emerging solutionistoimple- [6] Augmented BNF for Syntax Specifications: ABNF (RFC 2234),Pro- posed Standard, Nov. 1997. ment parsers directly in hardware. Hence, high-level specifica- [7] M. Fingeroff, (2010), High-Level Synthesis Blue Book, tions of network protocol messages are mapped directly into Xlibris Corporation [Online]. Available: http://books.google.fr/ hardware description languages such as VHDLtobethensuc- books?id=k_ZCyNEK2hkC [8] S. Krishnamurthy, “TinySIP: Providing seamless access to cessively synthesized into ASIC or FPGA [12]–[14]. However, sensor-based services,” in Proc. 3rd Int. Conf. Mobile Ubiqui- hardware parsers are provided as is andrequireastrongun- tous Sys.: Netw. Serv., 2006, no. 4611, pp. 1–9. derstanding of hardware design fundamentals to integrate them [9] N. Banerjee, A. Acharya,andS.K.Das,“EnablingSIP-basedsessions in ad hoc networks,” Wireless Netw., vol. 13, no. 4, pp. 461–479, Aug. with network programming applications. In contrast, the Zebra 2007. approach covers the development lifecycle of a network mes- [10] A. Madhavapeddy, “Creating High-Performance, Statically Type-Safe sage parser, from its specification to the generation of hardware Network Applications,” Ph.D. dissertation, Cambridge Univ., Cam- bridge, U.K., 2007. accelerators to its integration into network applications. [11] G. Martin and G. Smith, “High-level synthesis: Past, present, and fu- Many commercial and academic high-level synthesis (HLS) ture,” IEEE Des. Test Comput., vol. 26, no. 4, pp. 18–25, July 2009. tools have been proposed to generate hardware architectures [12] A. Mitra, M. R. Vieira, P. Bakalov, W. A. Najjar, and V. J. Tsotras, “Boosting XML filtering with a scalable FPGA-based architecture,” from algorithmic descriptions written in C, C++, or SystemC CoRR, vol. abs/0909.1781, 2009. [3]–[11]. However, these tools remain general purpose and are [13] J. Moscola, J. W. Lockwood, and Y. H. Cho, “Reconfigurable content- mostly oriented toward applications [11]. Thus, they based router using hardware-accelerated language parser,” ACM Trans. Des. Autom. Electron. Sys., vol. 13, no. 2, pp. 28:1–28:25, 2008. do not provide good results for control applications, such as [14] J. Öberg, A. Hemani, and A. Kumar, “Grammar-based hardware syn- protocol message parsers. For example, hardware parsers gen- thesis from port-size independent specifications,” IEEE Trans. Very erated using the LegUp tool [3] from software-based parsers Large Scale Integr.Sys., vol. 8, no. 2, pp. 184–194, 2000. [15] T. Stefanec and I. Skuliber, “Grammar-based SIP parser implemen- used in our evaluation are at least 4.5 times slower than their tation with performance optimizations,” in Proc. 11th Int. Conf. Zebra-based counterparts, and consume up to 50 times more Telecommun. , 2011, pp. 81–86. hardware resources. To the best of our knowledge, Zebra is the [16] P. Stuedi, M. Bihr, A. Remund, and G. Alonso, “SIPHoc: Efficient SIP middleware for ad hoc networks,” in Proc. 8th ACM/IFIP/USENIX Int. only solution that bridges the gap between HDL designs and Conf. Middleware, 2007. system software engineering in the context of control applica- [17] A. D. Thurston, “Parsing computer languages with an automaton com- tions for embedded systems. piled from a single regular expression,” in Proc. 11th Int. Conf. Imple- ment. Appl. Automata, Berlin, Heidelberg, 2006, pp. 285–286. [18] B. Upender and P. Koopman, “Communications protocols for em- bedded systems,” ACM Trans. Program. Lang. Syst.,vol.11,no.7, V. C ONCLUSION AND FUTURE WORK pp. 46–58, 1994. [19] S. Wanke, M. Scharf, S. Kiesel, and S. Wahl, “Measurement of the SIP In this letter, we presented Zebra, a generative approach for parsing performance in the SIP express router,” in Proc. Dependable building hardware parsers for embedded systems. We conducted Adaptable Netw. Serv., 2007, no. 4606, pp. 103–110.