
Evaluating the Performance of Publish/Subscribe Platforms for Information Management in Distributed Real-time and Embedded Systems

Ming Xiong, Jeff Parsons, James Edmondson, Hieu Nguyen, and Douglas C. Schmidt
Vanderbilt University, Nashville, TN, USA

Abstract¹

Recent trends in distributed real-time and embedded (DRE) systems motivate the need for information management capabilities that ensure the right information is delivered to the right place at the right time to satisfy quality of service (QoS) requirements in heterogeneous environments. To build and evolve large-scale and long-lived DRE information management systems, it is necessary to develop standards-based QoS-enabled publish/subscribe (pub/sub) platforms that enable participants to communicate by publishing the information they have and subscribing to the information they need in a timely manner. Since there is little existing evaluation of the ability of these platforms to meet the performance needs of DRE information management, this paper provides two contributions: (1) it describes three common architectures for the OMG Data Distribution Service (DDS), which is a standard QoS-enabled pub/sub platform, and (2) it evaluates implementations of these architectures to investigate their design tradeoffs and compare their performance with each other and with other pub/sub middleware. Our results show that DDS implementations perform significantly better than non-DDS alternatives and are well-suited for certain classes of data-critical DRE information management systems.

¹ This work was sponsored in part by AFRL/IF and Vanderbilt University.

Keywords: Information Management in DRE Systems; QoS-enabled Publish/Subscribe Platforms; Data Distribution Service

1 Introduction

Mission-critical distributed real-time and embedded (DRE) applications increasingly run in systems of systems, such as the Global Information Grid (GIG) [11], that are characterized by thousands of platforms, sensors, decision nodes, and computers connected together to exchange information, support collaborative decision making, and effect changes in the physical environment. For example, the GIG is designed to ensure the right information gets to the right place at the right time by satisfying end-to-end quality of service (QoS) requirements, such as latency, jitter, throughput, dependability, and scalability. At the core of DRE systems of systems are data-centric QoS-enabled publish/subscribe (pub/sub) platforms that provide:

• Universal access to information from a wide variety of sources running over a multitude of hardware/software platforms and networks,
• An orchestrated information environment that aggregates, filters, and prioritizes the delivery of this information to work effectively under the restrictions of transient and enduring resource constraints,
• Continuous adaptation to changes in the operating environment, such as dynamic network topologies, publisher and subscriber membership changes, and intermittent connectivity, and
• Various QoS parameters and mechanisms that enable applications and administrators to customize the way information is delivered, received, and processed in the appropriate form and level of detail for users at multiple levels in a DRE system.

Conventional Service-Oriented Architecture (SOA) middleware platforms have had limited success in providing these capabilities due to their lack of support for data-centric QoS mechanisms. For example, the Java Messaging Service for Java 2 Enterprise Edition (J2EE) [10] is a SOA middleware platform that is not well-suited for DRE environments due to its limited QoS support, lack of real-time operating system integration, and high time/space overhead. Even conventional QoS-enabled SOA middleware, such as Real-time CORBA [9], is poorly suited for dynamic data dissemination among many publishers and subscribers due to excessive layering, extra time/space overhead, and inflexible QoS policies.

To address these limitations, and to better support DRE information management, the Object Management Group (OMG) has recently adopted the Data Distribution Service (DDS) [6] specification. DDS is a standard for QoS-enabled pub/sub communication aimed at mission-critical DRE systems. It is designed to provide (1) location independence via anonymous pub/sub protocols that enable communication between collocated or remote publishers and subscribers, (2) scalability by supporting large numbers of topics, data readers, and data writers, and (3) platform portability and interoperability via standard interfaces and transport protocols. Multiple implementations of DDS are now available, ranging from high-end COTS products to open-source community-supported projects. DDS is used in a wide range of DRE systems, including traffic monitoring [24], controlling the communication of unmanned vehicles with their ground stations [16], and semiconductor fabrication devices [23].

Although DDS is designed to be scalable, efficient, and predictable, few publications have evaluated and compared DDS performance empirically for common DRE information management scenarios. Likewise, little published work has systematically compared DDS with alternative non-DDS pub/sub middleware platforms. This paper addresses this gap in the R&D literature by describing the results of the Pollux project, which is evaluating a range of pub/sub platforms to compare how their architectures and design techniques affect their performance and suitability for DRE information management. This paper also describes the design and application of an open-source DDS benchmarking environment we developed as part of Pollux to automate the comparison of pub/sub latency, jitter, throughput, and scalability.

The remainder of this paper is organized as follows: Section 2 summarizes the DDS specification and the architectural differences of three popular DDS implementations; Section 3 describes our ISISLab hardware testbed and open-source DDS Benchmark Environment (DBE); Section 4 analyzes the results of benchmarks conducted using the DBE in ISISLab; Section 5 presents the lessons learned from our experiments; Section 6 compares our work with related research on pub/sub architectures; and Section 7 presents concluding remarks and outlines our future R&D directions.

2 Overview of DDS

2.1 Core Features and Benefits of DDS

The OMG Data Distribution Service (DDS) specification provides a data-centric communication standard for a range of DRE computing environments, from small networked embedded systems up to large-scale information backbones. At the core of DDS is the Data-Centric Publish-Subscribe (DCPS) model, whose specification defines standard interfaces that enable applications running on heterogeneous platforms to write/read data to/from a global data space in a DRE system. Applications that want to share information with others can use this global data space to declare their intent to publish data that is categorized into one or more topics of interest to participants. Similarly, applications that want to access topics of interest can use this data space to declare their intent to become subscribers. The underlying DCPS middleware propagates data samples written by publishers into the global data space, where they are disseminated to interested subscribers [6]. The DCPS model decouples the declaration of information access intent from the information access itself, thereby enabling the DDS middleware to support and optimize QoS-enabled communication.

Figure 1: Architecture of DDS (topics, data readers/writers, publishers, and subscribers layered over a data store with QoS mechanisms, a transport protocol, and the communication network)

When we create a DCPS DDS application, the following DDS entities are involved, as shown in Figure 1:

• Domain – DDS applications send and receive data within a domain, which provides a virtual communication environment for participants having the same domain_id. This environment also isolates participants associated with different domains, i.e., only participants within the same domain can communicate, which is useful for isolating and optimizing communication within a community that shares common interests.
• Domain Participant – A domain participant is an entity that represents a DDS application's participation in a domain. It serves as factory, container, and manager for the DDS entities discussed below.
• Data Writer and Publisher – Data writers are the basic building blocks that applications use to publish data values to the global data space of a domain. A publisher is created by a domain participant and used as a factory to create and manage a group of data writers with similar behavior or QoS policies.
• Subscriber and Data Reader – Data readers are the basic building blocks applications use to receive data. A subscriber is created by a domain participant and used as a factory to create and manage data readers. A data reader can obtain its subscribed data via two approaches, as shown in Figure 2: (1) listener-based, which provides an asynchronous mechanism to obtain data via callbacks in a separate thread that does not block the main application, and (2) wait-based, which provides a synchronous mechanism that blocks the application until a designated condition is met.
• Topic – A topic connects a data writer with a data reader: communication only happens if the topic published by a data writer matches a topic subscribed to by a data reader. Communication via topics is anonymous and transparent, i.e., publishers and subscribers need not be concerned with how topics are created nor who is writing/reading them, since the DDS DCPS middleware manages these issues.

Figure 2: Listener-Based and Wait-Based Notification
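To make the entity relationships above concrete, the following condensed sketch shows the usual DCPS creation sequence (participant, topic, publisher/writer, subscriber/reader) in C++. It is written in the style of the OMG IDL-to-C++ mapping, but the exact factory signatures, default-QoS constants, status-mask arguments, and the generated type-support classes (FooTypeSupport, FooDataWriter) vary across DDS implementations and specification revisions, so treat it as an illustrative assumption rather than portable code.

// Sketch only: vendor-specific headers omitted; names such as FooTypeSupport
// and the *_QOS_DEFAULT / STATUS_MASK_NONE constants follow the OMG DCPS style
// but differ in detail from one DDS product to another.
void build_dcps_entities(int domain_id)
{
  DDS::DomainParticipantFactory_var factory =
      DDS::DomainParticipantFactory::get_instance();

  // A participant scopes all other entities to one domain.
  DDS::DomainParticipant_var participant =
      factory->create_participant(domain_id, PARTICIPANT_QOS_DEFAULT,
                                  0 /* listener */, DDS::STATUS_MASK_NONE);

  // Register the IDL-generated type and create a topic of that type.
  FooTypeSupport_var type_support = new FooTypeSupportImpl;
  type_support->register_type(participant, "Foo");
  DDS::Topic_var topic =
      participant->create_topic("FooTopic", "Foo", TOPIC_QOS_DEFAULT,
                                0, DDS::STATUS_MASK_NONE);

  // Publishing side: the publisher is a factory for data writers.
  DDS::Publisher_var publisher =
      participant->create_publisher(PUBLISHER_QOS_DEFAULT, 0, DDS::STATUS_MASK_NONE);
  DDS::DataWriter_var writer =
      publisher->create_datawriter(topic, DATAWRITER_QOS_DEFAULT, 0, DDS::STATUS_MASK_NONE);

  // Subscribing side: the subscriber is a factory for data readers; samples are
  // then consumed either through a listener callback or a waitset.
  DDS::Subscriber_var subscriber =
      participant->create_subscriber(SUBSCRIBER_QOS_DEFAULT, 0, DDS::STATUS_MASK_NONE);
  DDS::DataReader_var reader =
      subscriber->create_datareader(topic, DATAREADER_QOS_DEFAULT, 0, DDS::STATUS_MASK_NONE);
}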

The remainder of this subsection describes the benefits of DDS relative to conventional pub/sub middleware and client/server-based SOA platforms.

Figure 3: Optimizations and QoS Capabilities of DDS. (A) Fewer Layers in the DDS Architecture; (B) DDS QoS Policies

Figures 3 and 4 show DDS capabilities that make it better suited as the basis of DRE information management than other standard middleware platforms. Figure 3(A) shows that DDS has fewer layers in its architecture than conventional SOA standards, such as CORBA, .NET, and J2EE, which significantly reduces latency and jitter, as shown in Section 4. Figure 3(B) shows that DDS supports many QoS policies, such as:

• The lifetime of each data sample, i.e., whether it is destroyed after being sent, kept available during the publisher's lifetime, or retained for a specified duration after the publisher shuts down.
• The degree and scope of coherency for information updates, i.e., whether a group of updates can be received as a unit and in the order in which they were sent.
• The frequency of information updates, i.e., the rate at which updated values are sent and/or received.
• The maximum latency of data delivery, i.e., a bound on the acceptable interval between the time data is sent and received.
• The priority of data delivery, i.e., the priority used by the underlying transport to deliver the data.
• The reliability of data delivery, i.e., whether missed deliveries will be retried.
• How to arbitrate simultaneous modifications to shared data by multiple writers, i.e., which modification to apply.
• Mechanisms to assert and determine liveliness, i.e., whether or not a publish-related entity is active.
• Parameters for filtering by data receivers, i.e., predicates that determine which data values are accepted or rejected.
• The duration of data validity, i.e., an expiration time for data to avoid delivering "stale" data.
• The depth of the "history" included in updates, i.e., how many prior updates will be available at any time, e.g., "only the most recent update," "the last n updates," or "all prior updates."

These parameters can be configured at various levels of granularity (i.e., topics, publishers, data writers, subscribers, and data readers), thereby allowing application developers to construct customized contracts based on the specific QoS requirements of individual entities. Since the identities of publishers and subscribers are unknown to each other, the DDS DCPS middleware is responsible for determining whether the QoS parameters offered by a publisher are compatible with those required by a subscriber, allowing data distribution only when compatibility is satisfied.

Figure 4: Filtering and Meta-event Capabilities of DDS. (A) Moving Processing Closer to the Data (e.g., time-based and content filters applied at the source); (B) Subscribing to Meta-Events (e.g., new topic, new publisher, new subscriber)

Figure 4(A) shows how DDS can migrate processing closer to the data source, which reduces bandwidth on resource-constrained network links. Figure 4(B) shows how DDS enables clients to subscribe to meta-events that they can use to detect dynamic changes in network topology, membership, and QoS levels. This mechanism helps DRE information management systems adapt to environments that change continuously.
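The compatibility check just described follows a request/offer pattern: a writer's offered QoS must be at least as strong as each matched reader's requested QoS. The toy function below illustrates that rule for two representative policies; it is purely illustrative logic under assumed enum and field names, not the matching algorithm of any particular DDS product.

#include <cstdint>

// Hypothetical, simplified QoS structures for illustration only.
enum class ReliabilityKind : std::uint8_t { BEST_EFFORT = 0, RELIABLE = 1 };

struct OfferedQos   { ReliabilityKind reliability; double latency_budget_sec; };
struct RequestedQos { ReliabilityKind reliability; double latency_budget_sec; };

// A writer/reader pair is matched only if every offered policy is at least
// as strong as the corresponding requested policy.
bool qos_compatible(const OfferedQos& offered, const RequestedQos& requested)
{
  // RELIABLE satisfies a BEST_EFFORT request, but not vice versa.
  const bool reliability_ok =
      static_cast<int>(offered.reliability) >= static_cast<int>(requested.reliability);

  // The offered latency bound must be no looser than the requested one.
  const bool latency_ok = offered.latency_budget_sec <= requested.latency_budget_sec;

  return reliability_ok && latency_ok;
}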


2.2 Alternative DDS Implementations

The DDS specification defines only policies and interfaces between participants.² To maximize the freedom of DDS providers, the specification intentionally does not address how to implement the services or manage DDS resources internally. Naturally, the particular communication models, distribution architectures, and implementation techniques used by DDS providers have a significant impact on application behavior and QoS, i.e., different choices affect the suitability of different DDS implementations and configurations for different classes of DRE information management applications.

² The Real-Time Publish Subscribe (RTPS) protocol was formally adopted by the OMG in June 2006 as the default transport protocol for DDS, but is not yet integrated into the DDS specification.

Table 1: Supported DDS Communication Models

  DDS Impl   Unicast         Multicast   Broadcast
  DDS1       Yes (default)   Yes         No
  DDS2       No              Yes         Yes (default)
  DDS3       Yes (default)   No          No

An important advantage of DDS is that it allows applications to take advantage of various communication models, e.g., unicast-, multicast-, and broadcast-capable transports, though different implementations may support different models. The communication models supported by the DDS implementations we evaluated are shown in Table 1: DDS1 supports unicast and multicast, DDS2 supports multicast and broadcast, whereas DDS3 only supports unicast. These DDS implementations only use layer-3 network interfaces and routers, e.g., IP multicast, to handle the network traffic for the different communication models, rather than more scalable multicast protocols such as Ricochet [25], which combines native IP multicast with proactive forward error correction to achieve high levels of consistency with stable and tunable overhead.

In our evaluation, we also found that the three most popular DDS implementations have different architectural designs, as described in the remainder of this section.

2.2.1 Federated Architecture

The federated DDS architecture shown in Figure 5 uses a separate daemon process for each network interface. This daemon must be started on each node before domain participants can communicate remotely. Once started, it communicates with daemons running on other nodes and establishes data channels based on reliability requirements (e.g., reliable or best effort) and transport addresses (e.g., unicast or multicast). Each channel handles the communication and QoS for all the participants requiring its particular properties. Using a daemon process decouples the applications (which run in a separate user process) from configuration- and communication-related details. For example, the daemon process can use a configuration file to store common system parameters shared by communication endpoints associated with a network interface, so that changing the configuration does not affect application code or processing.

Figure 5: Federated DDS Architecture

In general, the advantages of a federated architecture are that applications can achieve higher scalability to a larger number of DDS participants on the same node, e.g., by bundling messages that originate from different DDS participants. Moreover, using a separate daemon process to mediate access to the network can (1) make it easy to set up policies for a group of participants associated with the same network interface and (2) allow prioritization of messages from different communication channels. A disadvantage of this approach, however, is that it introduces an extra configuration step, and possibly another point of failure. Moreover, applications must cross extra process boundaries to communicate, which can increase latency and jitter.

2.2.2 Decentralized Architecture

The decentralized DDS architecture shown in Figure 6 places the communication- and configuration-related capabilities into the same user process as the application itself. These capabilities execute in separate threads (rather than in a separate daemon process) that the DCPS middleware library uses to handle communication, reliability, and QoS.

Figure 6: Decentralized DDS Architecture

The advantage of a decentralized architecture is that each application is self-contained, without the need for a separate daemon. As a result, there is less latency and jitter overhead, as well as one less configuration and failure point. A disadvantage, however, is that application developers may need to specify extra application configuration details, such as multicast address, port number, reliability model, and various parameters for different transports. This architecture also makes it hard to buffer/optimize data sent to/from multiple DDS applications on a node, which loses some of the scalability benefits provided by the federated architecture described in Section 2.2.1.

2.2.3 Centralized Architecture

The centralized architecture shown in Figure 7 uses a single daemon server running on a designated node to store the information needed to create and manage connections between DDS participants in a domain. The data itself passes directly from publishers to subscribers, whereas the control and initialization activities (such as data type registration, topic creation, and QoS value assignment, modification, and matching) require communication with this daemon server.

The advantage of the centralized approach is its simplicity of implementation and configuration, since all control information resides in a single location. The disadvantage, of course, is that the daemon is a single point of failure, as well as a potential performance bottleneck in a highly loaded system.

Figure 7: Centralized DDS Architecture

The remainder of this paper investigates how the architectural differences described above affect the performance experienced by DRE information management applications.

3 Methodology for Pub/Sub Platform Evaluation

This section describes our methodology for evaluating various pub/sub platforms to determine empirically how well they support various classes of DRE information management applications, including systems that generate small amounts of data periodically (which require low latency and jitter), systems that send larger amounts of data in bursts (which require high throughput), and systems that generate alarms (which require asynchronous, prioritized delivery).

3.1 Evaluated Pub/Sub Platforms

In our evaluations, we focused on comparing the performance of the C++ implementations of DDS shown in Table 2 against each other.

Table 2: DDS Versions Used in Experiments

  DDS Impl   Version    DDS Distribution Architecture
  DDS1       4.1c       Decentralized Architecture
  DDS2       2.0 Beta   Federated Architecture
  DDS3       8.0        Centralized Architecture

We also compared these three DDS implementations against the three other pub/sub middleware platforms shown in Table 3.

Table 3: Other Pub/Sub Platforms Used in Experiments

  CORBA Notification Service (TAO 1.x): OMG data interoperability standard that enables events to be sent and received between objects in a decoupled fashion.
  SOAP (gSOAP 2.7.8): W3C standard for XML-based Web Services.
  JMS (J2EE 1.4 SDK / JMS 1.1): Enterprise messaging standard that enables J2EE components to communicate asynchronously and reliably.

We compared the performance of these pub/sub mechanisms using the following metrics:

• Latency, defined as the roundtrip time between the sending of a message and the reception of an acknowledgment from the subscriber. In our tests, roundtrip latency is calculated as the average of 10,000 roundtrip measurements.
• Jitter, the standard deviation of the latency.
• Throughput, defined as the total number of bytes that the subscribers can receive per unit time in different 1-to-n publisher/subscriber configurations, i.e., 1-to-4, 1-to-8, and 1-to-12.

We also compared the performance of the DDS listener-based and waitset-based subscriber notification mechanisms. Finally, we conducted qualitative evaluations of DDS portability and configuration mechanisms.

3.2 Benchmarking Environment

3.2.1 Hardware Testbed and Software Infrastructure

The computing nodes we used to run our experiments are hosted on ISISLab [19], a testbed of computer systems and network switches that can be arranged in many configurations. ISISLab consists of 6 Cisco 3750G-24TS switches, 1 Cisco 3750G-48TS switch, and 4 IBM BladeCenters, each consisting of 14 blades (for a total of 56 blades), 4 gigabit network I/O modules, and 1 management module. Each blade has two 2.8 GHz Xeon CPUs, 1 GB of RAM, a 40 GB hard drive, and 4 independent Gbps network interfaces. The structure of ISISLab is shown in Figure 8.

In our tests, we used up to 14 nodes (1 publisher, 12 subscribers, and a centralized server in the case of DDS3). Each blade ran Fedora Core 4 Linux, kernel version 2.6.16-1.2108_FC4smp. The DDS applications were run in the Linux real-time scheduling class to minimize extraneous sources of memory, CPU, and network load.

Figure 8: ISISLab structure


3.2.2 The DDS Benchmark Environment (DBE)

To facilitate the growth of our tests in both variety and complexity, we created the DDS Benchmarking Environment (DBE), an open-source framework for automating our DDS testing. The DBE consists of (1) a directory structure to organize scripts, configuration files, test IDs, and test results, (2) a hierarchy of Perl scripts to automate test setup and execution, (3) a tool for automated graph generation, and (4) a shared library for gathering results and calculating statistics.

Figure 9: DDS Benchmarking Environment Architecture

The DBE has three levels of execution, as shown in Figure 9. This multi-tiered design enhances flexibility, performance, and portability while incurring low overhead. Each level of execution has a specific purpose: the top level is the user interface, the second level manipulates the node itself, and the bottom level comprises the actual test executables (e.g., publishers and subscribers for each DDS implementation). The DBE is also implemented carefully to maximize resource usage by the actual test executables rather than by the DBE scripts. For example, if Ethernet saturation is reached during our testing, the saturation is caused by the relevant DDS data transmissions and not by DBE test artifacts.

4 Empirical Results

This section analyzes the results of benchmarks conducted using the DBE in ISISLab. We first evaluate the 1-to-1 roundtrip latency performance of the DDS pub/sub implementations and compare it with the performance of non-DDS pub/sub implementations. We then present and analyze the results of 1-to-n scalability throughput tests for each DDS implementation, where n is 4, 8, and 12. All graphs of empirical results use logarithmic axes, since the latency/throughput of some pub/sub implementations covers such a large range of values that linear axes become unreadable as payload size increases.

4.1 Latency and Jitter Results

Benchmark design. Latency is an important measurement for evaluating DRE information management performance. Our test code measures roundtrip latency for each pub/sub middleware platform described in Section 3.1. We ran the tests on both simple and complex data types to see how well each platform handles the marshaling/de-marshaling overhead introduced by complex data types. The IDL structures for the simple and complex data types are shown below.

// Simple Sequence Type
struct data {
  long index;
  sequence<octet> data;   // element type lost in this draft; a byte sequence is assumed
};

// Complex Sequence Type
struct Inner {
  string info;
  long index;
};
typedef sequence<Inner> InnerSeq;

struct Outer {
  long length;
  InnerSeq nested_member;
};
typedef sequence<Outer> ComplexSeq;

The publisher writes a simple/complex data sequence of a given payload size on a particular topic. When the subscriber receives the topic data, it sends a 4-byte acknowledgement in response. The payload sequence length for both simple and complex data ranges from 4 to 16,384 by powers of 2. The QoS and configuration settings for the DDS implementations are best-effort reliability and point-to-point unicast in all cases.

The basic building blocks of the JMS test code are the administrative objects used by the J2EE application server, a message publisher, and a message subscriber. JMS supports both point-to-point and publish/subscribe messaging domains. Since we are measuring one-publisher-to-one-subscriber latency, we chose the point-to-point messaging domain, which uses synchronous message queues to send and receive data.

The TAO Notification Service tests use a Real-time CORBA lane to create a priority path between publisher and subscriber. The publisher creates an event channel and specifies a lane with priority 0. The subscriber then chooses the lane with priority 0 to make sure it receives data from the right lane. The Notification Service matches the lane for the subscriber and sends the event to the subscriber using the CLIENT_PROPAGATED priority model.

The publisher test code measures latency by timestamping the data transmission and subtracting that timestamp from the one taken when the acknowledgment arrives from the subscriber. The goal of this test is to evaluate how fast data gets transferred from one pub/sub node to another at different payload sizes. To eliminate factors other than differences among the middleware implementations that affect latency/jitter, we tested a single publisher sending to a single subscriber running in separate processes on the same node. Since each node has two CPUs, the publisher and subscriber can execute in parallel.
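The roundtrip measurement just described reduces to a simple timing loop plus the mean/standard-deviation calculation used for the latency and jitter numbers reported below. The sketch below captures that logic; send_sample() and wait_for_ack() are hypothetical stand-ins for the middleware-specific write and acknowledgment calls, so only the timing and statistics code is meant literally.

#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

struct RoundtripStats { double mean_usec; double jitter_usec; };

RoundtripStats measure_roundtrips(int iterations,
                                  const std::function<void()>& send_sample,   // hypothetical: writes one payload
                                  const std::function<void()>& wait_for_ack)  // hypothetical: blocks for the 4-byte ack
{
  std::vector<double> usec;
  usec.reserve(iterations);
  for (int i = 0; i < iterations; ++i) {
    const auto t0 = std::chrono::steady_clock::now();
    send_sample();
    wait_for_ack();
    const auto t1 = std::chrono::steady_clock::now();
    usec.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
  }

  // Latency is reported as the mean of the roundtrip times (10,000 per run);
  // jitter is their standard deviation.
  double mean = 0.0;
  for (double v : usec) mean += v;
  mean /= usec.size();

  double var = 0.0;
  for (double v : usec) var += (v - mean) * (v - mean);

  return { mean, std::sqrt(var / usec.size()) };
}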

Results. Figures 10 and 11 compare the latency and jitter results for the simple and complex data types across all pub/sub platforms. These figures show that the latency/jitter of all three DDS implementations is very low compared to the conventional pub/sub middleware for both the simple and complex data types. In particular, DDS1 has extremely low latency and jitter compared with the other pub/sub mechanisms.

Figure 10: Simple Latency vs. Complex Latency

Figure 11: Simple Jitter vs. Complex Jitter

Analysis. There are several explanations for the results in Figures 10 and 11. As discussed in Section 2.1, DDS has fewer layers than other standard pub/sub platforms, so it incurs much lower latency and jitter. The low latency of DDS1 stems largely from its mature implementation, as well as from its decentralized architecture (Figure 6), in which publishers communicate with subscribers without going through a separate daemon process. In contrast, DDS2's federated architecture involves an extra hop through a pair of daemon processes (one on the publisher and one on the subscriber), which helps explain why its latency is higher than DDS1's when sending small simple data types.

These figures also indicate that sending the complex data type incurs more overhead for all pub/sub implementations, particularly as data size increases. Since these transactions all happen on the same machine, however, DDS2's federated architecture automatically switches to shared-memory mode, which bypasses (de)marshaling. In contrast, DDS1 performs this (de)marshaling even though it is running over UDP loopback. Tests between two or more distinct hosts would negate this DDS2 optimization.

Interestingly, the latency of gSOAP increases nearly linearly with data size, i.e., if the data size doubles, the latency nearly doubles as well. gSOAP's poor performance with large payloads stems largely from its XML representation of sequences, which (de)marshals each element of a sequence using a verbose text-based format rather than (de)marshaling the sequence as a whole as DDS and CORBA do.

4.2 Throughput Results

Benchmark design. Throughput is another important performance metric for DRE information management systems. The primary goals of our throughput tests, therefore, were to measure how well each DDS implementation handles scalability, how different communication models (e.g., unicast, multicast, and broadcast) affect performance, and how the synchronous (wait-based) and asynchronous (listener-based) notification mechanisms perform with these communication models. To maximize scalability in our throughput tests, the publisher and subscriber(s) reside in different processes on different hosts.

In the remainder of this subsection we first evaluate the performance of the individual DDS implementations³ as we vary the communication models they support, per the constraints outlined in Table 1. We then compare the multicast performance of DDS1 and DDS2, as well as the unicast performance of DDS1 and DDS3, which are the only common points of comparison available.

³ We did not measure throughput for the other pub/sub platforms because our results in Section 4.1 show they were significantly outperformed by DDS. We therefore focus our throughput tests on the DDS implementations.

The following subsections present our throughput results. For each figure, we include a small text box with a brief description of the DDS QoS settings used for that test. Since we cannot list the settings for every QoS parameter, any parameter that is not mentioned in the box uses the default value given in the specification [6]. Note that we chose the best-effort QoS policy for data delivery reliability, as in the latency tests above. We focus our tests on best effort because we are measuring against the QoS requirements of DRE information management scenarios in which (1) information is updated frequently and (2) the overhead of retransmitting lost data samples is precluded.
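For the throughput runs, timing is done entirely on the subscriber side to avoid the clock-synchronization issues discussed in Section 5.1, and the publisher oversends so that the expected sample count arrives despite best-effort loss. The following sketch shows that calculation; receive_sample() is a hypothetical blocking helper standing in for the listener or waitset delivery path, and the primer-sample warm-up mentioned in Section 5.1 is assumed to have already happened.

#include <chrono>
#include <cstddef>
#include <functional>

// Returns throughput in bits per second, computed on the subscriber's clock only.
double measure_throughput_bps(long expected_samples,
                              const std::function<std::size_t()>& receive_sample)  // hypothetical: blocks, returns payload bytes
{
  std::size_t total_bytes = 0;
  const auto start = std::chrono::steady_clock::now();

  // The publisher oversends, so in practice expected_samples arrive even if
  // some best-effort datagrams are dropped along the way.
  for (long received = 0; received < expected_samples; ++received) {
    total_bytes += receive_sample();
  }

  const auto stop = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(stop - start).count();
  return (static_cast<double>(total_bytes) * 8.0) / seconds;
}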

4.2.1 DDS1 Unicast/Multicast

Results. Figure 12 shows the results of our scalability tests for DDS1 unicast and multicast with 1 publisher and multiple subscribers. It shows that unicast performance degrades drastically when scaling up, whereas multicast scales up without affecting throughput significantly.

Figure 12: DDS1 Unicast vs. Multicast

Analysis. Figure 12 indicates that the overhead of unicast leads to performance degradation as the number of subscribers increases, since the middleware sends a separate copy to each subscriber. It also indicates how multicast improves scalability, since only one copy of the data is delivered regardless of how many recipients in the domain subscribe.

4.2.2 DDS2 Broadcast/Multicast

Results. Figure 13 shows the scalability test results for DDS2 broadcast and multicast with 1 publisher and multiple subscribers (i.e., 4 and 12). This figure shows that both multicast and broadcast scale well as the number of subscribers increases, and that multicast performs slightly better than broadcast.

Figure 13: DDS2 Multicast vs. Broadcast

Analysis. Figure 13 indicates that sending messages to a specific address group, rather than to every node in the subnet, is slightly more efficient.

4.2.3 Comparing DDS Implementation Performance

Results. Figure 14 shows a multicast performance comparison of DDS1 and DDS2 with 1 publisher and 12 subscribers. Since DDS3 does not currently support multicast, we omit it from this comparison. Figure 15 shows a unicast performance comparison of DDS1 and DDS3. Since DDS2 does not support unicast, we omit it from this comparison as well.

Figure 14: 1-12 DDS1 Multicast vs. DDS2 Multicast

Analysis. Figures 14 and 15 indicate that DDS1 outperforms DDS2 for smaller data sizes. As the size of the payloads increases, however, DDS2 performs better. It appears that this difference stems from the different distribution architectures used to implement DDS1 and DDS2, i.e., the decentralized and federated architectures, respectively.


Figure 15: 1-12 DDS1 Unicast vs. DDS3 Unicast

4.2.4 Comparing Wait-based vs. Listener-based Notification Mechanisms

Benchmark design.

Results. Figure 16 shows the performance comparison of the listener-based and wait-based notification mechanisms. This figure shows that for DDS1, listener-based notification outperforms wait-based notification, while for DDS2 there is no consistent difference between the two.

Figure 16: Listener-based vs. Wait-based

Analysis. In general, the performance of wait-based and listener-based data reception should be the same, because both wait on the same status condition that indicates the arrival of data. The reason DDS1's wait-based reception slightly trails its listener-based reception may be that when data arrives, DDS attempts to notify the listener first and then the waitset, which can delay the waitset trigger.

5 Key Challenges and Lessons Learned

This section describes the challenges we encountered when conducting the experiments presented in Section 4 and summarizes the lessons learned from our efforts.

5.1 Resolving DBE Design and Execution Challenges

When designing the DBE and running our benchmark experiments, we encountered a number of challenges. Below, we summarize these challenges and how we addressed them.

Challenge 1: Synchronizing Distributed Clocks

Problem: It is hard to precisely synchronize clocks between applications running on blades distributed throughout ISISLab. Even when using the Network Time Protocol (NTP) we still experienced differences in time that ranged from x to y, and we had to constantly repeat the synchronization routines to make sure the times on different nodes stayed in sync. We therefore needed to avoid relying on synchronized clocks to measure latency, jitter, and throughput.

Solution: For our latency experiments, we simply have the subscriber send a minimal reply to the publisher and use the clock on the publisher side to calculate the roundtrip time. For throughput, we use the subscriber's clock to measure the time required to receive a designated number of samples. Both methods provide common reference points and minimize timing errors by basing the latency and throughput calculations on a single clock.

Challenge 2: Automating Test Execution

Problem: Achieving good coverage of a test space whose parameters can vary in several orthogonal dimensions leads to a combinatorial explosion of test types and configurations. Manually running tests for each configuration, for each middleware implementation, on each node is tedious, error-prone, and extremely time-consuming. The task of managing and organizing test results also grows exponentially with the number of distinct test configuration combinations.

Solution: The DBE, described in Section 3.2.2, grew out of our efforts to manage the large number of tests and the associated volume of result data. Our efforts to streamline test creation, execution, and analysis are ongoing, and include work on several fronts: a hierarchy of scripts, several types of configuration files, and test code refactoring.

Challenge 3: Handling Packet Loss

Problem: Since our DDS implementations support the UDP transport, it is possible for packets to be dropped on both the publisher and subscriber sides, so we need to be sure that the subscribers receive the designated number of samples in spite of packet loss.


Solution: There are several ways to address this problem. One way is to have the publisher send the number of messages that we expect subscribers to receive and stop the timer when the publisher is done. On the subscriber side, we then use only the number of messages actually received to calculate the throughput. This method has two drawbacks, however: (1) the publisher has to send extra notification messages to stop the subscribers from listening for further messages, and if a subscriber fails to receive this notification the measurement never completes, and (2) the publisher controls when to stop the timer, which reintroduces the distributed clock synchronization problem discussed in Challenge 1 and might affect the accuracy of the evaluation.

To address these drawbacks, we adopted an alternative that ensures subscribers receive a deterministic number of messages by having the publishers "oversend" an appropriate amount of extra data. With this method, we need no extra ping-pong communication between publishers and subscribers. More importantly, we only need to measure the time interval on the subscriber side, without interference from the publisher side. The downside of this method is that we must run a number of preliminary tests before deciding how much extra data to oversend.

Challenge 4: Ensuring a Steady Communication State

Problem: We need to make sure our benchmark applications are in a steady state when statistical data is being collected.

Solution: We send primer samples to "warm up" the applications before actually measuring the data. This warm-up period allows time for any discovery activity related to other subscribers to finish, and for any other first-time actions, on-demand actions, or lazy evaluations to complete, so that their extra overhead does not affect the statistics calculations.

5.2 Summary of Lessons Learned

Based on our test results and our experience developing the DBE and running numerous DDS experiments, we learned the following:

• DDS performs significantly better than other pub/sub implementations – Figure 10 in Section 4.1 shows that even the slowest DDS implementation was around twice as fast as the non-DDS pub/sub services. Figures 10 and 11 in Section 4.1 also show that DDS pub/sub middleware scales better to larger payloads than non-DDS pub/sub middleware. This performance margin is made possible partly by the fact that DDS decouples information intent from information exchange. In particular, an XML-based pub/sub mechanism such as SOAP is optimized for transmitting strings, whereas the data types we used for testing were sequences. gSOAP's poor performance with large payloads is likely due to the fact that it must marshal and demarshal each element of a sequence, which may be as small as a single byte, whereas DDS implementations can send and receive such data types as blocks.

• Individual DDS implementations are optimized for different use cases – Figures 10 and 14 indicate that DDS1 is optimized for smaller payload sizes compared to DDS2. As payload size increases, especially for the complex data type in Figure 10, DDS2 catches up and surpasses DDS1 in performance.

• Apples-to-apples comparisons of DDS implementations are hard to make – The reasons for this difficulty fall into the following three categories:
  – No common transport protocol: the DDS implementations that we investigated share no common application protocol. DDS1 uses an RTPS-like protocol on top of UDP, DDS2 will add RTPS support soon but does not yet have it, and DDS3 simply implements raw TCP and UDP. Note that this lack of a common transport protocol is not a shortcoming of the implementations, but is entirely due to the fact that a standard transport protocol had not been adopted when the most recent releases of these implementations were made.
  – No universal support for unicast/broadcast/multicast: Table 1 shows the different mechanisms supported by each DDS implementation, from which we can see that DDS3 does not support any point-to-multipoint transport, making it hard to scale as the number of subscribers increases.
  – DDS applications are not yet portable: this is partially due to the fact that the specification is still evolving and vendors use proprietary techniques to fill the gaps. A portability wrapper façade would be a great help to any DDS application developer, and a huge help to our efforts in writing and running large numbers of benchmark tests.

• Substantial tuning of policies and options is required or suggested by vendors to optimize performance – This adds difficulty to designing universal benchmark configurations, as well as to the learning curve of any application developer.

• Broadcast can be a double-edged sword – Using broadcast can significantly increase performance, but it also has the potential to exhaust network router bandwidth.
6 Related Work

As an emerging technology for the Global Information Grid and data-critical real-time systems, DDS in particular, and publish/subscribe architectures in general, have attracted a growing number of research efforts (such as COBEA [20] and Siena [12]) and commercial products and standards (such as JMS [10], WS_NOTIFICATION [13], and the CORBA Event and Notification Services [17]). This section describes the body of work related to our effort.

Open Architecture Benchmark. The Open Architecture Benchmark (OAB) is a DDS benchmarking effort conducted alongside the Open Architecture Computing Environment, an open architecture initiated by the US Navy to facilitate future combat system software development. Joint efforts have been conducted in OAB to evaluate DDS products, in particular RTI's NDDS and THALES DDS, and to understand the ability of these products to support the bounded latencies and sustained throughput required by combat system applications [8]. Their results indicate that both products perform comparably well and meet the requirements of typical combat systems. Our DDS work complements their effort in that we (1) extend the evaluation to other pub/sub middleware and (2) focus our scalability tests on throughput.

CORBA Pub/Sub Services. The OMG has specified a Notification Service [21], which is a superset of the CORBA Event Service that adds interfaces for event filtering, configurable event delivery semantics (e.g., at least once or at most once), security, event channel federations, and event delivery QoS. The patterns and techniques used in the implementation of TAO's Real-time Event Service can be used to improve the performance and predictability of Notification Service implementations. A Notification Service for TAO [22] has been implemented and used to validate the feasibility of building a reusable framework that factors out common code for TAO's Notification Service, its standard CORBA Event Service implementation, and its Real-time Event Service.

PADRES. The Publish/subscribe Applied to Distributed Resource Scheduling (PADRES) [26] system is a novel distributed, content-based publish/subscribe messaging system. A PADRES system consists of a set of brokers connected by an overlay network. Each broker employs a rule-based engine to route and match publish/subscribe messages, and is used for composite event detection.

S-ToPSS. There has been an increasing demand for content-based pub/sub applications in which subscribers can use a query language to filter the available information and receive only the subset of the data that is of interest. However, most current solutions support only syntactic filtering, i.e., matching based on syntactic information, which greatly limits the selectivity of the information. In [27], the authors investigated how current pub/sub systems can be extended with semantic capabilities and proposed a prototype of such a middleware architecture called the Semantic Toronto Publish/Subscribe System (S-ToPSS). For a highly intelligent semantic-aware system, simple synonym transformation is not sufficient, so S-ToPSS extends this simple model by adding two more layers to the semantic matching process: a concept hierarchy and matching functions. The concept hierarchy ensures that events (data messages in the context of this paper) containing generalized filtering information do not match subscriptions with specialized filtering information, and similarly that events containing more specialized filtering information than a subscription do match that subscription. Matching functions provide a many-to-many structure to specify more detailed matching relations and can be extended to heterogeneous systems.

DDS also provides QoS policies that support content-based filtering for selective information subscription, but this filtering is currently limited to syntactic matching. It is certainly our interest to explore in the future the possibility of introducing a semantic architecture into DDS and evaluating its performance.

7 Concluding Remarks

This paper describes and evaluates the architectures of three popular implementations of the OMG Data Distribution Service (DDS). We then present the DDS Benchmarking Environment (DBE) and show how we use the DBE to compare the performance of these DDS implementations, as well as of non-DDS pub/sub platforms. Our results show that DDS performs significantly better than the other pub/sub implementations because it decouples the declaration of information intent from the actual information delivery, thereby improving the flexibility of the underlying middleware and leveraging networking and OS mechanisms to improve performance.

As part of the ongoing Pollux project, we will continue to evaluate other interesting features of DDS needed by large-scale DRE information management systems. Our future work will include (1) tailoring benchmarks to explore key classes of applications in DRE information management systems, (2) devising generators that can emulate various workloads and use cases, (3) including a wider range of QoS configurations, e.g., durability, reliable vs. best-effort delivery, and the integration of durability, reliability, and history depth, (4) designs for migrating processing toward data sources, (5) measuring participant discovery time for various entities, (6) identifying scenarios that distinguish the performance of QoS policies and features, e.g., collocated applications, and (7) evaluating the suitability of DDS in heterogeneous dynamic environments, e.g., mobile ad hoc networks, where system resources are limited and dynamic configuration changes are common.

Acknowledgements

We would like to thank Real-Time Innovations (RTI), Prism Technologies (PT), and Object Computing Inc. (OCI), particularly Dr. Gerardo Pardo-Castellote and Ms. Gong Ke from RTI, Dr. Hans van't Hag from PrismTech, and Yan Dan and Steve Harris from OCI, for their extensive help performing the experiments reported in this paper. We would also like to thank Dr. Petr Tuma from Charles University for his help with the Bagatel graphics framework.

References


[1] Gerardo Pardo-Castellote, Bert Farabaugh, and Rick Warren, "An Introduction to DDS and Data-Centric Communications," www.rti.com/resources.html.
[2] Douglas C. Schmidt and Carlos O'Ryan, "Patterns and Performance of Distributed Real-time and Embedded Publisher/Subscriber Architectures," Journal of Systems and Software, Special Issue on Software Architecture – Engineering Quality Attributes, edited by Jan Bosch and Lars Lundberg, October 2002.
[3] Chris Gill, Jeanna M. Gossett, David Corman, Joseph P. Loyall, Richard E. Schantz, Michael Atighetchi, and Douglas C. Schmidt, "Integrated Adaptive QoS Management in Middleware: An Empirical Case Study," Proceedings of the 10th Real-time Technology and Application Symposium, May 25-28, 2004, Toronto, CA.
[4] Gerardo Pardo-Castellote, "DDS Spec Outfits Publish-Subscribe Technology for GIG," COTS Journal, April 2005.
[5] Gerardo Pardo-Castellote, "OMG Data Distribution Service: Real-Time Publish/Subscribe Becomes a Standard," www.rti.com/docs/reprint_rti.pdf.
[6] OMG, "Data Distribution Service for Real-Time Systems Specification," www.omg.org/docs/formal/04-12-02.pdf.
[7] DOC DDS Benchmark Project Site, www.dre.vanderbilt.edu/DDS.
[8] Bruce McCormick and Leslie Madden, "Open Architecture Benchmark," Real-Time Embedded System Workshop 2005, www.omg.org/news/meetings/workshops/RT_2005/03-3_McCormick-Madden.pdf.
[9] Arvind S. Krishna, Douglas C. Schmidt, Ray Klefstad, and Angelo Corsaro, "Real-time CORBA Middleware," in Middleware for Communications, edited by Qusay Mahmoud, Wiley and Sons, New York, 2003.
[10] Sun Microsystems, "J2EE 1.4 Tutorial," java.sun.com/j2ee/1.4/docs/tutorial/doc/, December 2005.
[11] Fox, G., Ho, A., Pallickara, S., Pierce, M., and Wu, W., "Grids for the GiG and Real Time Simulations," Proceedings of the Ninth IEEE International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2005), 2005.
[12] D. S. Rosenblum and A. L. Wolf, "A Design Framework for Internet-Scale Event Observation and Notification," 6th European Software Engineering Conference, Lecture Notes in Computer Science 1301, Springer, Berlin, 1997, pp. 344-360.
[13] IBM, "Web Service Notification," www-128.ibm.com/developerworks/library/specification/ws-notification.
[14] PrismTech, "Overview of OpenSplice Data Distribution Service," www.prismtechnologies.com/section-item.asp?sid4=&sid3=252&sid2=10&sid=18&id=557.
[15] Real-Time Innovations, RTI Data Distribution Service, www.rti.com/products/data_distribution/index.htm.
[16] Real-Time Innovations, "Unmanned Georgia Tech Helicopter Flies with NDDS," controls.ae.gatech.edu/gtar/2000review/rtindds.pdf.
[17] Pradeep Gore, Douglas C. Schmidt, Chris Gill, and Irfan Pyarali, "The Design and Performance of a Real-time Notification Service," Proceedings of the 10th IEEE Real-time Technology and Application Symposium (RTAS '04), Toronto, CA, May 2004.
[18] Nayef Abu-Ghazaleh, Michael J. Lewis, and Madhusudhan Govindaraju, "Differential Serialization for Optimized SOAP Performance," Proceedings of HPDC-13: IEEE International Symposium on High Performance Distributed Computing, Honolulu, Hawaii, pp. 55-64, June 2004.
[19] DOC Group, DDS Benchmark Project, www.dre.vanderbilt.edu/DDS/html/testbed.html.
[20] C. Ma and J. Bacon, "COBEA: A CORBA-Based Event Architecture," Proceedings of the 4th Conference on Object-Oriented Technologies and Systems, USENIX, April 1998.
[21] Object Management Group, "Notification Service Specification," OMG Document telecom/99-07-01, July 1999.
[22] P. Gore, R. K. Cytron, D. C. Schmidt, and C. O'Ryan, "Designing and Optimizing a Scalable CORBA Notification Service," Proceedings of the Workshop on Optimization of Middleware and Distributed Systems, Snowbird, Utah, pp. 196-204, ACM SIGPLAN, June 2001.
[23] Real-Time Innovations, "High-Performance Distributed Control Applications over Standard IP Networks," www.rti.com/markets/industrial_automation.html.
[24] Real-Time Innovations, "High-Reliability Communication Infrastructure for Transportation," www.rti.com/markets/transportation.html.
[25] Mahesh Balakrishnan, Ken Birman, Amar Phanishayee, and Stefan Pleisch, "Ricochet: Low-Latency Multicast for Scalable Time-Critical Services," Cornell University Technical Report.
[26] Alex Cheung and Hans-Arno Jacobsen, "Dynamic Load Balancing in Distributed Content-based Publish/Subscribe," Middleware 2006, Melbourne, Australia, December 2006.
[27] Ioana Burcea, Milenko Petrovic, and Hans-Arno Jacobsen, "S-ToPSS: Semantic Toronto Publish/Subscribe System," International Conference on Very Large Data Bases (VLDB), pp. 1101-1104, Berlin, Germany, 2003.
