
Guaranteed Periodic Real-Time Communication over Wormhole Switched Networks

Alejandro Garcia, Lisbeth Johansson, Magnus Jonsson, and Mattias Weckstén
School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden
[email protected], http://www.hh.se/ide

Abstract

In this paper, we investigate how to efficiently implement TDMA (Time Division Multiple Access) on a wormhole switched network using a pure software solution in the end nodes. Transmission is conflict free on the time-slot level and hence deadlock free. On the sub-slot level, however, conflicts are possible when using early sending, a method we propose in order to reduce latency while still not jeopardizing the TDMA schedule. We propose a complete system to offer services for dynamic establishment of guaranteed periodic real-time virtual channels. Two different clock synchronization approaches for integration into the TDMA system are discussed. Implementation and experimental studies have been done on a cluster of PCs connected by a Myrinet network. Also, a case study with a radar signal processing application is presented to show the usability. A best-case reduction of the latency of up to 37 percent for 640 Byte messages by using early sending in Myrinet is shown in the case study. Source routed wormhole switching networks are assumed in the work, but the results are applicable to some other categories of switched networks too.

1 Introduction

Switched high-performance networks are commonly used for local area networks and interconnection networks for parallel and distributed computing systems of today and tomorrow. Examples include clusters of workstations or PCs running multimedia applications, and parallel computers for radar signal processing applications. A number of networks with a competitive price/performance ratio have appeared on the market, e.g., Myrinet [1] and Gigabit Ethernet [2]. However, these networks typically have no or very little support for real-time traffic, especially hard real-time traffic, which is required in applications like those mentioned above. Networks like ATM are available, but less complex, affordable alternatives are needed where each node can be connected directly to the switched network.

In this paper, we present work done on time-deterministic communication to support cyclic traffic in a class of switched networks. By using TDMA (Time Division Multiple Access), the access to each link in the network is divided into time-slots. When the traffic is changed (e.g., a new real-time virtual channel, RTVC, between two nodes is requested), the mapping of traffic onto links and time-slots is rescheduled in a distributed manner. Since clock synchronization messages are scheduled onto the same network and no scheduling is done in the switches (only in the end nodes), the real-time support can be implemented purely in software. Also worth mentioning is that the network becomes totally deadlock free, since the whole path between source and destination is reserved in the same time-slot.

We assume wormhole switched networks in the paper, but the concept holds for cut-through and store-and-forward switching too. However, the overhead can become rather high in store-and-forward networks due to high latency compared to the effective sending time. This latency must be incurred before a new time-slot and the sending of a new message can begin. Moreover, source routing or another deterministic routing method is assumed. In this way, it is possible to reserve the corresponding links of a path between source and destination. Since switched systems allow for concurrent transmissions, multiple such paths can be reserved in the same time-slot.

We propose a method called early sending which can be used in, e.g., wormhole networks. By this method, a node with a scheduled slot S_{i+1} is allowed to initiate sending already in slot S_i. An expression for the exact time in slot S_i at which it is allowed to initiate sending is given in Section 2.3. In a case study based on a radar signal processing application on a system with Myrinet, we show that the best-case latency can be reduced by up to 37 percent by using early sending. By doubling the message size from 640 Byte to 1280 Byte, the best-case improvement is 90 percent. For the early sending method, we assume some form of low-level flow control as used in wormhole networks.

Some work has been done in the field of switched networks with support for hard real-time traffic. Examples of such work are discussed below. RACEway is a switched network primarily developed for embedded systems [3] [4]. It has support for real-time traffic by the use of priorities, but dynamic establishment of RTVCs with guaranteed performance is not supported. A system similar to the one discussed in this paper, but on a circuit-switched HIPPI network, is presented in [5]. In this paper, however, we focus on more fine-grained TDMA schedules and investigate how, e.g., clock synchronization accuracy influences performance and other parameters.

Much work has been reported on how to support real-time traffic by modifying the hardware and/or software in the switches (see, e.g., [6] [7] [8] [9]). In contrast, in our work we have assumed no changes to either software or hardware in the switches. Instead, it is a pure software solution which only affects the end nodes. Instead of reserving access to the network, as in our case, one approach to get real-time services over a standard switched network is to calculate the worst-case latency. However, the worst-case throughput can be very low when a high worst-case latency separates each guaranteed access to the network [10] [11].

The rest of the paper is organized as follows. TDMA, clock synchronization, and early sending are presented and discussed in Section 2. In Section 3, our Myrinet implementation is described, and a case study is presented in Section 4. The paper is then concluded in Section 5.

2 Time deterministic communication concept

To pass messages with hard real-time constraints over a generic switched network, a method is needed to guarantee bandwidth. In order to allow transmission of multiple data streams over a shared medium, it is possible to use time domain multiplexing combined with reservation of every single network link in the system. This works only if all nodes have a unified notion of time. In the following sections we will discuss the support for periodic traffic with hard real-time constraints. Further information related to Sections 2 and 3 can be found in [12].

2.1 TDMA and clock synchronization

If the nodes in the network have large clock drifts, the margins in the slots (Figure 1) need to be large in order to prevent blockages, but large slot margins give a low network utilization. The alternative is a more frequent clock synchronization to keep down the clock drift. The margins can be reduced or totally removed if the switches are able to handle blocking situations without removing any message from the network. A message that starts its transmission a short time (relative to the slot length) before it is allowed to will be held up if the needed links are occupied with packets belonging to the previous TDMA slot (see Section 2.3).

Figure 1: The TDMA slot (B = clock skew safety margin).

Two different approaches for the creation of the TDMA cycle have been considered in this work. The first approach has the clock synchronization part separated from the rest of the TDMA cycle. However, this leads to a minimum length of the TDMA cycle (Figure 2) in order for the clock synchronization to reappear at certain intervals. As the clock synchronization period occupies a continuous period of time, when no other traffic is allowed, the minimum time period for data packets will be affected. The time period has to be larger than the total duration of clock synchronization traffic.

Figure 2: TDMA cycle when the clock synchronization is separated from the rest of the data traffic.

The other approach considered schedules the clock synchronization messages along with all other real-time messages in the network (i.e., logical channels are

Figure 3: TDMA cycle when the clock synchronization is scheduled along with other data packets.

Figure 4: Clock synchronization in a 4 × 4 mesh.

a maximum clock drift of 1 µs, a clock synchronization period of 5000 µs is needed, as described in Section 3.

Earlier in this section, two different approaches for creating the TDMA cycle were discussed. Using the first method (i.e., the method with the clock synchronization separated from the TDMA cycle), the time to synchronize the whole network is calculated to be as follows.
