Characterization of the Burst Stabilization Protocol for the RR/RR CICQ Switch

1 Characterization of the Burst Stabilization Protocol for the RR/RR CICQ Switch Neil J. Gunther Kenneth J. Christensen and Kenji Yoshigoe Performance Dynamics Company Computer Science and Engineering 4061 East Castro Valley Blvd., Suite 110 University of South Florida Castro Valley, CA 94552 Tampa, FL 33620 [email protected] {christen, kyoshigo}@csee.usf.edu Abstract VOQ Scheduler Input buffered switches with Virtual Output Queueing Input 1 Crossbar 1, 1 (VOQ) can be unstable when presented with unbalanced 1, 2 loads. Existing scheduling algorithms, including iSLIP 1,1 1,2 for Input Queued (IQ) switches and Round Robin (RR) for Combined Input and Crossbar Queued (CICQ) Input 2 2, 1 switches, exhibit instability for some schedulable loads. We investigate the use of a queue length threshold and 2, 2 2,1 2,2 bursting mechanism to achieve stability without requiring internal speed-up. An analytical model is developed to Classifier prove that the burst stabilization protocol achieves Control flow stability and to predict the minimum burst value needed Output 1 Output 2 as a function of offered load. The analytical model is shown to have very good agreement with simulation Figure 1. VOQ input buffered switch results. These results show the advantage of the RR/RR increase in density of VLSI technologies [12, 19, 20, 21, CICQ switch as a contender for the next generation of 26, 27, 28]. high-speed switches. The trade-offs in VOQ switch matrix scheduling are 1. Introduction stability, fairness, and implementation complexity. Stability refers to bounded queue length for scheduable loads (an unstable queue has an unbounded queue High-speed switches are the core of the Internet and length). IQ cell switches use iterative request-grant- make possible end-to-end delivery of packets. These accept scheduling cycles to achieve a maximal one-to-one switches have been designed typically to have their matching. Existing scheduling algorithms for IQ cell packet buffers at the output ports. Output Queued (OQ) switches based on an unweighted maximal matching switches require buffer memories that are N times as fast (such as PIM [1] and iSLIP [16]) are not stable unless as link speed (for N input ports). Link speeds are internal speed-up is used. A 2x speed-up has been proven increasing much more rapidly than memory speeds [11]. to be sufficient for stability for all schedulable flows [5]. In Input Queued (IQ) switches buffer memories need only If a weighted maximal match is implemented based on match the link speed. Thus, IQ switches have been the Oldest Cell First (OCF) or Longest Queue First (LQF), subject of much research for high-speed switching. To stability can be achieved for all schedulable flows without overcome head-of-line blocking in IQ switches, Virtual speed-up [18]. LQF can be unfair since it can cause Output Queueing (VOQ) [1, 16, 23] is employed. VOQ starvation. Weighted matching requires state information switches require switch matrix scheduling algorithms to to be exchanged between input and output ports. CICQ find one-to-one matches between input and output ports. switches can use independent round robin (RR) selection VOQ switches can be IQ, Combined Input and Output of VOQs and CP buffers (this is an RR/RR CICQ Queued (CIOQ), or Combined Input and Crossbar switch). However, instability occurs unless OCF or LQF Queued (CICQ). Fig. 1 shows a generic VOQ switch. is used to select VOQs in an input port [12]. Both OCF CICQ switches have limited buffering at the Cross Points and LQF require comparisons between all N ports during (CP) and have become feasible with the continued each scheduling cycle. This requires either N sequential 1 This material is based upon work supported by the National Science Foundation under Grant No. 9875177 (Christensen and Yoshigoe). comparisons or Log2(N) comparisons with a tree circuit Occupancy feedback containing N −1 comparators. Better methods for Buffered crossbar Input port 1 Cross point polling achieving stability are needed. We address this need. 1, 1 In this paper, Section 2 briefly reviews VOQ switch 1, 2 architectures and their scheduling algorithms. Section 3 1, 3 describes an unstable scheduling region in iSLIP and RR/RR CICQ switches and proposes a burst stabilization Input port 2 protocol to overcome this instability. Section 4 evaluates 2, 1 2, 2 burst stabilization for the RR/RR CICQ switch and 2, 3 develops a model to predict the minimum burst values required to achieve stability. Section 5 is a summary. Input port 3 3, 1 2. Overview of VOQ Switches 3, 2 3, 3 For many years, IQ switches were considered to be an Classifier academic curiosity due to their poor performance caused VOQ polling by head-of-line blocking. The breakthrough in IQ switch architectures occurred when Virtual Output Queueing Output 1 Output 2 Output 3 (VOQ) was invented by Tamir and Frazier [23] in 1988 and then developed by Anderson et al. [1] and McKeown Figure 2. RR/RR CICQ switch [16] in the early 1990’s for cell-based packet switching. and are less complex than IQ switches with bufferless In a VOQ IQ switch, each input buffer is partitioned into crossbars. In [27] the design of a 10-Gbps RR/RR CICQ N queues with one queue for each output port (hence the switch using off-the-shelf FPGA technology is presented. name “virtual” output queueing). IQ cell switches with bufferless crossbars use iterative 3. Unstable Regions in VOQ Switches request-grant-accept scheduling cycles to achieve a maximal matching of inputs to outputs. In Parallel In this section, we consider an unstable region in the Iterated Matching (PIM) [1] the steps are: iSLIP and RR/RR CICQ switches. We assume that 1. Each unmatched input sends a request to every = arrivals are Bernoulli with rate λij for i, j 1, , N output for which it has a queued cell. where i is the input port number, j the output port 2. If an unmatched output receives any requests, it ≤ ≤ number, and 0 λij 1.0. A Bernoulli model is a grants to one input by randomly selecting a request common traffic model for studying switch performance among those requesting to this output. (e.g., as used in [1, 12, 16, 18]). The Bernoulli model is 3. If an input receives a grant, it accepts one by described in Section 4.1. randomly selecting an output among those granted to this output. Definition 1 Let λ1 be the offered load at port 1. The In iSLIP [16], accept and grant counters are maintained in fraction f of offered traffic going to: each input and output port, respectively, and result in a randomized matching at high utilization due to a “slip” = 1. VOQ11 is fλ1 λ11 between counters. Iterated matching algorithms require − = 2. VOQ12 is (1 f )λ1 λ12 control flow between all input and output ports. This coupling between ports adds complexity to a switch where 1/ 2 < f < 1. implementation. IQ switches with limited buffering in the crossbars – = + Corollary 1 λ1 λ11 λ12 CICQ switches – became feasible to implement in the late 1990’s [21, 27, 28]. Scheduling in a CICQ switch can be ≠ − = Remark 1 Note that: λ12 1 λ11 unless λ1 1 . The RR for both the VOQs within the input ports and for the = −1 mean interarrival time of cells at port 2 is τ2 λ2 . But CP buffers within the crossbar to achieve an RR/RR ≡ = λ21 λ2 by virtue ofλ22 0 . CICQ switch as shown in Fig. 2. Each CP buffer needs capacity to hold only two cells. For CICQ switches, In [12] a region of instability for iSLIP IQ and RR/RR control flow can be reduced to only requiring knowledge CICQ switches is demonstrated for a schedulable, at the input ports of crossbar buffer occupancy. CICQ asymmetric traffic load to two ports. For any two ports switches thus uncouple input and output port scheduling arbitrarily identified as ports 1 and 2, let λ = λ + λ λ = λ λ = 1 1 11 12 , 21 12 , and 22 0. Within a region of λ > 0.95 11 0.5 and high offered traffic load, instability occurs. This instability condition is not limited to a two-port 0.9 Schedulable line switch, but can occur between any two ports of a large 0.85 0.8 switch. The offered load which causes instability for an Border (CICQ) 0.75 11 RR/RR CICQ switch ranges from a low of approximately λ 0.7 < λ < 0.9 in the range of 0.6 11 0.7 to a high of 1.0 at 0.65 Border (iSLIP) λ = λ = Unstable region is 11 0.5 and 11 1.0 . This instability exists for a 0.6 between border and switch of size N input and output ports where any two of 0.55 schedulable line. the N ports have the traffic load specified above. The 0.5 instability range for an iSLIP IQ switch is larger in area. 0.45 The simulation models developed and validated in [26] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 λ are used to reproduce the instability results in [12]. 12 Infinite size VOQ buffers are assumed for both an iSLIP Figure 3. Instability in RR/RR CICQ and iSLIP IQ and CICQ switch. For the iSLIP IQ switch, four iterations are used per scheduling cycle. Each VOQ has a cell burst counter which decrements As an experimental means to detect instability, on consecutive cell transfers (from the VOQ).

Characterization of the Burst Stabilization Protocol for the RR/RR CICQ Switch

Instability Phenomena in Underloaded Packet Networks with Qos Schedulers M.Ajmone Marsan, M.Franceschinis, E.Leonardi, F.Neri, A.Tarello

Scalable Multi-Module Packet Switches with Quality of Service

Improving the Scalability of High Performance Computer Systems

Fair Scheduling in Input-Queued Switches Under Inadmissible Traffic

Scheduling Algorithm and Evaluating Performance of a Novel 3D-VOQ Switch

On Scheduling Input Queued Cell Switches

Performance Analysis of Networks on Chips

Switch Kenji Yoshigoe University of South Florida

Quantifiable Service Differentiation for Packet Networks

Fabric-On-A-Chip: Toward Consolidating Packet Switching Functions on Silicon

Data Path Processing in Fast Programmable Routers

Msc THESIS Design of a High-Performance Buﬀered Crossbar Switch Fabric Using Network on Chip