A Distributed Fair Queuing MAC Scheduler for Wireless ATM Network

Wing-Chung Hung

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto

© Copyright by Wing-Chung Hung 1997



Abstract

A centralized wireless ATM LAN consists of a base station and multiple mobile stations. Since they share one common transmission medium, medium access control (MAC) is needed to coordinate the order of transmission. To maintain the features of BISDN, the MAC has to consider a connection's QoS while allocating resources. One of the proposed MAC protocols for providing wireless multiservices is called distributed fair queuing (DFQ). DFQ is a TDMA-based MAC scheme which dynamically performs per-cell scheduling based on the cell's priority. The success of implementing DFQ largely depends on scheduling efficiency. A software scheduler will not be fast enough for this real-time task. Therefore, an efficient hardware scheduler is required. The major theme of this thesis is to design such a scheduler. As a second theme, it considers issues of WATM LAN system design and proposes a system architecture for implementing MAC functions using existing commercial ATM cards.

Acknowledgment

I would like to thank my supervisor Prof. Leon-Garcia for providing guidance on the research direction and providing assistance, suggestions and consultation for problems throughout the research. I would also like to thank many fellow students in the Communication Group who provided valuable assistance in various ways: Richard Kautz and Seyed Mohammed Ali Arad, who worked on the same project, provided many materials and references on wireless ATM; Massoud Reza Hashemi had many inputs on the design of the Wrap Sequencer; Massoud Hadjiahmad provided much information on hardware parts suitable for building our MAC board; Keith Chow provided information on the ATM products. I would also like to thank Prof. Paul Chow and his student Vineet Joshi for implementing and testing the Wrap Sequencer in their FPGA board.

Table of Contents

Abstract
Acknowledgment
List of Tables
List of Figures
List of Acronyms

1. Introduction
1.1. Thesis Background
1.2. Project Background
1.3. Thesis Organization

2. ATM and Wireless ATM
2.1. ATM Network
2.1.1. ATM Adaptation Layer and Services
2.1.2. ATM Layer
2.1.3. Physical Layer
2.1.4. ATM Cell
2.1.5. ATM QoS Enforcement
2.2. Wireless ATM
2.2.1. WATM Reference Model
2.2.2. WATM LAN Topology
2.2.3. QoS of WATM Services
2.2.4. Distributed Fair Queuing (DFQ)
2.2.5. Scheduler in DFQ System

3. Virtual Time Bounding
3.1. Leaky Bucket Flow Control
3.2. Virtual Finish Time Bounding
3.2.1. General Process Sharing and DFQ
3.2.2. GPS Maximum Backlog
3.2.3. GPS Virtual Time
3.2.4. SCFQ Virtual Time
3.3. Virtual Time Granularity
3.4. Virtual Time Bounding for DFQ Scheduler
3.5. Summary

4. Wrap Sequencer
4.1. Generic Sequencer and Wrap Sequencer
4.2. Key (Virtual Time) Wrap Around Problem
4.3. Solutions to the Key Wrap-Around Problem
4.3.1. Key Rotation Algorithm
4.4. Wrap Sequencer and Key Rotation Algorithm
4.4.1. Wrap Sequencer Design
4.5. Wrap Sequencer Synchronization
4.6. Wrap Sequencer Components and Operation
4.6.1. Wrap Sequencer Entities
4.6.2. Source-Sink Switching
4.6.3. Key Rotation
4.6.4. Source Blocking
4.6.4.1. Global Blocking Signals
4.6.4.2. Local Blocking Signals
4.7. Sequencing Elements
4.7.1. An Example
4.7.2. Sequencing Elements in General
4.8. Size of Sequencer
4.9. Summary of Features of Wrap Sequencer

5. DFQ Scheduler
5.1. DFQ Scheduler for WATM MAC Scheduling
5.2. DFQ Scheduler
5.2.1. Receive FIFO and Gate Controller
5.2.2. Recycle FIFO, Recycle Controller and Wrap Sequencer
5.2.3. Sequencer-FIFO Interface
5.3. Scheduler Parameter and Performance
5.3.1. Parameter of Recycle FIFO and Wrap Sequencer
5.4. Cascade Schedulers
5.5. Summary

6. Wireless ATM Bridge
6.2. Advantages of the "Decomposed" Model
6.3. WATM Bridge Components
6.3.1. ATM Adapter
6.3.2. MAC Board
6.3.3. Part List

7. Conclusion and Future Works
7.1. Scheduler
7.2. WATM Bridge

Appendix A: Wrap Sequencer Design and Schematics
Appendix B: References

List of Tables

Table 3.1. Granularity factors and distinguishable service share
Table 3.2. Summary of an example
Table 4.1. Sequencer states in one element-clock period
Table 4.2. Master Entity source blocking signals (when head of path is blocked)
Table 4.3. Wrap Sequencer operation illustration
Table 4.4. The general sequence ordering of Wrap Sequencer
Table 6.1. WATM bridge part list

List of Figures

Fig.1.1. Wireless ATM network
Fig.2.1. ATM protocol reference model and layer functions
Fig.2.2. Characteristics of ATM services
Fig.2.3. ATM UNI and NNI cell format
Fig.2.4. WATM reference model and WATM LAN topology
Fig.2.5. DFQ system queuing model
Fig.3.1. Leaky bucket model and departure function
Fig.3.2. System model for GPS analysis
Fig.3.3. Universal departure curve of GPS system
Fig.3.4. GPS and SCFQ arrival or departure function
Fig.4.0. The concept of a generic sequencer
Fig.4.1. Sequencer element format and synchronization state
Fig.4.2. MAC board data path and Wrap Sequencer functionality
Fig.4.3. Key rotation algorithm
Fig.4.4. Wrap Sequencer interface signals and cascaded sequencers
Fig.4.5. Wrap Sequencer architecture and entities
Fig.4.6. Key Rotation implementation
Fig.4.7. Illustration of a Master Entity in case 4
Fig.5.1. DFQ scheduler and Wrap Sequencer
Fig.5.2. Gate Controller architecture
Fig.5.3. Recycle Controller architecture
Fig.5.4. Sequencer-FIFO interface
Fig.5.5. Cascaded schedulers
Fig.6.1. WATM implementation - the "Compact" model
Fig.6.2. WATM implementation - the "Decomposed" model
Fig.6.3. WATM bridge structure and components

List of Acronyms

AAL     ATM Adaptation Layer
ABR     Available Bit Rate
ATC     Address Translation Controller
ATM     Asynchronous Transfer Mode
BISDN   Broadband ISDN
CAC     Call Admission Control
CBR     Constant Bit Rate
CDMA    Code Division Multiple Access
CDVT    Cell Delay Variation Tolerance
CLD     Connectionless Data
CLP     Cell Loss Priority
CS      Convergence Sublayer
DFQ     Distributed Fair Queuing
DLC     Data Link Control
FDMA    Frequency Division Multiple Access
FIFO    First In First Out
GFC     Generic Flow Control
GPS     General Processor Sharing
HEC     Header Error Control
HOL     Head of Line
ISDN    Integrated Services Digital Network
LAN     Local Area Network
MAC     Medium Access Control
NIC     Network Interface Card
NNI     Network Network Interface
NTC     Network Termination Card
OAM     Operation, Administration and Maintenance
PDU     Protocol Data Unit
PGPS    Packetized GPS
PHY     Physical layer
PM      Physical Medium
PTI     Payload Type Indicator
QoS     Quality of Service
RAL     Radio Access Layer
SAR     Segmentation and Reassembly
SCFQ    Self-clocked Fair Queuing
TC      Transmission Convergence sublayer
TDMA    Time Division Multiple Access
UBR     Unspecified Bit Rate
UNI     User Network Interface
UPC     Usage Parameter Control
UTP     Unshielded Twisted Pair cable
VBR     Variable Bit Rate
VCI     Virtual Channel Identifier
VPI     Virtual Path Identifier
WATM    Wireless ATM
W-PHY   Wireless PHY

Introduction

1.1. Thesis Background

As Asynchronous Transfer Mode (ATM) technology becomes more mature for both local and wide area networks (LAN/WAN) and wireless communication gains popularity, it is natural to think of extending the ATM features to wireless terminals. In recent years, much activity has focused on the research and development of wireless ATM (WATM). In 1996, the ATM Forum formed a WATM Working Group to develop specifications for both "mobile ATM" and ATM-based wireless access. Many different WATM architectures have been proposed [3,4,5] and testbeds for experimental trials are available [7,8]. The major desirable attributes of WATM are its ability to support multimedia applications with quality of service (QoS) guarantees and its ability to seamlessly interwork with other wireline ATM networks. The major challenges for achieving such attributes in the wireless segment are mobility management, QoS medium access control (MAC), and reliability of the wireless link. This thesis discusses a scheduler which addresses the QoS MAC issues. The scheduler schedules all buffered cells on a cell-by-cell basis according to their QoS indication value, and allows the highest priority cell to access the wireless medium. A WATM system architecture is also proposed and discussed. The major characteristic of the architecture is to use existing wireline ATM network interface card (NIC) to construct

Fig.1.1. Wireless ATM network (labels: base station, remote station, ATM switch, mobility management, medium access control and radio)

The thesis work is part of a WATM project led by Professor A. Leon-Garcia at the University of Toronto.

1.2. Project Background

In 1995, Professor Leon-Garcia at University of Toronto and Professor Chiang at the Singapore National University started a joint project on WATM LAN. The project is jointly funded by Ontario and Singapore. Its objective is to study WATM MAC, QoS provision in the wireless link, and wireless/wireline interworking issues. Besides developing theory, it is planned to implement a WATM LAN so that experimental results can be obtained. Currently, the University of Toronto is in a position to set up such a LAN as a testbed to study the implementation issues and to evaluate the performance of proposed solutions for WATM networks. This thesis is a part of the work towards its implementation.

The WATM LAN is TDMA-based and is planned to operate in the 5 GHz band with a data rate of up to 25 Mbps. It should support broadband multimedia services, and should seamlessly interwork with wireline ATM networks. The proposed topology is a centralized LAN where a base station coordinates all traffic flows. All traffic must go through the base station. Direct communication between remote stations and the use of a remote station as a relay are not supported.

As part of the project, a MAC protocol called Distributed Fair Queuing (DFQ) [9,10] has been developed. The basic concept of DFQ is to model the LAN, which consists of a central base station and several remote stations, as a distributed queuing system. The centralized MAC coordinator in the base station gathers the QoS requirements and queuing status of each virtual connection from the remote terminals. To ensure that the negotiated QoS for each connection is maintained, the base station schedules their transmission order based on the urgency of each cell on a cell-by-cell basis. The scheduling scheme ensures that each connection will have a guaranteed data transfer rate.

The DFQ implementation difficulty is due to the cell-by-cell scheduling. In a real-time environment, cell priority assessment, together with other cell manipulation functions, must be completed within one cell transmission time. Therefore, the scheduler studied in this thesis has to be efficient and should not be CPU intensive.

1.3. Thesis Organization

In the subsequent chapters, a brief background on ATM and WATM is introduced in Chapter 2. In the same chapter, requirements on the WATM network and the DFQ MAC protocol are discussed. Chapter 3 is a brief summary of general processor sharing (GPS), self-clocked fair queuing (SCFQ) and fair queuing implementation issues. The major objective of this chapter is to determine the number of virtual time bits required. Chapter 4 presents an algorithm to solve virtual time wrap-around problems and shows a Wrap Sequencer architecture which implements the algorithm. The schematic of the Wrap Sequencer implementation is included in Appendix A. The Wrap Sequencer is a building component of the DFQ scheduler, whose architecture is presented in Chapter 5. Chapter 6 shows a proposed WATM LAN architecture. Some specific parts available for building the WATM bridge are listed. Chapter 7 gives the conclusion and further work for implementing the LAN.

ATM and Wireless ATM

ATM technology provides networks for broadband integrated services. Wireless ATM LANs intend to provide the same features to wireless users. However, due to the physical limitations of the wireless link, the quality of ATM features over a wireless link can hardly match that over fiber links. Therefore, the goal of the WATM network is to provide similar features qualitatively while QoS is degraded quantitatively. The overall WATM transport architecture is based on the ATM protocol reference model with appropriate extensions added to adapt to wireless link characteristics. This chapter discusses the features of ATM networks and some functions required for WATM LAN.

2.1. ATM Network

The term Asynchronous Transfer Mode (ATM) was first defined in the 1988 I-series standard recommendation from ITU-T (CCITT). By the definition in the recommendation, ATM is "a transfer mode in which the information is organized into cells; it is asynchronous in the sense that the recurrence of cells containing information from an individual user is not necessarily periodic." Its asynchronous nature and per-connection QoS control promise a new networking architecture. Instead of building specialized networks, ATM is a way of building an integrated services network that is intended to satisfy the requirements of different types of services.

2.1.1. ATM Adaptation Layer and Services

The complete ATM reference model consists of a user plane, a control plane and a management plane. Fig.2.1 shows the user plane and the corresponding protocol data unit (PDU) format and functions performed at each layer. Independent of the service, all traffic in an ATM network is transported in 53-byte cells. However, the applications in the upper layer do not necessarily generate ATM cells, but different forms of packets. To send those packets across the ATM network, they have to be adapted to the ATM cell format. This adaptation is performed by the ATM Adaptation Layer (AAL). Currently, six AAL types (AAL0 to AAL5) have been defined for different classes of services, which are classified by connection mode, timing and error tolerance. Characteristics of different service classes and their relationship to the six AAL types are shown in Fig.2.2.

Fig.2.1. ATM protocol reference model and layer functions (functions listed: convergence; segmentation and reassembly; generic flow control; cell header generation/extraction; cell VPI/VCI translation; cell multiplex and demultiplex; cell rate decoupling; HEC sequence generation/verification; cell delineation; transmission frame adaptation; transmission frame generation/recovery; bit timing; physical medium)

As the AAL is an adaptation layer between the ATM layer and the upper layer, it exists only at end nodes and is further divided into two sublayers: the Convergence Sublayer (CS) and the Segmentation and Reassembly (SAR) sublayer. The CS makes sure that the upper layer application receives the needed services. Typically, a header and trailer are added for end-to-end error protection. The SAR segments the CS-PDU and composes the 48-byte ATM cell payload, and does the reverse in the receiver. The operations are specific to different AAL types. Furthermore, each AAL type has different means for dealing with lost or mis-inserted cells, and for error and timing recovery. Details of the different AAL header/trailer fields and operations can be found in many ATM books [1,2].

2.1.2. ATM Layer

The ATM layer is the heart of ATM switches and is responsible for the transfer of all user traffic in the ATM network. The functions performed at this layer distinguish the ATM network as an integrated services network. Its main function is to switch cells to their destination based on a virtual connection identifier and the associated QoS established during call setup. The identifier, hierarchically, consists of a Virtual Path Identifier (VPI) and a Virtual Channel Identifier (VCI). Many VCs may make up a VP and many VPs may share one physical link or port. An ATM switch may have multiple input ports and multiple output ports. For each incoming cell, the switch performs a mapping $(\mathrm{Port}_{in}, \mathrm{VPI}_{in}, \mathrm{VCI}_{in}) \Rightarrow (\mathrm{Port}_{out}, \mathrm{VPI}_{out}, \mathrm{VCI}_{out})$. Therefore, the identifiers of a virtual connection change along the path and translation tables must be maintained in each ATM switch. However, there are VP switches which perform switching based on the VPI only and leave the VCI unchanged.

Fig.2.2. Characteristics of ATM services (service classes CBR, real-time VBR, non-real-time VBR, ABR, UBR and CLD, compared by AAL type, connection mode, cell loss rate, cell transfer delay, cell delay variation and flow control)
CBR: Constant Bit Rate; VBR: Variable Bit Rate; ABR: Available Bit Rate; UBR: Unspecified Bit Rate; CLD: Connectionless Data

In case more than one incoming cell goes to the same output port, the order of transmission should be based on their QoS. If the transmission medium is a TDMA-based wireless link, there is only one output port (broadcast radio) and all cells have to be scheduled for accessing the medium.

ATM networks are broadband networks. They require high operating speed, which is maximized by using short headers, small and fixed-size cells, pre-defined paths and no link-by-link error recovery. An ATM switch processes the cell header only, and relays the payload of ATM cells without processing. This simplifies and, thus, speeds up the handling of cells. It accepts a cell from the transmission medium, performs a quick check of the validity of the header, looks up and translates the identifier in the header, and then sends the cell forward to the next node. The intended transmission medium is optical fiber, which is highly reliable and almost error-free. This has eliminated the need for a complicated protocol to detect and recover from errors between successive switching nodes. Thus, error recovery is an end-to-end rather than a link-by-link process.

2.1.3. Physical Layer

The physical layer relays ATM cells over physical links. Besides the obvious function of sending and receiving bits, it is also responsible for other functions as shown in Fig.2.1. ATM cells are transmitted serially link by link through the network. However, they are not transported as individual cells, but as frames. One of the common frames being used is the SONET frame. It is the transmission convergence (TC) sublayer that properly loads the ATM cells into the physical transmission frames on the sending side and removes them on the receiving side. Frames are sent at a constant rate. ATM supports variable transmission rates by transmitting the necessary number of cells per unit time. In case there is no traffic to fill up the frame, idle cells are inserted for cell rate decoupling. These idle cells are generated and terminated in the physical layer and are not seen by the ATM layer. Header error control is a physical layer function, which implies that erroneous or mis-inserted cells terminate at the physical layer as well. All these physical layer functions are not fit for wireless links. They must be modified to reflect the unreliability and scarcity of bandwidth of the link, and the scarcity of power in mobile terminals.
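The cell rate decoupling performed by the TC sublayer can be illustrated with a minimal sketch. The frame capacity, the `IDLE` marker and the function names are assumptions made for illustration only; they are not taken from the thesis or from any SONET specification.

```python
from collections import deque

IDLE = "IDLE"  # hypothetical marker standing in for an idle cell


def fill_frame(queue, slots_per_frame):
    """Sending side: fill a fixed-size frame, padding empty slots with idle cells."""
    frame = []
    for _ in range(slots_per_frame):
        frame.append(queue.popleft() if queue else IDLE)
    return frame


def extract_cells(frame):
    """Receiving side: discard idle cells so the ATM layer never sees them."""
    return [cell for cell in frame if cell != IDLE]


pending = deque(["cell-1", "cell-2"])
frame = fill_frame(pending, slots_per_frame=4)
delivered = extract_cells(frame)
```

The frame always carries four slots, so the link runs at a constant rate while the ATM layer only sees the two cells that were actually offered.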

2.1.4. ATM Cell

An ATM cell consists of 53 bytes, which include a 5-byte header and a 48-byte payload, as shown in Fig.2.3. The 5-byte ATM cell header contains all the information the network needs to transfer the cell from source to destination over a pre-established ATM virtual connection. The virtual path identifier (VPI) and virtual channel identifier (VCI) together form the connection identifier used for switching and multiplexing. The payload type indicator (PTI) indicates whether the cell contains user data or network OAM information. The cell loss priority (CLP) field indicates the eligibility of the cell for discard by the network under congested conditions. If CLP=1, the cell has a lower priority and can be discarded by the network if necessary. However, since discarded cells will be retransmitted, cell discarding should be the last resort for congestion control and should be avoided as much as possible.

Fig.2.3. ATM UNI and NNI cell formats (5-byte header followed by a 48-byte payload; UNI header fields: GFC, VPI, VCI, PTI, CLP, HEC; NNI header fields: VPI, VCI, PTI, CLP, HEC)
UNI: User-Network Interface; NNI: Network-Network Interface; GFC: 4-bit Generic Flow Control; VPI: 8/12-bit Virtual Path Identifier; VCI: 16-bit Virtual Channel Identifier; PTI: 3-bit Payload Type Indicator; CLP: 1-bit Cell Loss Priority; HEC: 8-bit Header Error Control

The header error control (HEC) field is an error checking field for the first four bytes of the cell header. It offers single error correction and double error detection. Multiple bit errors, in excess of two, in a 40-bit header may be interpreted by the receiver as a single-bit error, and the correction procedure will apply. Consequently, the VPI/VCI may be changed and the cell may be mis-delivered. This is the origin of mis-inserted cells on ATM network connections. Since most optical fiber errors are single bit errors, this scheme is optimal. However, it may lose effectiveness on media with burst error characteristics. As such, wireless ATM requires more error checking than HEC.

The first four bytes of the ATM header are generated in the ATM layer. When the cell is passed to the physical layer, these four bytes are used to calculate the HEC in the physical layer. As the HEC only protects the header to minimize or avoid mis-delivery, it is the end node's responsibility to ensure the correctness of the payload.

There are two types of user cells, namely, UNI and NNI cells. The UNI cell takes away four bits from the VPI and uses them for generic flow control (GFC). When congestion occurs, GFC may be used to slow down sources involved in the congested link. For the time being, however, the use of GFC has not been specified and the field is always coded to 0000. Without a standard mechanism for flow control, an ATM network can only implement preventive congestion control based on Call Admission Control (CAC) and Usage Parameter Control (UPC).
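As an illustration of the header layout and the HEC calculation, the following sketch packs the first four bytes of a UNI header and computes the fifth. The field widths follow Fig.2.3. The generator polynomial x^8 + x^2 + x + 1 and the final XOR with 01010101 are taken from ITU-T Recommendation I.432 rather than from this thesis, and the function names are hypothetical.

```python
def pack_uni_header(gfc, vpi, vci, pti, clp):
    """Pack GFC (4 bits), VPI (8), VCI (16), PTI (3) and CLP (1) into four bytes."""
    word = (gfc << 28) | (vpi << 20) | (vci << 4) | (pti << 1) | clp
    return word.to_bytes(4, "big")


def hec(header4):
    """CRC-8 over four header bytes (generator x^8 + x^2 + x + 1), XORed with 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55


header4 = pack_uni_header(gfc=0, vpi=0, vci=0, pti=0, clp=1)
header = header4 + bytes([hec(header4)])  # complete 5-byte header
```

With all fields zero except CLP=1, the four bytes are 00 00 00 01 and the computed HEC is 0x52, so the complete header is 00 00 00 01 52. The sketch only generates the HEC; the single-error correction and double-error detection at the receiver are not shown.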

2.1.5. ATM QoS Enforcement

ATM QoS enforcement is achieved by using two control algorithms, CAC and UPC. When a connection setup is requested, a call setup procedure is invoked to set up the connection. This procedure involves routing, CAC, and resource allocation. The CAC algorithm determines whether there are sufficient resources in the network to establish the call at its required QoS and to maintain the negotiated QoS of the already established connections. If the call is accepted, the translation tables of the switches en route are updated to include the new VPI/VCI I/O mapping. This setup procedure is signaled through a dedicated connection in the same network. During a connection's lifetime, input cells from the user to the network are controlled by UPC to ensure that the negotiated parameters are not violated. This enforces each connection's QoS. The leaky bucket is a rate-based flow control mechanism which can implement UPC and will be discussed in Chapter 3.
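A rate-based policer of this kind can be sketched as a token bucket. This is a simplified illustration under stated assumptions, not the leaky bucket model analysed in Chapter 3: the class name, the parameter names `rate` and `bucket_size`, and the choice of starting with a full bucket are all hypothetical, and the input buffer and peak-rate spacer are omitted.

```python
class TokenBucket:
    """Hypothetical policer: tokens arrive at `rate` per second, capped at `bucket_size`."""

    def __init__(self, rate, bucket_size):
        self.rate = rate
        self.bucket_size = bucket_size
        self.tokens = float(bucket_size)  # assumption: the bucket starts full
        self.last_time = 0.0

    def conforms(self, arrival_time):
        """Return True and consume one token if the cell arriving now conforms."""
        elapsed = arrival_time - self.last_time
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        self.last_time = arrival_time
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


policer = TokenBucket(rate=1.0, bucket_size=2)
decisions = [policer.conforms(t) for t in (0.0, 0.0, 0.0, 1.0)]
```

In this trace a burst of three cells arrives at time 0: the first two conform because the bucket holds two tokens, the third does not, and a cell arriving one second later conforms again. What happens to a non-conforming cell (discarding it or marking it with CLP=1, for example) is a policy choice left out of the sketch.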

2.2. Wireless ATM

All the conveniences and challenges of a WATM system arise from two aspects, namely, the characteristics of the wireless link and terminal mobility. A WATM LAN can be considered as a single-cell WATM system where mobility is limited to within the cell. This implies that the work is concentrated on extending wireline ATM features over the wireless link to non-mobile remote terminals.

2.2.1. WATM Reference Model

A WATM reference model is shown in Fig.2.4. Sometimes, the Data Link Control (DLC), MAC and Wireless Physical Layer (W-PHY) in the model are referred to as the Radio Access Layer (RAL). The issues in the W-PHY are operating frequency, data rate, modulation techniques, and interference and power control. A high speed digital transceiver typically implements those W-PHY functions. These functions are discussed in [21].

The DLC layer ensures reliable transmission over a wireless link. Error detection, retransmission and forward error correction (FEC) are the available techniques and may be used differently for different classes of services. In the initial implementation of the WATM LAN, it is assumed that the wireless radio is reliable, since the stations are placed close to each other, and, therefore, implementation of DLC functions will be deferred. When it is time to implement DLC functions, they will most likely be implemented in software.

Fig.2.4. WATM reference model and WATM LAN topology (protocol stack labels: upper layers, AAL, ATM, DLC, MAC; topology labels: stations and remote stations)

The MAC layer is used to coordinate channel sharing. TDMA is a preferred MAC layer technology for WATM and is the basis of many MAC proposals [11,13]. The dynamic time-slot assignment of TDMA fits a bandwidth-on-demand, broadband cell switching system with QoS control. FDMA is unsuitable due to its channelized nature, and CDMA is technically immature for broadband systems due to the required high chip rate. The MAC protocol to be implemented in the proposed WATM LAN is the TDMA-based Distributed Fair Queuing [9,10] proposed by Richard Kautz at University of Toronto, and will be described in Section 2.2.4.

2.2.2. WATM LAN Topology A star configuration has been selected for the wireless ATM LAN. It consists of a base station at the center of the star and multiple remote stations communicating with the base stations which acts as master for scheduling and polling. Remote stations do not cornmunicate with each other directly. This centralized control configuration is more suitable for WATM LAN than a distributed one. Most importantly, the base station has the knowledge of remote stations' traffic and can allocate bandwidth dynarnically according to their requirements. This is cntical for maintaining per-connection QoS. Secondly, when a wireless segment interconnects with wireline segments, the base station can served as the access point efficiently. In such case, there is a large volume of traffic which flows in from external sources especially when surfing the web or when servers are located in the wireline segment. Traffic between peer remote stations will be much less than those between a remote station and the wireline segment. Therefore, it is expected that downlink traffic volume will be much larger than uplink traffic volume. Scheduling for this kind of traffic pattern can be performed more efficiently with a centralized configuration. Thirdly, it uses the channel more efficiently because traffic is free from coilisions. Fourthly, the base station has the power to handle more complex tasks and, thus, relieve the power requirement on the remote station. The disadvantages of centralized topology are its limitations on portability, robustness and scaleability. - Support for multimedia traffic over the wireless segment is a major goal of the WATM LAN. Therefore, the MAC protocol should support dynamic bandwidth allocation for each connection according to the instantaneous requirements and should do so without violating other connections' allocated capacity. 
Fair queuing 115,161, which is a protocol derived fiom general processor sharing (GPS) [17], has the required properties and its pnnciple can be used for MAC scheduling. In [9], a Distributed Fair Queuing (DFQ) MAC protocol is proposed. Unlilce other protocols, such as IEEE 802.1 1 [14] and Multiservices Dynamic Reservation (MDR)[5], DFQ does not use fiames to group data fiom different classes of services and there is no reservation. Each ce11 is independently scheduled and each time-slot is independently allocated to a data unit called a capsule, which is a term used in [9] to refer to the WATM MAC PDU. Each capsule encapsulates an ATM ce11 with header and trail. The trail is for error detection and correction while the header is used for polling and virtual time synchronization within a LAN. Virtual time is a mechanism used in fair queuing to track the systern's progress and to identify cell pnonty among al1 backlogged cells. Consider a queuing system as shown in Fig.2.5a. Each queue i has an attribute cal1 virtual time increment AFi and each ce11 k in the queue has an attribute called virtualfinish time F:. Al1 backlogged ceUs are served in ascending order of their virtuaI finish time. When the server is serving a cell, the virtual finish time of the ce11 becomes the system virtual time Fs. When ce11 k arrives at a backlogged queue, its virtual finish time is that of its previous ce11 plus virtual time increment of the queue. i.e. F~~= F:-' +Di. (2- 1a) However, if ce11 k arrives at an empty queue, its virtual finish time will be the system virtual tirne plus the queue's virtual tirne increment. i.e. F;=FS +AF~. (2- 1b) In both cases, F+O. Since Fs is always less than or equal to the virtual finish time of any backlogged cells, the two equations can be combined and become F: F: = O, and (2-2a) ~?=rnax(~:-', Fs) +AFi; M. (2-2b) For one particular backlogged queue, the k-th ce11 always has a larger virtual finish time than the (k-1)-th cell. 
Therefore, the server just has to examine the head of line (HOL) of each queue to determine which ce11 is to receive service next. This is a cell-by-ce11 dynamic scheduling algorithm. Relative priority of each queue is reflected in its virtual time increment where a smaller value means a higher priority. Consider each station and broadcastmg the system virtuai time. Alternatively, it can be said that remote stations report their queuing status to the base station. New stations that would like to join the LAN wait for a special poll. After that, the reporting mechanism starts to work. When being polled, a remote station transmits a capsule and piggy-backs the virtual finish time of the next capsule in the queue. Therefore, the scheduler in the base station will have the HOL information, which is enough for scheduling. Polling information is in the header of the downlink capsule. It specifies a remote station identifier (the one being polled) and the virtual finish time of the expected uplink capsule. This information is available to al1 remote stations due to the broadcast nature of wireless medium. The virtual finish time will be the uplink system virtual time for the equation (2-2). This reporting mechanism works well for backlogged queues, but how about empty queues? When a ceIl anives at an empty queue, its virhial finish time cannot be reported to the scheduler in the base station and, therefore, will not be polled. There are different ways of solving this problem, but the basic pnnciple is to poll a remote station even though its queue is empty. One approach is to predict the virtual finish time of the next &val ce11 when a remote station sends the last capsule. Scheduling is, then, performed based on the predicted value. From the viewpoint of the base station scheduler, predicted value and backlogged value are just the same. There are many ways the DFQ can be modified and implemented. 
Therefore, when we are considering the system design of a MAC board, flexibility is an important issue. The MAC board discussed in Chapter 6 will have an embedded fair queuing scheduler for scheduling. However, the algorithm of reporting and polling will be implemented in firmware or software and can be modified as needed.
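The stamping rule of equation (2-2) can be illustrated with a few lines of Python. This is only a minimal sketch: the class name `VirtualTimeCalculator`, its methods and the queue identifiers are hypothetical names chosen for illustration, and polling, prediction and key wrapping are all ignored.

```python
class VirtualTimeCalculator:
    """Illustrative model of equation (2-2); names are assumptions, not from the thesis."""

    def __init__(self, increments):
        self.increments = dict(increments)              # queue id -> Delta F_i
        self.last_finish = {q: 0 for q in increments}   # F_i^0 = 0, equation (2-2a)
        self.system_time = 0                            # F_s

    def stamp(self, queue):
        """Return the virtual finish time of a cell arriving at `queue`."""
        start = max(self.last_finish[queue], self.system_time)
        finish = start + self.increments[queue]         # equation (2-2b)
        self.last_finish[queue] = finish
        return finish

    def serve(self, finish):
        """The finish time of the cell in service becomes F_s."""
        self.system_time = finish


vtc = VirtualTimeCalculator({"a": 2, "b": 5})
stamps = [vtc.stamp("a"), vtc.stamp("a"), vtc.stamp("b")]  # cells arriving while backlogged
for f in sorted(stamps):                                   # service in ascending finish time
    vtc.serve(f)
late = vtc.stamp("a")                                      # queue "a" is empty by now
```

In this run the two cells of queue "a" are stamped from the previous cell of the same queue, while the late arrival is stamped from the system virtual time because its queue has drained.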

2.2.5. Scheduler in DFQ System

As shown in Fig.2.5c, each remote station implements a Scheduler which maintains a real-time global priority queue for all uplink cells from that station. The base station maintains another Scheduler for the HOL of the queues of the remote stations. All the Schedulers use a cell's virtual finish time $F_i^k$ as the priority measurement. The design and implementation of such a scheduler will be discussed in Chapters 4 and 5. As DFQ is based on GPS and self-clocked fair queuing, they will be discussed in the next chapter.

Virtual Time Bounding

The DFQ MAC scheme described in the previous chapter combines arrived cells from all connections into a global queue. Even though the virtual finish time of a particular connection is a non-decreasing function with respect to cell arrival time, that of the global queue is not. This is because the next cell which arrives at the queue can be from any one of the active connections. In other words, the virtual finish time of the k-th cell arriving at the global queue is not necessarily greater than or equal to that of the (k-1)-th arrived cell. However, if traffic from all connections is policed by a leaky bucket flow control mechanism, the virtual finish time of the global queue's next arrival cell will be bounded. This chapter assumes that this is the case and finds the virtual finish time bounds.

3.1. Leaky Bucket Flow Control

The leaky bucket is a rate-based flow control mechanism and is a way of implementing Usage Parameter Control. Each leaky bucket, as shown in Fig.3.1a, is characterized by four parameters, namely buffer size $N_b$, bucket size $\sigma$, token arrival rate $\gamma$ and service peak rate $p$. Buffer size controls the amount of data which can be buffered when the incoming rate is temporarily in excess of the outgoing rate. It has an effect on cell loss ratio and average queuing delay. Bucket size $\sigma$ allows a connection to stock up quotas for the next burst of cells and is used to control the maximum burst size. Token arrival rate $\gamma$ regulates the long term average data rate to be less than or equal to $\gamma$. Peak rate $p$ imposes an upper bound on service rate.

A leaky bucket operates as follows. Tokens arrive at the token bucket at a constant rate $\gamma$. If the bucket is full, the token will be discarded. Otherwise, it is accumulated in the bucket. Each data unit requires a token to receive service. The peak service rate is controlled by the Spacer, which allows one data unit to be serviced in every period T, where T is the inverse of the peak rate $p$.

Fig.3.1. a. Leaky Bucket Model. b. Leaky Bucket Departure Function.

Fig.3.1b plots the leaky bucket departure function $D_p(t)$ for a greedy connection whose arrival rate is higher than the peak service rate. At the beginning, when there are tokens in the bucket, the connection is served at the peak rate $p$. After the tokens are used up at $t_1$, it is served at the average rate $\gamma$. Therefore, the departure function becomes

$D_p(t) = p\,t$ for $t \le t_1$, and (3-1a)
$D_p(t) = \sigma + \gamma t$ for $t > t_1$. (3-1b)

At $t_1$, the number of data units served equals the number of tokens that have been available. That is

$p\,t_1 = \sigma + \gamma t_1$
$t_1 = \sigma / (p - \gamma)$. (3-2)

If the peak rate is infinite, the departure function becomes

$D_\infty(t) = \sigma + \gamma t$ (3-3)

which is greater than $D_p(t)$ for $t < t_1$. After that, the two functions are identical.

As shown in Fig.3.2, applications at the end node generate traffic whose rate is controlled by a leaky bucket and which enters the global queue of a MAC scheduler. Therefore, the departure function of the leaky bucket is the arrival function of the scheduler. For simplicity, $D_\infty(t)$, instead of $D_p(t)$, is used as the arrival function of the scheduler. Since this study is focused on the scheduler, the size of the leaky bucket buffer is not important and is assumed to be infinite. When a connection is regulated by such a leaky bucket, which has infinite buffer size and infinite peak rate, it is said to be $(\sigma, \gamma)$-regulated.
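As a quick numerical check of equations (3-1) to (3-3), the two departure functions of a greedy connection can be evaluated directly. The function names and the sample parameter values below are assumptions made for illustration only.

```python
def crossover_time(sigma, gamma, peak):
    """t_1 of equation (3-2): the instant at which the initial tokens are used up."""
    if peak <= gamma:
        raise ValueError("peak rate must exceed the token arrival rate")
    return sigma / (peak - gamma)


def departure_peak_limited(t, sigma, gamma, peak):
    """D_p(t) of equations (3-1a) and (3-1b) for a greedy connection."""
    if t <= crossover_time(sigma, gamma, peak):
        return peak * t
    return sigma + gamma * t


def departure_infinite_peak(t, sigma, gamma):
    """D_inf(t) of equation (3-3), used as the scheduler's arrival function."""
    return sigma + gamma * t


# Hypothetical example: bucket size 10, token rate 2, peak rate 7 (data units, units/s).
t1 = crossover_time(10, 2, 7)
before = departure_peak_limited(1, 10, 2, 7)   # served at the peak rate
after = departure_peak_limited(3, 10, 2, 7)    # served at the token rate
bound = departure_infinite_peak(1, 10, 2)      # upper bound before t1
```

With these sample values the infinite-peak function lies above the peak-limited one before $t_1$ and coincides with it afterwards, which is why it can serve as a conservative arrival function.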

3.2. Virtual Finish Time Bounding

With n virtual time bits, virtual time can have values ranging from 0 to $2^n-1$. If n is too small, the numbers will be used up too soon, and the size of the queue will be very limited. For example, if n=10 and there is only one connection with $\Delta F = 4$, the queue size will be limited to 256 ($2^{10}/4$) at most. To avoid cell loss due to buffer overflow, a large n is desirable whenever possible.

Fig.3.2. System model. For each connection i, a leaky bucket $(\sigma_i, \gamma_i)$ with arrival function $A_i(t)$ and service share $\phi_i$ feeds the global queue of the Scheduler, which contains a virtual time calculator (VTC) and a work conserving server.

We assume that the scheduler allocates only 32 bits to each buffered cell's information. These 32 bits contain the cell's virtual finish time and its buffer address. The assumption is made by considering that the scheduler communicates with other components using a 32-bit data bus, so that each cell's information can be transferred to/from the scheduler in one write/read cycle. The assumption also implies that these 32 bits are shared resources and their usage should be optimized. For example, if n=20, the remaining 12 bits will represent at most 4096 ($2^{12}$) buffer addresses.

At any time, all locally stable connections have a limited number of backlogged cells. Each of them is stamped with a virtual finish time value. Among all those cells, at time t, the largest virtual finish time is denoted as $F_{max}(t)$ and the smallest one as $F_{min}(t)$. These two values form the upper and lower bounds of virtual time at t. Denote the maximum width between the two bounds as $F_B$, where

$F_B = \max_t( F_{max}(t) - F_{min}(t) )$. (3-4)

To guarantee that the scheduler to be discussed in Chapter 5 will work properly, it is necessary that

$2^{n-1} > F_B$. (3-5)

In other words, the width of the virtual time bounds has to be bounded and n has to be sufficiently large. This chapter is dedicated to finding $F_B$ and n in equation (3-5).

3.2.1. General Processor Sharing and DFQ

By combining General Processor Sharing and Leaky Bucket rate control, we can fairly allocate link bandwidth to different connections according to their subscribed QoS. GPS analysis assumes a fluid-flow traffic model where the server can serve multiple connections simultaneously. These assumptions do not hold for packetized network traffic. Therefore, Packetized GPS (PGPS) was introduced for packetized network [...]

It determines a cell's virtual finish time at the cell's arrival time, and does not require a reference GPS model. That is what "self-clocked" means. Therefore, the implementation is easier and simpler. The DFQ discussed in the previous chapter is an application of SCFQ in a distributed environment.

3.2.2. GPS Maximum Backlog

The system model used for the analysis is shown in Fig.3.2. There is a set of connections X sharing one link of transmission capacity C. Each connection i, $i \in X$, is rate regulated by a leaky bucket with parameters $(\sigma_i, \gamma_i)$. At time t, there are $\sigma_i(t)$ tokens available in the token bucket. All arrived traffic will enter a global queue waiting to be served by a work conserving server. The amount of arrived traffic of connection i from time 0 to time t is denoted by $A_i(t)$, where

$A_i(t) \le \sigma_i + \gamma_i t$. (3-6)

For the system to be globally stable, it is necessary that the total arrival rate is less than the departure rate. That is, $\sum_{i \in X} \gamma_i < C$.

3.2.3. GPS Virtual Time

Each connection in the GPS system is assigned a service share $\phi_i$. If the outgoing link capacity is C and the sum of all connections' shares is unity, the service rate of connection i will be $r_i$, where

$r_i = \phi_i C$. (3-14)

Connection i is locally stable if it satisfies $\gamma_i < r_i$. Connections in a globally stable system are not necessarily locally stable. In order to guarantee that each backlogged connection receives at least service share $\phi_i$, independent of the other connections, it is necessary that

$\sum_i \phi_i \le 1$. (3-15)

There are two sets of active connections: the backlogged connection set B(t) and the empty connection set E(t), where E(t) is the complement of B(t). Therefore,

$B(t) \cup E(t) = X$. (3-16)

If (3-15) is satisfied, a backlogged connection is guaranteed at least service share $\phi_i$ because its actual share is $\phi_i / \sum_k \phi_k$, and an empty connection is guaranteed at least service share $\gamma_i/C$, which is less than or equal to $\phi_i$ because it is assumed to be locally stable. Therefore, the time varying guaranteed service share can be expressed as

$\phi_i(t) = \phi_i$, if $i \in B(t)$ (3-17a)
$\phi_i(t) = \gamma_i / C$, if $i \in E(t)$ (3-17b)

Let the queue of connection i become empty at $e_i$. It has been proven [17] that for a globally stable system, a set of N active connections can be arranged in an order such that $e_0 = 0$ and $e_1 \le e_2 \le e_3 \le \ldots \le e_N = t_b$. The total system departure in the period $[0, t_b)$ is $C\,t_b$. During the period that the set of backlogged connections B(t) remains unchanged, each of those connections i will receive an amount of service

$D_i(e_k, t) = C (t - e_k)\, \phi_i(t) / \sum_j \phi_j(t),\ e_k \le t < e_{k+1}$. (3-18)

When B(t) changes, the slope of $D_i(t)$, $i \in X$, will change accordingly and

$\partial D_i(t) / \partial t = C\, \phi_i(t) / \sum_j \phi_j(t)$. (3-19)

Fig.3.3. Universal curve and arrival functions normalized to the service share. The universal curve tracks system progress and corresponds to system virtual time; its slope changes when the backlogged set B(t) changes.

Therefore, $D_i(t)$ is a piecewise linear function where the corners occur when the set of backlogged connections changes. When a connection becomes empty at time e, the slopes of $D_i(t)$ of all other backlogged connections increase because they have a higher service rate and more work can be done. The maximum slope of the arrival function of a greedy connection is $\gamma_i \le r_i$, and the minimum slope of the departure function of a backlogged connection is $r_i / \sum \phi_i \ge r_i$. The GPS system is fair in the sense that all busy connections receive the same normalized amount of work with respect to their service share. Therefore, for any two busy connections i and j,

$U(t) = D_i(t) / \phi_i(t) = D_j(t) / \phi_j(t)$. (3-20)

As such, U(t) is a universal curve representing the progress of the system. Fig.3.3 shows the universal curve and the arrival function normalized to the service share. In the diagram, there are three connections which become empty at $e_1$, $e_2$ and $e_3$, respectively. The virtual time V(t) is a function that tracks the progress of the GPS system, and is defined as

$V(0) = 0$, and (3-21a)
$V(t_j + \tau) = V(t_j) + \tau / \sum_{i \in X} \phi_i(t);\ 0$ [...]

3.2.4. SCFQ Virtual Time

In a SCFQ system, arrivals and departures are staircase, rather than piecewise linear, functions. The rising edge of a stair occurs at the time an event occurs. For an arrival function, an event occurs when the last bit of a packet is received, while for a departure function it occurs when the last bit of a packet departs. Fig.3.4 depicts the relationship of such functions in the GPS and SCFQ systems. Consider the function as a departure function, which is the amount of work done for a connection. It shows that the virtual time of the connection in SCFQ lags that in GPS by, at most, one packet (one stair step). In our implementation, such a difference will not affect the system QoS performance significantly because the packet size is small.

Fig.3.4. GPS and SCFQ functions against time. A packet arrives/departs at the rising edge of the staircase. The run of a stair is the inter-packet time.

In a SCFQ system, each arriving cell is assigned a virtual finish time which is the expected virtual time for the cell to receive service. When a cell is being served, its virtual finish time becomes the system virtual time, which tracks the system's progress. The virtual finish time $F_i^k$ of the k-th packet of connection i is stamped when the packet arrives and is calculated as

$F_i^0 = 0$, and (3-23a)
$F_i^k = \max(F_i^{k-1}, F_s) + \Delta F_i,\ k = 1, 2, 3, \ldots$ (3-23b)

For greedy connections, when $1 \le k \le \sigma_i$, $F_s \le F_i^{k-1}$. Therefore,

$F_i^k = F_i^{k-1} + \Delta F_i = k\,\Delta F_i;\ k = 1, 2, 3, \ldots, \sigma_i$. (3-24)

As the maximum potential backlog occurs at $t = 0^+$ for all connections, the maximum virtual time bound width $F_B$ also occurs at $t = 0^+$, where

$F_B = \max_i( F_i^{\sigma_i} - F_i^0 ) = \max_i( F_i^{\sigma_i} ) - 0 = \max_i( \sigma_i \Delta F_i )$. (3-25)

When the stairs of virtual time in SCFQ are joined to form a piecewise linear function, its slope will be $\Delta F_i$. As pointed out in section 2.2.4, a mapping function is needed to translate QoS parameters into the virtual time increment $\Delta F_i$. One particular mapping function we used is to have $\Delta F_i$ inversely proportional to the required bandwidth which, in turn, is proportional to its service share $\phi_i$.
Therefore, the function can be as simple as

$\Delta F_i = G / \phi_i$, (3-26)

and consequently,

$F_B = \max_i( G \sigma_i / \phi_i )$ (3-27)

where G is a granularity factor which determines the granularity of virtual finish time. Since all connections have the same scale factor G, the absolute value of G will not affect the service ordering at all. However, it has an effect on the number of distinguishable QoS levels. A finer granularity (larger G) can support more QoS levels, as will be shown in the next section.

3.3. Virtual Time Granularity

In a DFQ implementation, virtual finish time F and virtual time increment $\Delta F$ are integers. In order to differentiate priorities between connections i and j, it is required that

$\Delta F_i \ne \Delta F_j$, i.e. $\lfloor G/\phi_i \rfloor \ne \lfloor G/\phi_j \rfloor$ if $\phi_i \ne \phi_j$. (3-28)

To guarantee that the requirement is satisfied when $\phi_j = \phi_i + \delta$, where $\phi_i > 0$ and $\delta > 0$, it is necessary and sufficient that the integer parts of $G/\phi_i$ and $G/\phi_j$ are different. Equation (3-29b) satisfies this requirement, which is stated in equation (3-29a). Thus, it follows that

$\lfloor G/\phi_i \rfloor - \lfloor G/\phi_j \rfloor \ge 1$, (3-29a)
$G/\phi_i - G/(\phi_i + \delta) \ge 1$, (3-29b)
$G \ge \phi_i^2/\delta + \phi_i$. (3-29c)

For example, if $\phi_i = 0.80$ and $\delta = 0.01$, $G \ge 0.8^2/0.01 + 0.8 = 64.8$. As shown in Table 3.1, if G=64.8, distinguishable service shares are 0.80 and 0.81, but not any values in between such as 0.805. However, if G=128.8, 0.800 and 0.805 are distinguishable service shares. Let $\phi_k$ be the k-th distinguishable service share, defined as

$\phi_k = G / (G + k - 1);\ k = 1, 2, 3, \ldots$ (3-30)

If G=1, for $G/\phi$ to be integers, the distinguishable service shares $\phi$ are {1, 1/2, 1/3, 1/4, ...} and, accordingly, the possible data rates are {C, C/2, C/3, ...}. Connections whose data rate is between C and C/2 cannot be differentiated from the ones with data rate C and C/2. When G=2, the distinguishable $\phi$ are {1, 2/3, 1/2, 2/5, 1/3, ...}. From (3-30) and the two example values, a change of granularity factor G has more impact on large-share connections. A large G is good for providing more distinguishable data rates for different types of services. The trade-off is that it requires more bits for representing virtual finish time. For example, a system using G=64 requires 6 more virtual time bits ($2^6 = 64$) than another system using G=1.
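The granularity condition can be checked numerically. The sketch below uses exact rational arithmetic (Python's `fractions` module) to avoid floating-point rounding at the floor boundaries; the function names are hypothetical and the code only illustrates equations (3-28) to (3-30), it is not part of the scheduler design.

```python
from fractions import Fraction
from math import floor


def increment(g, share):
    """Integer virtual time increment floor(G / phi), as compared in equation (3-28)."""
    return floor(Fraction(g) / Fraction(share))


def min_granularity(share, delta):
    """Smallest G allowed by equation (3-29c): phi^2 / delta + phi."""
    share, delta = Fraction(share), Fraction(delta)
    return share * share / delta + share


def distinguishable_shares(g, count):
    """First `count` distinguishable service shares from equation (3-30)."""
    g = Fraction(g)
    return [g / (g + k - 1) for k in range(1, count + 1)]


# Decimal strings keep the values exact (0.805 has no exact binary representation).
g_coarse = min_granularity("0.80", "0.01")
g_fine = min_granularity("0.80", "0.005")
coarse = [increment("64.8", s) for s in ("0.80", "0.805", "0.81")]
fine = [increment("128.8", s) for s in ("0.800", "0.805")]
shares_g2 = distinguishable_shares(2, 5)
```

With G=64.8 the shares 0.805 and 0.81 map to the same increment, whereas with G=128.8 the shares 0.800 and 0.805 map to different increments, in line with the example in the text.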

3.4. Virtual Time Bounding for DFQ Scheduler

Chapter 4 will present a scheduling algorithm which identifies an accessible region. The minimum space of that region is $2^{n-1}$, where n is the number of virtual time bits. This means that for any two cells in such a scheduler, their virtual time difference must be less than $2^{n-1}$. That is

$F_B < 2^{n-1}$. (3-31)

From (3-27) and (3-31),

$2^{n-1} > \max_i( G \sigma_i / \phi_i )$
$n > \max_i( \log_2( G \sigma_i / \phi_i ) ) + 1$. (3-32)

Suppose that there are two connections whose bandwidth requirements differ by a factor of 10,000. For example, cellular voice and HDTV require $O(10^3)$ bps and $O(10^7)$ bps, respectively. Therefore, the higher QoS connection has $\Delta F_1 = G/\phi_1$ while the lower QoS connection has $\Delta F_2 = 10{,}000\,G/\phi_1$. To satisfy the bounding condition in equation (3-31), and from (3-32),

$2^{n-1} > \max( G \sigma_1 / \phi_1,\ 10{,}000\,G \sigma_2 / \phi_1 )$
$n > \max( \log_2( G \sigma_1 / \phi_1 ) + 1,\ \log_2( 10{,}000\,G \sigma_2 / \phi_1 ) + 1 )$

For example, with $\phi_1 = 0.8$ and G=32,

$n > \max( \log_2( 40 \sigma_1 ) + 1,\ \log_2( 4 \times 10^5 \sigma_2 ) + 1 ) = \max( 6.3 + \log_2(\sigma_1),\ 19.6 + \log_2(\sigma_2) )$

If n=20, the maximum potential backlog of each connection before the bounding condition is violated will be

$\sigma_1 = \lfloor 2^{20-6.3} \rfloor = 13307$
$\sigma_2 = \lfloor 2^{20-19.6} \rfloor = 1$

The example is summarized in Table 3.2.
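The bounding condition can also be evaluated with exact integer arithmetic. The sketch below takes $\sigma_i \Delta F_i < 2^{n-1}$ from (3-25) and (3-31) as given; the function names are hypothetical. Note that exact arithmetic yields a slightly smaller backlog for the first connection than the value above, which was computed from the rounded exponent 6.3.

```python
def required_bits(sigma, delta_f):
    """Smallest n such that 2^(n-1) > sigma * delta_f (integer inputs)."""
    return (sigma * delta_f).bit_length() + 1


def max_backlog(n_bits, delta_f):
    """Largest sigma such that sigma * delta_f < 2^(n-1)."""
    return ((1 << (n_bits - 1)) - 1) // delta_f


# Values of the worked example: G = 32, phi_1 = 0.8, rate ratio 10,000.
delta_f1 = 40          # G / phi_1
delta_f2 = 400_000     # 10,000 * G / phi_1

backlog_1 = max_backlog(20, delta_f1)
backlog_2 = max_backlog(20, delta_f2)
bits_for_low_rate = required_bits(1, delta_f2)
```

Under these assumptions a single queued cell of the low-rate connection already requires n=20, so the low-rate connection, not the high-rate one, dictates the number of virtual time bits.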

Table 3.2. Summary of an example (C=25 Mbps, G=32, n=20)

Connection i   Data Rate $r_i$   Service Share $\phi_i$   $\Delta F_i$   Max. $\sigma_i$   Note
1              20 Mbps           0.8                      40             13307             High QoS
2              2 Kbps            $0.8 \times 10^{-4}$     400,000        1                 Low QoS

3.5. Summary

This chapter makes two points. First, when traffic sources are regulated by a leaky bucket, the virtual time of the next arrived cell will be bounded. Secondly, if virtual time bounds are given, the required virtual time space can be determined. This information is required to ensure that the Accessible Region (to be discussed in Chapter 4) of the DFQ scheduler is large enough for proper scheduling.

Wrap Sequencer

The DFQ MAC scheme described in Chapter 3 uses virtual finish time to distinguish the relative priority of ATM cells. Therefore, a scheduler is needed to schedule the order of transmission according to the cells' virtual finish time. The major difficulty of DFQ scheduling is caused by the virtual time wrap-around problem. This chapter discusses the wrap-around problem and proposes a solution, which is practically implemented in the Wrap Sequencer. The architecture and operating algorithm of the sequencer will be presented in the following sections. In the next chapter, we will illustrate an application of the Wrap Sequencer in a WATM LAN MAC Scheduler. Schematics of the proposed Wrap Sequencer can be found in Appendix A.

4.1. Generic Sequencer and Wrap Sequencer

A sequencer is a sorting device that sorts elements, either in ascending or descending order, according to a key field of the elements. A sequencer element, in this thesis, is a 33-bit data unit which consists of three fields: the valid bit, the n-bit key field and the (32-n)-bit data field, as shown in Fig.4.1. In the case of DFQ, the key will be the virtual finish time of a cell and the data will be a pointer which points to the cell's buffer address. Therefore, key and virtual time are interchangeable in this chapter.

A generic sequencer [22] in Fig.4.0 consists of an array of units called entities. Each entity compares two incoming elements. The one that has a larger key will move forward through Q while the other one moves backward through CQ. The aggregate work of the array of entities will put elements in ascending or descending order depending on the design of the Comparator.

Fig.4.0. The concept of a generic sequencer.

Fig.4.1. Sequencer element: 1-bit Valid (bit 32), n-bit Key (bits 31 to 32-n) and (32-n)-bit Data (bits 31-n to 0).

This type of generic sequencer is not suited for the DFQ Scheduler because DFQ elements have the wrap-around problem, which will be formally stated in the next section. In brief, a DFQ element has higher priority if it has a smaller key. However, when key wrapping occurs, the relationship does not hold. If a generic sequencer is used and key wrapping occurs, the element ordering will be wrong. Therefore, we design a wrap sequencer, which is a sequencer capable of handling the key wrap-around problem.

The data path of incoming cells in a MAC board that implements DFQ is shown in Fig.4.2. Each incoming cell will be stamped with a virtual finish time F by a Virtual Time Calculator, which determines its relative priority among all backlogged cells. The cell is then stored in the Buffer at address A. An element consisting of (F,A) is then sent to the DFQ Scheduler. The highest priority element will be served by the MAC PDU Composer, which uses A to retrieve the corresponding cell from the buffer. In the base station, this occurs when the downlink is available for transmitting a cell. In a remote station, it occurs when the station is being polled. The Wrap Sequencer is the main component of the scheduler, which will be discussed in the next chapter. Elements arriving at the sequencer are not in priority order, while elements departing from it are in priority order.
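The element format described above (valid bit, n-bit key, (32-n)-bit data) can be mimicked in software with shifts and masks. The following is only an illustrative sketch with hypothetical function names; the real sequencer transfers these bits serially in hardware.

```python
def pack_element(valid, key, data, n_bits):
    """Build a 33-bit element: valid in bit 32, key in bits 31..32-n, data below."""
    data_bits = 32 - n_bits
    if not 0 <= key < (1 << n_bits):
        raise ValueError("key does not fit in n_bits")
    if not 0 <= data < (1 << data_bits):
        raise ValueError("data does not fit in 32 - n_bits")
    return (int(bool(valid)) << 32) | (key << data_bits) | data


def unpack_element(element, n_bits):
    """Split a 33-bit element back into (valid, key, data)."""
    data_bits = 32 - n_bits
    valid = bool((element >> 32) & 1)
    key = (element >> data_bits) & ((1 << n_bits) - 1)
    data = element & ((1 << data_bits) - 1)
    return valid, key, data


# Example with a 20-bit virtual finish time and a 12-bit buffer address.
element = pack_element(True, key=5, data=7, n_bits=20)
fields = unpack_element(element, n_bits=20)
```

Because the key occupies the bits directly below the valid bit, two valid elements with the same n compare in key order when compared as plain integers, as long as no wrapping has occurred.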

4.2. Key (Virtual Time) Wrap-Around Problem

For a particular connection, the virtual finish time of cell k, $V^k$, is a non-decreasing function with respect to cell arrival time. Therefore, the value keeps on increasing. However, in the WATM LAN implementation, the virtual finish time of cell k is an n-bit value $F^k$. Therefore, when $F^k$ reaches its maximum value, it wraps around. Thus,

$F^k = V^k \bmod 2^n$. (4-1)

The scheduler should schedule elements according to V. Unfortunately, V is not available to the scheduler. Instead, the scheduler has to perform scheduling based on F. The requirement of the scheduler is that, given F, it should be able to maintain correct service order as if V were given even when wrapping occurs. In other words, if $V^k < 2^n$ and $V^{k+j} \ge 2^n$ for $j > 0$, the scheduler should conclude that $F^{k+j} > F^k$ even though the face value of $F^{k+j}$ is less. For example, let n=4. If $V^k = 14$ and $V^{k+1} = 17$, then according to equation (4-1), $F^k = 14$ and $F^{k+1} = 1$. The scheduler should conclude that $F^{k+1} > F^k$, i.e. 1 > 14.

Fig.4.2. Data path of incoming cells in a DFQ MAC board: ATM cells pass through the Virtual Time Calculator, the Buffer, the DFQ Scheduler, the MAC PDU Composer and the RF Transceiver. The Wrap Sequencer is an external scheduler component. The accompanying plots show element virtual time against element arrival time and against element departure time.

The Key Wrap-Around problem for the scheduler is stated as follows: When the key of a cell reaches the maximum value, it wraps around. Scheduling based on the face value of the wrapped key cannot order queued cells properly when wrapping occurs.

Since the source of the problem is wrapping, we come up with a sequencing algorithm which virtually "prevents" wrapping from occurring and thus solves the wrap-around problem. Fig.4.3 shows the sequencing algorithm, which is called the key rotation algorithm or virtual time rotation algorithm.
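The n=4 example of the wrap-around problem can be reproduced in a few lines. The helper name `wrapped_key` is hypothetical; the snippet merely shows that sorting on the face value of the wrapped key reverses the intended service order.

```python
def wrapped_key(v, n_bits):
    """Equation (4-1): the n-bit key seen by the scheduler."""
    return v % (1 << n_bits)


true_times = [14, 17]                                   # V^k and V^(k+1)
keys = [wrapped_key(v, 4) for v in true_times]          # what the scheduler sees
naive_order = sorted(true_times, key=lambda v: wrapped_key(v, 4))
correct_order = sorted(true_times)
```

Here the cell with V=17 carries the key 1 and would be served ahead of the cell with V=14 by a generic sequencer, which is exactly the misordering the Wrap Sequencer has to avoid.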

4.3. Solutions to the Key Wrap-Around Problem

There are different proposals for solving the problem. One is to avoid the wrapping by resetting the key when the system becomes idle. Since the average data arrival rate is assumed to be less than the data departure rate, the system will become idle when the queue is empty. However, this approach will fail unless the length of a traffic [...] estimates the next cell's virtual finish time, the system will never be in idle state. Therefore, this is not applicable.

Another approach uses a round-robin scheduler [10]. The system maintains M FIFO queues. A cell with virtual finish time F is sent to queue m, where

$m = F \bmod M$. (4-2)

A round-robin scheduler serves the queues in numerical order from 1 to M. When a queue becomes empty, the scheduler moves on to the next queue. This scheduling scheme has two disadvantages. First, the system has to maintain a large number of queues. Second, cells in a queue are served according to their arrival order but not to their relative priority.

Our proposed solution, which is called the Key Rotation Algorithm, aggregates all arriving cells into one global queue. It solves the key wrapping problem by preventing the occurrence of wrapping. In [23], the authors patented a way of handling the wrap-around problem which is similar to the solution to be discussed. Both schemes invert the most significant bit of the key when wrapping occurs. However, in [23] there are only two elements to be compared. In the case of the DFQ scheduler, the number of elements to be scheduled is in the order of hundreds or thousands. Therefore, a different approach is taken.

4.3.1. Key Rotation Algorithm

In Fig.4.3, virtual time is represented by an n-bit value F. Therefore, the virtual time space is $\{0, 1, \ldots, 2^n-1\}$. Assume that there exists a Wrap Sequencer which can sort elements according to F in ascending order if wrapping does not occur. $F_s$ is the system virtual time, which is the virtual finish time of the cell being served. From the Wrap Sequencer's point of view, $F_s$ is the virtual time of the latest element that exited the sequencer through port Q¹. $F_{trail}$ is the virtual time of the lowest priority element in the sequencer, and $F_{new}$ is that of a new element trying to enter the sequencer.

In the diagram, the virtual time space is represented by a virtual clock. The clock has two hands, the Head hand which points to $F_s$ and the Trail hand which points to $F_{trail}$. All elements in the sequencer have virtual times in between $F_s$ and $F_{trail}$. The service order is always from $F_s$ to $F_{trail}$ sequentially. The Head hand divides the clock into two regions, the Inaccessible Region whose space is $\{0, 1, \ldots, F_s\}$ and the Accessible Region whose space is $\{F_s+1, F_s+2, \ldots, 2^n-1\}$. If $F_{new}$ falls into the Accessible Region, it will be accepted and inserted into the appropriate position. Otherwise, it will be blocked until it falls into the Accessible Region when $F_s$ moves forward. This algorithm ensures that $F_{trail}$ always stays in the Accessible Region. Both the Head and Trail hands are dynamic. $F_s$ increases when the head element of the sequencer departs, and $F_{trail}$ increases when a larger $F_{new}$ is accepted. Therefore, as $F_s$ [...]

¹ The meaning of "port Q" will be described in Sections 4.4 and 4.6.

Fig.4.3. Key Rotation algorithm.
a. Virtual time space and regions. Inaccessible Region: {0, 1, ..., Fs-1}. Accessible Region: {Fs, ..., 2^n-1}. Fnew will be accepted if and only if it is in the Accessible Region. Ftrail is always in the Accessible Region. When Fs >= 2^(n-1), rotate forward all elements' F by half a circle.
b. Rotation of keys by 180 degrees (half circle).

/* Pseudocode of Key Rotation algorithm */
Fs = Ftrail = 0
Fmax = 2^n - 1
Inaccessible Region = { }
Accessible Region = { 0, ..., Fmax }

case( new element arrives )
    if( Fnew in Accessible Region )
        accept new element into sequencer
        insert( new element )
        if Fnew > Ftrail
            Ftrail = Fnew
        end if
    else
        block new element from getting into sequencer
    end if
end case

case( head element departs )
    Fs = virtual time of the departed element
    if Fs >= (Fmax+1)/2
        rotate forward all F by (Fmax+1)/2
    end if
    if( Fs = 0 )
        Inaccessible Region = { }
    else
        Inaccessible Region = { 0, ..., Fs-1 }
    end if
    Accessible Region = { Fs, ..., Fmax }
end case

Referring to the diagram, wrapping occurs when $F_{trail}$ is smaller than $F_s$. It is desirable to keep the Accessible Region at a reasonable size and to "prevent" the occurrence of wrapping. Therefore, whenever $F_s$ is greater than or equal to $2^{n-1}$, the hands will rotate forward clockwise by a half circle, as shown in Fig.4.3b. This rotation immediately expands the Accessible Region and guarantees that its size is always at least half of the circle, which is $2^{n-1}$. In Fig.4.3b, $F_{new}$ is originally in the Inaccessible Region. After the rotation, the Accessible Region expands and $F_{new}$ falls in the Accessible Region. Since all elements in the Inaccessible Region are blocked from entering the sequencer, elements inside the Wrap Sequencer are all in the Accessible Region. Consequently, $F_{trail}$ is always greater than or equal to $F_s$ and, therefore, wrapping is avoided. Hence, the wrap-around problem is solved.
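The behaviour of the Key Rotation algorithm can be explored with a small software model. The sketch below is an assumption-laden illustration, not the hardware design: a binary heap stands in for the sorting array, the Accessible Region is taken as {Fs, ..., 2^n-1} as in the pseudocode of Fig.4.3, and the model tracks the parity of past rotations so that a freshly wrapped key (equation (4-1)) can be expressed in the current rotated frame. The class and attribute names are hypothetical.

```python
import heapq


class KeyRotationSequencer:
    """Software model of key rotation; a heap replaces the hardware entities."""

    def __init__(self, n_bits):
        self.size = 1 << n_bits        # number of key values, 2^n
        self.half = self.size >> 1     # 2^(n-1), which is also the MSB mask
        self.fs = 0                    # key of the last departed element (F_s)
        self.odd_rotations = False     # assumption: parity of rotations so far
        self.heap = []                 # admitted (key, payload) pairs
        self.blocked = []              # elements held in the Inaccessible Region

    def arrive(self, v_true, payload):
        """Offer an element whose unwrapped virtual finish time is v_true."""
        key = v_true % self.size       # equation (4-1)
        if self.odd_rotations:
            key ^= self.half           # express the key in the rotated frame
        self._admit_or_block(key, payload)

    def _admit_or_block(self, key, payload):
        if key >= self.fs:             # key lies in the Accessible Region
            heapq.heappush(self.heap, (key, payload))
        else:
            self.blocked.append((key, payload))

    def depart(self):
        """Serve the smallest key; rotate when F_s enters the second half."""
        key, payload = heapq.heappop(self.heap)
        self.fs = key
        if self.fs >= self.half:
            self._rotate()
        return payload

    def _rotate(self):
        """Invert the MSB of every key (a half-circle rotation)."""
        self.odd_rotations = not self.odd_rotations
        self.fs ^= self.half
        self.heap = [(k ^ self.half, p) for k, p in self.heap]
        heapq.heapify(self.heap)
        waiting, self.blocked = self.blocked, []
        for k, p in waiting:           # re-test previously blocked elements
            self._admit_or_block(k ^ self.half, p)


seq = KeyRotationSequencer(4)
for v in (5, 7, 9, 12):
    seq.arrive(v, v)
served = [seq.depart()]                # serves 5
seq.arrive(17, 17)                     # wraps to key 1 and is blocked
blocked_snapshot = list(seq.blocked)
served += [seq.depart(), seq.depart()] # serves 7, then 9, which triggers a rotation
seq.arrive(14, 14)
served += [seq.depart() for _ in range(3)]
```

In this run the element with V=17 is held back until the rotation caused by the departure of V=9 moves it into the Accessible Region, and the overall service order follows V rather than the wrapped keys.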

4.4. Wrap Sequencer and Key Rotation Algorithm

The function of the Wrap Sequencer is to ensure that the output element, from port Q, has the smallest Key value (V, not F) among all elements inside the sequencer. Note that the smaller the Key, the higher the priority. Like a generic sequencer, a Wrap Sequencer is made up of entities with comparators. However, the Comparator in a Wrap Sequencer not only does comparison, but also key rotation if necessary. A Rotation Indicator (RI) signal will be set when rotation is needed, and the Comparators in all entities will invert the most significant bit of the key. This operation has the effect of rotating all elements' keys forward by a half circle as described in the algorithm. Another component in the DFQ scheduler, called the Gate Controller, which blocks elements in the Inaccessible Region, will also receive the RI and will perform rotation on the originally unacceptable elements. If they fall in the new Accessible Region, they will go through the Gate Controller and enter the Wrap Sequencer. Therefore, all elements in a Wrap Sequencer are in the Accessible Region.

In the remaining sections of this chapter, $F_{new}$ is assumed to be in the Accessible Region and we focus on how the rotation and comparison mechanisms are implemented in the Wrap Sequencer. The assumption will be relaxed in the next chapter, which will also show that the entire algorithm described above can be implemented with a simple DFQ Scheduler embedded with a Wrap Sequencer and a Gate Controller.

4.4.1. Wrap Sequencer Design

Each 33-bit element inside the sequencer requires a 33-bit storage space. When one sequencer cannot provide enough space, Wrap Sequencers can be cascaded to [...] devices such as RAM or FIFO. The next chapter will describe an example of cascading a Wrap Sequencer and a FIFO in a DFQ Scheduler.

Fig.4.4a shows the sequencer interface signals. Elements are transferred serially through the element I/O ports D, CD, Q and CQ. The blocking signals determine which input port can transfer elements in and which output port can transfer elements out. Input port (source) blocking signals DB and CDB are generated by the current sequencer, while output port (sink) blocking signals QB and CQB are generated by a neighboring sequencer or by an external device.

Fig.4.4. Wrap Sequencer interface signals and cascaded sequencers.
a. Wrap Sequencer interface signals: D (Element Input), Q (Element Output), CD (Cascade Element Input), CQ (Cascade Element Output), DB (D Blocked), QB (Q Blocked), CDB (CD Blocked), CQB (CQ Blocked), FU (Full), EM (Empty), CSF (Cascade Seq. Full), CSE (Cascade Seq. Empty), HB (Head Q Blocked), EB (End CQ Blocked), RI (Rotation Indicator), RIO (Rotation Indicator Out), SC0 (State Count bit 0), SC1 (State Count bit 1), Reset and Clock.
b. Cascaded sequencers, connected between a Storage Device and a User Device.

[...] later. In brief, since blocking signals affect only neighboring entities, when an element leaves the sequencer, it takes time for such information to propagate to other entities. With HB and EB, such information will be immediately available to all entities and the emptied space can be re-used immediately. Section 4.6 will illustrate this idea further.

RI indicates whether or not to rotate the virtual time. Rotation occurs when $F_s$ (before rotation) is in the second half of the virtual clock, where the most significant bit (MSB) is "1". Therefore, RI should be the MSB of the Key of valid elements transferred out of a sequencer through port Q. This information is stored internally and communicated to external devices using RIO. RI does just the opposite. It allows external devices to communicate the rotation information to the sequencer. When Wrap Sequencers are cascaded as in Fig.4.4b, RIO of the head sequencer will connect to RI of all Wrap Sequencers, and RIO of the other sequencers is ignored.

The State Count bits are for sequencer synchronization, and the meaning of the full and empty bits is obvious in a single sequencer configuration. In the cascaded case, the head sequencer is full (FU) or empty (EM) only if the sequencer itself and all cascaded devices are full (CSF) or empty (CSE), respectively.

4.5. Wrap Sequencer Synchronization

The Wrap Sequencer is operated in synchronous mode. Element transfer is synchronized using state count bits SC1 and SC0. Each element is processed in 33 clock cycles (one element-clock), which can be divided into four states T0, T1, T2 and T3 as shown in Fig.4.1. The meaning of each state is shown in Table 4.1.

Table 4.1. Sequencer states in one element-clock period

State   SC1   SC0   Clock Period     Wrap Sequencer Operation
T0      0     0     CLK0             Examining the valid bits of all elements.
T1      0     1     CLK1             Comparing the first key bit of elements. Handling key wrapping.
T2      1     0     CLK2 - CLK20     Comparing the remaining 19 key bits to determine their relative priority.
T3      1     1     CLK21 - CLK32    Transferring the remaining 12 data bits.

Synchronization signals are generated by external circuits. This implies that the length of T2 is not fixed. It can be any value from 1 to 31. In other words, the length of the key field can range from 2 to 32 bits, as configured by external control circuits. This feature increases the flexibility of the element format and the adaptability of the sequencer to different elements.
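The relation between the key length and the four states can be written down as a small lookup. This sketch assumes, following Table 4.1, that T0 and T1 always take one clock each and that T2 covers the remaining key bits; the function names are hypothetical.

```python
def sequencer_state(clk, key_bits):
    """State of clock cycle clk (0..32) within one 33-cycle element-clock."""
    if not 2 <= key_bits <= 32:
        raise ValueError("key length must be between 2 and 32 bits")
    if not 0 <= clk <= 32:
        raise ValueError("clock index must be between 0 and 32")
    if clk == 0:
        return "T0"            # valid bit
    if clk == 1:
        return "T1"            # first key bit, key wrapping handled here
    if clk <= key_bits:
        return "T2"            # remaining key_bits - 1 key bits
    return "T3"                # 32 - key_bits data bits


def state_count_bits(state):
    """(SC1, SC0) encoding of a state, as listed in Table 4.1."""
    return {"T0": (0, 0), "T1": (0, 1), "T2": (1, 0), "T3": (1, 1)}[state]


# With a 20-bit key the boundaries of Table 4.1 are reproduced.
states_20 = [sequencer_state(clk, 20) for clk in range(33)]
```

For a 20-bit key this yields 19 cycles in T2 and 12 cycles in T3; for a 32-bit key T3 disappears because no data bits remain.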

Fig.4.5. Wrap Sequencer architecture and entities
a. Master and Slave Entities architecture. In the Master Entity: Q_REG input: Q_REG_IN; CQ_REG input: CQ_REG_IN; Q_REG output: Q_REG_OUT; CQ_REG output: CQ_REG_OUT; Switch inputs: D, CD, Q_REG_OUT, CQ_REG_OUT; Switch outputs: Q, CQ, Q_REG_IN, CQ_REG_IN.
b. Wrap Sequencer architecture with Master and Slave Entities
c. Master and Slave Entities interface signals

A Wrap Sequencer contains four different types of building components: a Fullness Detector, which determines whether the sequencer is full, empty or neither; an RI Generator, which generates the RIO signal; and Master and Slave Entities, which sort elements according to the described algorithm. The simplified architecture view of the Wrap Sequencer in Fig.4.5b shows only the entities.

4.6.1. Wrap Sequencer Entities

Master and Slave Entities together perform the core function of a Wrap Sequencer. They are cascaded alternately, the first one being a Master Entity and the last one being a Slave Entity. In Fig.4.5b, a Master Entity is represented by a larger block Mi and a Slave Entity is represented by a smaller block Si. As shown in Fig.4.5c, both entities have the same interface signals, except that RI appears only in a Master Entity and not in a Slave Entity. The reason is that a Slave Entity does not perform the switching function and, therefore, need not be aware of key rotation, which is performed in the 4x4 Switch of a Master Entity (Fig.4.5a).

4.6.2. Source-Sink Switching

Each entity consists of two 33-bit registers, CQ_REG and Q_REG, two external sources, D and CD, and two internal sources, Q_REG_OUT and CQ_REG_OUT, which are the outputs of Q_REG and CQ_REG, respectively. It also contains two external sinks, Q and CQ, and two internal sinks, Q_REG_IN and CQ_REG_IN, which are the inputs of Q_REG and CQ_REG, respectively. Therefore, each entity has four sources and four sinks in total. The first bit of a source, which is the valid bit, indicates to the entity whether the source has a valid element. If it is invalid, it will be treated as the lowest priority element. All sources will be relayed to the appropriate sinks according to their priority, determined by valid and key. If an external source, D or CD, is blocked, the corresponding source will be set to NULL and becomes an invalid source.

Master and Slave Entities handle source-sink switching differently. In a Slave Entity, unless the register is full, source D always goes to CQ_REG, then to CQ; source CD always goes to Q_REG, then to Q. If an external sink is blocked, the corresponding register will hold the element. There is no switching in a Slave Entity.

A Master Entity does source-sink switching by a 4x4 Switch. The four inputs and outputs of the Switch are labeled D0, D1, D2, D3 and Q0, Q1, Q2, Q3, respectively. The four sources of the entity are the inputs of the Switch, while the four outputs of the Switch feed the four sinks of the entity. Within the Switch, the four sources are sorted according to their priority. Define X0, X1, X2 and X3 such that X0 is the highest priority element and X3 is the lowest priority one among the four inputs of the Switch, i.e.

X0 = highest-priority(D, CD, CQ_REG_OUT, Q_REG_OUT)
X1 = second-highest-priority(D, CD, CQ_REG_OUT, Q_REG_OUT)

and so on. When sink Q is not blocked, the source-sink relationship is (Q, Q_REG_IN, CQ_REG_IN, CQ) = (X0, X1, X2, X3). Otherwise, (Q, Q_REG_IN, CQ_REG_IN, CQ) = (NULL, X0, X1, X2). In this case, X3 equals NULL and is ignored.
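The source-sink relationship of the Master Entity can be illustrated with a short behavioral sketch. It is a software model under stated assumptions, not the hardware design: elements are represented by their integer key only, `None` stands for NULL, a smaller key means a higher priority, key rotation is ignored (RI=0), and the names `master_switch` and `priority_key` are hypothetical.

```python
NULL = None  # an invalid (empty) source


def priority_key(element):
    # Invalid sources rank lowest; among valid ones a smaller key wins.
    return (1, 0) if element is NULL else (0, element)


def master_switch(d, cd, cq_reg_out, q_reg_out, q_blocked):
    """Map the four sources of a Master Entity onto its four sinks."""
    x = sorted([d, cd, cq_reg_out, q_reg_out], key=priority_key)
    if not q_blocked:
        q, q_reg_in, cq_reg_in, cq = x
    else:
        # Q is unavailable: everything moves one sink back and X3 is dropped
        # (it is NULL in this case, because an external source is blocked).
        q, q_reg_in, cq_reg_in, cq = NULL, x[0], x[1], x[2]
    return {"Q": q, "Q_REG_IN": q_reg_in, "CQ_REG_IN": cq_reg_in, "CQ": cq}


# Four valid sources with Q not blocked: the smallest key leaves through Q.
print(master_switch(7, 2, 5, 3, q_blocked=False))
```

Sorting the four candidates once and then assigning them positionally mirrors the (X0, X1, X2, X3) definition directly, which keeps the blocked and unblocked cases easy to compare.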

4.6.3. Key Rotation

When the Switch is performing source-sink switching, it has to compare the keys of all valid sources bit by bit at T1 and T2. At T0, it checks whether a source is valid. At T1, it performs Key Rotation if needed (RI=1). At T2, it compares the rest of the key bits. Regardless of the value of RI, operations at T0, T2 and T3 are the same. At T1, if rotation is needed, the Switch will invert the current bit of the inputs and compare their values.

For example, let n=4, let the keys of two elements be E1=14 and E2=1, and let RI=1. The two elements in binary representation are*

E1 = 1 1110 xxx...
E2 = 1 0001 xxx...

However, since RI=1, rotation is needed. The Switch will not compare E1 against E2 directly to determine their relative priority. Instead, it will invert the MSB of the keys of E1 and E2, which become E1' and E2', where

E1' = 1 0110 xxx...
E2' = 1 1001 xxx...

and perform the comparison based on the values of E1' and E2'. Therefore, the Switch will conclude that E2 > E1. The implementation of such a rotation scheme is shown in Fig.4.6.

Fig.4.6. Key Rotation implementation

* Note that the first bit is the "valid" bit and the second bit is the MSB of "key".
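The rotation step can be checked numerically with a small sketch. It models only the key comparison, assuming keys are plain n-bit integers and that a smaller (rotated) key means a higher priority; the function names `rotated_key` and `served_before` are hypothetical and do not come from the design.

```python
def rotated_key(key, n_bits, ri):
    """Invert the MSB of an n-bit key when rotation is indicated (RI=1)."""
    if ri:
        return key ^ (1 << (n_bits - 1))
    return key


def served_before(key_a, key_b, n_bits, ri):
    """True if key_a has higher priority than key_b after rotation."""
    return rotated_key(key_a, n_bits, ri) < rotated_key(key_b, n_bits, ri)


# The example from the text: n=4, E1=14, E2=1, RI=1.
print(rotated_key(14, 4, 1), rotated_key(1, 4, 1))  # 6 and 9
print(served_before(14, 1, 4, 1))                   # E1 is served before E2
```

Because only the MSB is inverted, the hardware needs to treat a single bit position (T1) specially, which matches the state assignment in Table 4.1.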

4.6.4. Source Blocking

Each entity has to block external sources when there are not enough sinks for all sources. In a Master Entity, internal sources cannot be blocked and internal sinks are always available. The idea of blocking is simple and easy to implement. However, its details are the most complicated part to explain, and it is the most critical part for ensuring proper ordering, because it has to take into account the status of all sources and sinks. In order to explain the details, define the data flow direction from CD to Q as the forward direction, and that from D to CQ as the backward direction.

4.6.4.1. Global Blocking Signals

There are global and local factors affecting external source blocking. Global factors include the signals HB and EB. These two signals are available to every entity, as shown in Fig.4.4b. When Q of the head entity is not blocked (HB=0), the entire forward path will not be blocked. This is because if the element at the head of a path is leaving, every element in the same direction can be shifted forward simultaneously. Without this global signal, each entity is aware only of the situation of its neighboring entities, and the fact that the head element is leaving takes time to propagate to other entities. Specifically, it takes one element-clock period to propagate through one Master and one Slave Entity. The global signal eliminates the propagation time: the information is made known to all entities immediately and simultaneously. This improves the efficiency and utilization of the sequencer.

For example, when an element in M1 leaves through port Q and no new element enters M1 through port D, the element in Q_REG of S1 will shift forward and enter M1 through port CD. At the same time, M2 should be able to send one element to take up the space in Q_REG of S1. However, without HB, the information that M1 is sending one element through port Q is available to S1 only. All other entities beyond S1 do not get the information and, therefore, do not shift elements forward. M2 is aware of the situation of S1, so M2 shifts one element forward after S1 does so. M3 does its shifting after S2 does, and so on. With HB, the situation changes. The information about port Q of M1 is available to all entities simultaneously, and all entities can shift elements forward simultaneously.

The same operation applies to sources in the backward direction. That is, when CQ of the end entity is not blocked (EB=0), the entire backward path will not be blocked. However, when (HB,EB)=(0,1), mis-ordering may occur*.

This particular case can be avoided when the sequencer is properly configured. Consider two cases. In case one, there is no other type of device attached to the end of the sequencer. In this case, EB and HB are connected; therefore, EB=HB. In case two, an external device is attached to the end of the sequencer. Let XB (external block) denote that the external device is not ready to accept elements. Then EB=AND(XB, HB). In both cases, (HB,EB)=(0,1) is avoided.

4.6.4.2. Local Blocking Signals

Local factors determine the value of the blocking signals DB and CDB (see the functions in Table 4.2) only when the head of a path is blocked (HB=1 and/or EB=1). These factors include the availability of the external sinks Q and CQ and the status (full or empty) of the internal registers Q_REG and CQ_REG, which is indicated by QRF and CQRF, respectively. If QB or CQB is set, the corresponding sink is unavailable.

In a Slave Entity, a source is blocked when the register ahead of it is full. Therefore, when CQ_REG is full D is blocked, and when Q_REG is full CD is blocked.

In a Master Entity, the blocking condition is more complicated and is listed in Table 4.2. In cases 1, 2, 3, 5, 6 and 8, there are enough sinks to accept data from both external sources. Therefore, there is no need to block any source. In case 7, the entity can accept only one incoming element. Blocking either path will maintain correct order. That CQ is not blocked implies that there are storage spaces ("holes") behind the current Master Entity. If D is blocked and CD is not, one more "hole" will be created in the entities behind the current Master Entity. This will reduce the utilization of the sequencer. Our

Fig.4.7. Illustration of a Master Entity in case 4

Both D and CD should be blocked to avoid disordering, i.e. (CDB,DB)=(1,1). Therefore, (HB,EB)=(0,1) should be avoided to keep CDB=1.
Case (CDB,DB)=(1,0): If the priority of the element from D is higher than that of the two elements in the registers but lower than that of the element from CD, it will go to Q. It then ends up in front of the element from CD, which should not happen, i.e. disordering occurs.
Case (CDB,DB)=(0,1): For the same reason, if an element from CD has a lower priority than an element from D and goes to Q, disordering occurs.

* (HB,EB)=(0,1) will enforce (DB,CDB)=(1,0) according to the functions in Table 4.2. When

Table 4.2

      Condition of sinks      Blocking signals
Case  QB  CQB  QRF  CQRF      DBL  CDBL   Note
1     0   0    x    x         0    0      None of the data paths are blocked.
2     0   1    0    x         0    0      Q_REG is empty and Q_REG_OUT is not a valid source. Therefore, there are three sinks and incoming data paths are not blocked.
3     0   1    x    0         0    0      CQ_REG is empty and CQ_REG_OUT is not a valid source. Therefore, there are three sinks and incoming data paths are not blocked.
4     0   1    1    1         1    1      Both CQ_REG and Q_REG are full. Both D and CD are blocked. Note that blocking only one of them will cause mis-ordering. Therefore, both paths are blocked until the entity is restored to case 1, 2 or 3.
5     1   0    0    x         0    0      Same as case 2.
6     1   0    x    0         0    0      Same as case 3.
7     1   0    1    1         0    1      CD is blocked in order to utilize the sequencer more efficiently. Blocking either path will result in correct order.
8     1   1    0    0         0    0      Both registers are empty. Therefore, there are two sinks and both incoming data paths are not blocked.
9     1   1    0    1         x    x      This case will not happen. If there is only one element in the Master Entity, it will always be in Q_REG.
10    1   1    1    0         1    1      Wait for the Master Entity to be restored to case 1, 2, 3, 6 or 8. The reason is the same as that in case 4.
11    1   1    1    1         1    1      All registers are full. No extra sinks. Block all incoming data paths.

Note:
QRF -- Q_REG is full.
CQRF -- CQ_REG is full.
DBL -- DB due to local factors.
CDBL -- CDB due to local factors.

Functions:
DBL = CQB*QRF*CQRF + QB*CQB*QRF
CDBL = DBL + QB*QRF*CQRF
DB = EB * DBL
CDB = HB * CDBL

(CDB,DB)=(0,1) should never occur. Therefore, (HB,EB)=(0,1) must be avoided.

objective is to allow new accessible elements to get into the sequencer whenever there is a "hole" there. Therefore, CD is blocked and D is not.

The situation of case 4 is illustrated in Fig.4.7. In this case, the Master Entity has enough sinks to accept one external source. However, assigning the available sink to either D or CD will result in disordering. This is because if the Master Entity stays in case 4 for more than two consecutive element-clock periods, D and CD will not have a chance to be compared with each other. To avoid disordering, the Master Entity blocks both external sources and waits until it is restored to case 1, 2 or 3. The

(DB,CDB)=(1,0) occurs, mis-ordering will be possible. See the explanation for case 4 in Table 4.2.

… either case 1, 2, 3, 6 or 8. Note that if (HB,EB)=(0,1), CDB* in case 4 cannot be set to 1 since HB=0. This is exactly the reason for putting the constraint on the global blocking signals that (HB,EB)=(0,1) should be avoided.
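The blocking functions listed with Table 4.2 can be evaluated with a few lines of code. The sketch below assumes that "*" denotes logical AND and "+" denotes logical OR, and that all signals are 0 or 1; the function names are hypothetical.

```python
def local_blocking(qb, cqb, qrf, cqrf):
    """Return (DBL, CDBL), the blocking signals due to local factors."""
    dbl = (cqb & qrf & cqrf) | (qb & cqb & qrf)
    cdbl = dbl | (qb & qrf & cqrf)
    return dbl, cdbl


def blocking_signals(hb, eb, qb, cqb, qrf, cqrf):
    """Return (DB, CDB) after gating the local factors with EB and HB."""
    dbl, cdbl = local_blocking(qb, cqb, qrf, cqrf)
    return eb & dbl, hb & cdbl


# Case 4 (QB, CQB, QRF, CQRF) = (0, 1, 1, 1): both sources must be blocked.
print(local_blocking(0, 1, 1, 1))
# With (HB, EB) = (0, 1), CDB is forced to 0 in case 4, which is the
# configuration the text says must be avoided.
print(blocking_signals(0, 1, 0, 1, 1, 1))
```

Running the sketch over the remaining rows is a quick way to cross-check the DBL and CDBL columns of the table against the functions.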

4.7. Sequencing Elements

This section uses an example to illustrate how the sequencer sequences elements. After that, we will prove that the output element from Q of the sequencer is always the highest priority element among all the elements in the sequencer. Note that the smaller the key value, the higher the priority.

4.7.1. An Example

In our example, we use a single Wrap Sequencer and set HB=QB and EB=HB*CDB. The sorting procedure is shown in Table 4.3. During element-clock 5, M1 is comparing 2, 5 and 15. As a result, 15 goes to S1. During element-clock 6, QB=0 and M1 is comparing 7, 2, 5 and 3. The highest priority one, 2, goes to Q and the lowest priority one, 7, goes to S1. The other two, 3 and 5, stay in the registers. During this period, 2 is the sequencer output. As shown in the table, input elements are not in priority order while output elements are. This is exactly what we would expect from the Wrap Sequencer. This example illustrates one particular case. We will now prove that the Wrap Sequencer always orders elements according to their priority (key value).

4.7.2. Sequencing Elements in General

The main feature of the Wrap Sequencer is stated and proved in the following proposition.

Proposition: The output element from Q of the Wrap Sequencer has the highest priority among all elements in the sequencer.

* The equation of CDB is shown in Table 4.2.

If the highest priority element in a sequencer is either in the M1 registers or in Q_REG of S1 at T0, the Switch in M1 will ensure that the next output element is the highest priority one. In the following paragraphs, we will show that this condition is always satisfied.

Table 4.3 (columns: Element-clock; Relevant Signals; Key of elements in sequencer at the end of T3; Note)

Number of key bits: 4
The Wrap Sequencer has 2 Master Entities and 2 Slave Entities.
The upper column of each entity represents the key of the element in CQ_REG; the lower column is that of Q_REG.
Input sequence D: 7, 5, 2, 10, 15, 3, 13, 8, 5'(21), 2'(18)
Output sequence Q: 2, 3, 5, 7, 8, 10, 13, 15, 2', 5'
Note that the table shows the numbers at the end of an element-clock period. For example, at the end of element-clock cycle 1, Q_REG of M1 holds 7.

Consider two groups of entities, M1, S1, M2 and S2. Let T0_i be the beginning of the i-th element-clock period and Eji be the j-th highest priority element at the i-th element-clock period. First, consider M1. The four sources are D_m1, Q_REG_s1, Q_REG_m1 and CQ_REG_m1, and the four sinks are Q_m1, CQ_REG_s1, Q_REG_m1 and CQ_REG_m1. At T0_1, if the two highest priority elements are among the four sources, the highest priority element E11 will always go to Q of M1 and the second highest priority element E21 will go to Q_REG_m1, as shown in Table 4.4. Therefore, at the beginning of the next element-clock period T0_2, the new highest priority element E12 remaining in the sequencer will be in Q_REG_m1, which is one source of M1. Where is the second highest priority element E22? If it is also one of the sources available to M1 at T0_2, the procedure will enter a recursive loop until there is no element in the sequencer. Let us find the location of E22.

Now, consider the Master Entity M2. At T0_1, E31 can be D_m1, CQ_REG_s1, Q_REG_s1 or Q_REG_m2. It will not be CQ_REG_m2 or those of S2, because they all have lower priority than Q_REG_m2. If E31 is either D_m1 or Q_REG_s1, E22 will be CQ_REG_m1 at T0_2. Otherwise, E22 will be Q_REG_s1 at T0_2. In both cases, E22 is a source available to M1 at T0_2. Therefore, we have shown that E11 and E21 are available to M1 at T0_1, and that E12 and E22 are available to M1 at T0_2. Repeating the argument, it is true that E1n and E2n will be available to M1 at T0_n. In other words, the two highest priority elements are always available to M1 and, therefore, the next element from Q of M1 is always the one with the highest priority.

Table 4.4. The general sequence ordering of Wrap Sequencer
1. Locate the highest and second highest priority elements, E11 and E21, respectively, during the first element-clock period (T0_1). If E11 and E21 are among the four shaded sources of M1, Q_m1 becomes E11 and Q_REG_m1 becomes E21. At T0_2, E21 = E12.
2. Locate the highest and second highest priority elements, E12 and E22, respectively, during …

From the above analysis, we observe that the following statements are true:
- The highest priority element always goes to Q of M1 if it is not blocked.
- The priority of Q_REG_mi is higher than or equal to that of CQ_REG_mi.
- The priority of elements in Mi is higher than or equal to that of elements in M(i+1).
- The priority of Q_REG_si is always higher than or equal to that of Q_REG_s(i+1).
- The relative priority of CQ_REG_si and the elements in M(i+1) is not deterministic.
- The relative priority of CQ_REG_si and CQ_REG_s(i+1) is not deterministic.

4.8. Size of Sequencer

If the sequencer size is $N_{seq}$, it can hold $N_{seq}$ elements, half of them in Master Entities and the other half in Slave Entities. Ignore those in the Slave Entities. It is easy to observe that those in Master Entities are in descending priority order. Furthermore, their priorities are always higher than or equal to that of the sequencer's CQ element. It is likely, but not certain, that some elements in Slave Entities have higher priority than the lowest priority elements in Master Entities. Therefore, among $N_{seq}$ elements, at least $N_{seq}/2$ elements have priority higher than or equal to that of the sequencer's end element, which is the element at port CQ of the sequencer. Consequently, if more than $N_{seq}$ elements pass through the sequencer and the excess ones go through the sequencer's CQ, at least the first $N'_{seq}$ highest priority elements will be held in the sequencer, where $N_{seq} \ge N'_{seq} \ge N_{seq}/2$. On average, $N'_{seq}$ should be in the neighborhood of $3N_{seq}/4$, which corresponds to the $N_{seq}/2$ elements in Master Entities and half of the elements in Slave Entities.

4.9. Summary of Features of Wrap Sequencer

A device is required for DFQ scheduling in which each element's priority is reflected in a key value. The major problem is that the key is an increasing value, so wrapping will occur. This problem is solved by avoiding wrapping using a key rotation algorithm, which is implemented as a sequencer. Because of its ability to handle wrapping, this device is called the Wrap Sequencer. In summary, the Wrap Sequencer has the following features:

1. It sequences elements according to their priority.
2. It takes key wrapping into account when determining element priority.
3. …
4. It is a synchronous device.
5. Sequencers can be cascaded.
6. Input and output transfers use different ports and can be performed simultaneously.
7. It uses serial I/O to reduce the number of pins. The current architecture requires about 25 pins and can be packaged in a dual-inline chip.
8. It takes 33 clock periods to process one element I/O.
9. The only priority measurement is the key value. Arrival time is not considered.

In the previous chapter, we discussed the wrap-around problem, proposed a solution which virtually avoids wrapping by using virtual time rotation, and designed a Wrap Sequencer to implement the solution. However, we made the assumption that F_new is always in the Accessible Region. In this chapter, we will release the assumption and use the Wrap Sequencer to construct a DFQ Scheduler. The basic functionality of this scheduler is not much different from that of a Wrap Sequencer. However, it provides two additional features. First, it does not require new elements to be in the Accessible Region. Among all new elements arriving at the scheduler, it will send those in the Accessible Region to the sequencer and keep the others in a separate queue. Secondly, it increases the element storage space by using FIFOs, which are much less expensive than Wrap Sequencers. In addition, the scheduler has standard I/O interface circuits to communicate with other system components.

5.1. DFQ Scheduler for WATM MAC Scheduling

Fig.5.1 illustrates a DFQ Scheduler in a WATM MAC board for medium access scheduling. An ATM cell which arrives at the MAC board will be stored in the buffer at address A. The information in the cell header will be used in the Virtual Time Calculator for calculating the virtual finish time F of that cell according to equation (2-2). The paired value (F, A) becomes an input element of the DFQ Scheduler, which sorts all elements in ascending order according to F. At all times, the output element of the Scheduler is the element that has the smallest F. The associated A of the output element is used by the MAC PDU Composer to retrieve the corresponding cell from the Buffer for transmission.

The scheduler described above is actually a priority queue manager. In software terms, it is a linked list. Each element of the list consists of the key F and the information A. The scheduler is implemented in hardware instead of software for efficiency. To maintain a software linked list, each new element has to be compared with other elements in the list until the correct position is found. This series of comparisons takes up CPU time. A hardware scheduler frees up CPU time. For each cell, there is only one element write and one element read, when the cell arrives and departs, respectively.

Notes to Fig.5.1:
Gate Controller: Elements in Accessible Region --> Wrap Sequencer; elements in Inaccessible Region --> Receive FIFO
Recycle Controller: New element has priority to enter Wrap Sequencer
Wrap Sequencer: Captures high priority elements when they go through
FIFO: Both FIFOs are dynamically recycling
Recycle FIFO: Stores low priority elements in Accessible Region
Receive FIFO: Stores elements in Inaccessible Region
Limitation: Larger FIFO requires faster recycling rate
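For comparison, the software equivalent mentioned above can be sketched as a sorted list of (F, A) pairs. This is only an illustration of the priority queue behavior, not part of the design: the class name `SoftwareScheduler` is hypothetical, virtual time wrapping is ignored, and a Python list with `bisect.insort` stands in for the linked list.

```python
import bisect


class SoftwareScheduler:
    """Keeps (F, A) elements sorted by virtual finish time F."""

    def __init__(self):
        self._elements = []  # always sorted in ascending order of (F, A)

    def write(self, finish_time, address):
        # The search for the insertion point is the per-element comparison
        # work that the hardware scheduler takes off the CPU.
        bisect.insort(self._elements, (finish_time, address))

    def read(self):
        # The output is always the element with the smallest F.
        return self._elements.pop(0) if self._elements else None


sched = SoftwareScheduler()
for f, a in [(30, 0xA), (10, 0xB), (20, 0xC)]:
    sched.write(f, a)
print(sched.read())  # element with the smallest F
```

Every `write` costs a search through the stored elements, whereas the hardware scheduler needs only one element write and one element read per cell.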

5.2. DFQ Scheduler Architecture and Components

The DFQ Scheduler in Fig.5.1 consists of a Receive FIFO, a Recycle FIFO, a Gate Controller, a Recycle Controller, a Wrap Sequencer, a Sequencer-FIFO Interface circuit, and two I/O interface circuits. There are two levels of control in the scheduler. First, the Gate Controller admits only elements in the Accessible Region to the sequencer and keeps the inaccessible elements in the Receive FIFO. Secondly, the Wrap Sequencer and the Recycle FIFO schedule all accessible elements, taking into account virtual time wrapping.

The I/O interface circuits connect the scheduler to a 32-bit data bus. They handle the read/write handshaking and the parallel/serial conversion. All elements inside the scheduler are transferred serially. The I/O interface circuits will not be discussed further. The Wrap Sequencer has been discussed in Chapter 4. The other components will be discussed now.

5.2.1. Receive FIFO and Gate Controller

The Gate Controller, as shown in Fig.5.2, determines whether a new element should enter the Wrap Sequencer or the Receive FIFO. Only one element at a time can enter the controller, through one of the two input ports, D0 and D1. The selection is made by information available to D1. Whenever there is a new element from D1, it will be accepted. Otherwise, an element in the Receive FIFO, from D0, will be. Whether an element can gain access to the Wrap Sequencer is controlled by Fs, which is fed back from the sequencer. If there is no new Fs from the sequencer, the old Fs will be used until a new one is available. Recall that Fs is the boundary of the Accessible and Inaccessible Regions and determines whether virtual time rotation is required. The comparator compares the key of the accepted element with Fs to check whether it is in the Accessible Region. If it is, it will go through output port Q1 and enter the sequencer. Otherwise, it will be stored in the Receive FIFO through output port Q0, and we say that the element is blocked.

An element will be blocked under two conditions. First, blocking occurs when the element is in the Inaccessible Region. Second, blocking occurs when both the Recycle FIFO and the Wrap Sequencer are full, i.e. when Q1 is blocked.

Note that there is a delay of one element-clock period in the Gate Controller. This is because the result of the virtual time comparison is unknown until the end of T2. Before the result is known, the element is buffered in the register REG. In the next element-clock period, the result is known and the element can be sent to the appropriate port.

Fig.5.2. Gate Controller architecture (signals shown: element to sequencer, FU from sequencer, Fs from sequencer, data bus)
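The routing decision of the Gate Controller can be summarized in a small sketch. It is a simplified model under stated assumptions: the Accessible Region test is passed in as a hypothetical predicate `is_accessible`, since its exact definition in terms of Fs is given elsewhere in the thesis, and the one element-clock buffering delay in REG is not modeled. The function names are illustrative only.

```python
def select_input(d1, d0):
    """A new element on D1 is accepted first; otherwise take D0 (Receive FIFO)."""
    return d1 if d1 is not None else d0


def route(key, is_accessible, q1_blocked):
    """Return the output port for an accepted element.

    "Q1" leads to the sequencer, "Q0" back to the Receive FIFO (blocked).
    """
    if is_accessible(key) and not q1_blocked:
        return "Q1"
    return "Q0"


# Hypothetical region test used only for this illustration.
accessible = lambda key: key >= 8

print(route(select_input(12, 3), accessible, q1_blocked=False))   # to sequencer
print(route(select_input(None, 3), accessible, q1_blocked=False))  # stays in Receive FIFO
```

Both blocking conditions from the text map onto the single `Q0` branch: the element is either inaccessible or Q1 is blocked.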

5.2.2. Recycle FIFO, Recycle Controller and Wrap Sequencer

The Recycle Controller is a 2-to-1 multiplexer. The input is either the new element from the Gate Controller or, when no new element is coming, the element from the Recycle FIFO. The output element then enters the sequencer. Therefore, a new element always has the right to access the Wrap Sequencer. Elements passing through the Wrap Sequencer will be sorted in ascending order according to their key value, which is the virtual finish time F in this case. When the Wrap Sequencer is full, the elements with large (not necessarily the largest) virtual finish time will be pushed into the Recycle FIFO. Elements in the Recycle FIFO can always re-enter the Wrap Sequencer when no new element is accessing it. Therefore, when the Recycle FIFO is not empty, some elements will dynamically go through the ring of Recycle FIFO and Wrap Sequencer. During the cycling process, the high priority ones will stay in the sequencer while the low priority ones go through the cycle repeatedly.

Fig.5.3. Recycle Controller architecture (signals shown: element from Gate Controller, element from FIFO, blocking signal for FIFO, output to Wrap Sequencer)

If a large sequencer is available, it is always preferable to use a large sequencer rather than the Recycle FIFO. In such a case, the scheduler can dispense with the Recycle Controller, Recycle FIFO and Sequencer-FIFO Interface. The architecture shown in Fig.5.1, which uses a Recycle FIFO, assumes the worst case, where a large sequencer is not available. The Wrap Sequencer design, in Appendix A, needs about 40 flip flops for each element. If the size of the Wrap Sequencer is Nseq, it needs approximately 40Nseq flip flops. The Xilinx 4000-family FPGA has about 2,000 flip flops and 900 configurable logic blocks (CLB). Suppose that the design is implemented in these FPGA chips; the size of a Wrap Sequencer will then be about 50. It may be less if each element uses more than 16 CLBs or when the chip utilization is less than 100%. This is a very small number relative to the desirable size, which is in the order of one thousand. Without using FIFOs, approximately 20 sequencers are needed, which will take up too much space.

As the FIFO and the sequencer are operated differently, an interface circuit is required. The circuit provides three major functions. First, it makes sure that (HB,EB)=(0,1) does not occur, as required by the Wrap Sequencer. Secondly, it determines whether the element that exits from port CQ of the sequencer should go back to the sequencer through port CD or should go to the FIFO. Third, it does signal adaptation between the FIFO and the sequencer.

5.3. Scheduler Parameters and Performance

In constructing the scheduler, there are six parameter values to be determined. They are the size and operating speed of the Wrap Sequencer, and those of the Receive FIFO and the Recycle FIFO. Since the FIFO is an off-the-shelf commercial product and is not expensive, the two FIFOs can be as large as desired. Currently, fast FIFOs can operate up to ?? MHz, which is much higher than the speed of a Wrap Sequencer. Therefore, the size and operating speed of the scheduler are limited not by the FIFOs, but by the Wrap Sequencer. If a Wrap Sequencer is implemented in an FPGA, not only the size but also the operating speed is very limited. Consequently, it has a small size-speed product, which is the product of the size and operating speed of the Wrap Sequencer. As will be shown in Section 5.3.1, a small, fast Wrap Sequencer is as good as a large, slow one if their size-speed products are the same. A better size-speed product can be obtained by implementing the Wrap Sequencer as an ASIC rather than in an FPGA. However, the cost will be higher.

5.3.1. Parameters of Recycle FIFO and Wrap Sequencer

With a proper choice of scheduler parameters, all elements in the Scheduler will be served in proper order according to their relative priority. Alternatively, this claim can be formulated as

$$\min(\text{virtual finish time of elements in Scheduler}) \ge F_s. \tag{5-1}$$

However, given that the size and operating speed of a Wrap Sequencer are limited, how large can the FIFOs be while the claim is still guaranteed?

Let $N_{rx}$ and $N_{rc}$ be the size of the Receive FIFO and that of the Recycle FIFO, respectively; $N_{FIFO}$ be the sum of $N_{rx}$ and $N_{rc}$; $N_{seq}$ be the size of the Wrap Sequencer; $R_{sch}$ be the element transfer rate (in elements per second) of the Scheduler; and $R_{seq}$ be the required element transfer rate of the Wrap Sequencer, which must be larger than or equal to $R_{sch}$.

First of all, all elements inside a Wrap Sequencer will be served in correct order by the definition of a Wrap Sequencer, as proven in the Proposition in Chapter 4. Therefore, when the Wrap Sequencer is not full and the Recycle FIFO is empty, the claim is valid. When the Wrap Sequencer is full and there are elements in the Recycle FIFO, all elements in the Wrap Sequencer will be in order, but elements in the Recycle FIFO may not be. However, elements in the Recycle FIFO can always re-enter the Wrap Sequencer. When all elements in the Recycle FIFO have cycled through the Wrap Sequencer once, at least the $N_{seq}/2$ elements with the smallest virtual time will have been picked up by the Wrap Sequencer. For every recycle period, the sequencer will have $N_{seq}/2$ elements to be served in proper order. Therefore, if the recycle period is shorter than the service time of $N_{seq}/2$ ordered elements, all elements will be served in correct order and the claim will be valid.

To ensure that the recycle period is shorter than the transmission time, the size of the two FIFOs has to be limited. The time to transmit $N_{seq}/2$ elements will be $N_{seq}/2R_{sch}$. The worst case for the recycle period is when both FIFOs are full and the Accessible Region is at its maximum size due to rotation. This occurs when $F_s$ (before rotation) equals $2^{n-1}$. Recall that the Recycle Controller gives the right of accessing the sequencer to new elements. As rotation occurs, all elements in the Receive FIFO will access the sequencer while those in the Recycle FIFO are blocked. If both FIFOs are full when rotation occurs, there is a total of $N_{FIFO}$ unsorted elements to pass through the sequencer. Therefore, the longest recycle period is $N_{FIFO}/R_{seq}$. To ensure that all elements are served in correct order, we need

$$\frac{N_{FIFO}}{R_{seq}} < \frac{N_{seq}}{2R_{sch}},$$

that is,

$$R_{seq} > \frac{N_{FIFO}}{N_{seq}} \cdot 2R_{sch}, \tag{5-2a}$$

and if $N_{FIFO}/N_{seq} < 1/2$,

$$R_{seq} = R_{sch}. \tag{5-2b}$$

If it takes $N_c$ clock periods to process one element, the minimum clock rate of the Wrap Sequencer will be

$$R_{clk} = N_c \cdot R_{seq}. \tag{5-3}$$

For example, let $R_{sch}$ = 50 kHz, $N_c$ = 33, $N_{seq}$ = 50, $N_{rx}$ = 200 and $N_{rc}$ = 300. Then $N_{FIFO}$ = 500 and, according to equations (5-2) and (5-3), $R_{seq}$ > 1 MHz and $R_{clk}$ > 33 MHz.

… case,

$$R'_{seq} > \frac{N_{rx}}{N_{seq}} \cdot R_{sch} \quad \text{if } N_{rx}/N_{seq} > 1; \tag{5-4a}$$

$$R'_{seq} = R_{sch} \quad \text{if } N_{rx}/N_{seq} \le 1; \tag{5-4b}$$

the minimum clock rate required is reduced to

$$R'_{clk} = N_c \cdot R'_{seq}, \tag{5-5}$$

which is much less than $R_{clk}$. Using the same example with a large sequencer: if $N_{seq}$ = 400, $R'_{clk}$ will be 1.65 MHz.

From equation (5-2a),

$$N_{FIFO} < \frac{R_{seq} \cdot N_{seq}}{2R_{sch}}, \tag{5-6}$$

where $R_{seq} \cdot N_{seq}$ is the size-speed product that bounds the size of the FIFOs.

Normally, there will be more elements in the Accessible Region than in the Inaccessible Region. Intuitively, this is because the Accessible Region is always larger. From another point of view, suppose that each connection maintains its own queue; the front elements will be in the Accessible Region. For a stable connection, the queue length is finite and most of the elements will be in the Accessible Region. When these queues are combined into a global queue, it remains true that most of the elements are in the Accessible Region. However, if results show that this is not the case, the virtual time space should be increased. There are two ways of achieving that: increasing the number of virtual time bits, or cascading schedulers, as will be discussed in Section 5.4.

To avoid accessible elements being held in the Receive FIFO, it is necessary to ensure that the Recycle FIFO is not full. Otherwise, those accessible elements may not be scheduled properly if they have higher priority than some elements in the Wrap Sequencer.
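The worked example for equations (5-2), (5-3) and (5-6) can be reproduced with a few lines of code. The sketch below is only a calculator for the bounds stated in those equations; the function names are hypothetical, and the returned values are the lower (or upper) bounds rather than recommended operating points.

```python
def min_sequencer_rate(n_fifo, n_seq, r_sch):
    """Lower bound on the sequencer element rate, per (5-2a) and (5-2b)."""
    if n_fifo / n_seq < 0.5:
        return r_sch  # (5-2b): the sequencer only has to keep up with the scheduler
    return (n_fifo / n_seq) * 2 * r_sch  # (5-2a)


def min_clock_rate(n_c, r_seq):
    """Minimum sequencer clock rate, per (5-3)."""
    return n_c * r_seq


def max_fifo_size(r_seq, n_seq, r_sch):
    """Upper bound on the total FIFO size, per (5-6)."""
    return r_seq * n_seq / (2 * r_sch)


# Example from the text: R_sch = 50 kHz, N_c = 33, N_seq = 50, N_FIFO = 500.
r_seq = min_sequencer_rate(500, 50, 50_000)
print(r_seq, min_clock_rate(33, r_seq))
print(max_fifo_size(r_seq, 50, 50_000))
```

Keeping the size-speed product `r_seq * n_seq` fixed leaves `max_fifo_size` unchanged, which is the sense in which a small, fast sequencer is as good as a large, slow one.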

5.4. Cascade Schedulers

There are two reasons to cascade schedulers. The main reason is to increase the virtual time space. The second reason is to increase element storage space. Virtual time space is limited by the number of virtual time bits. Given that virtual time is an n-bit value, the virtual time spans a space of $2^n$. A cascaded two-stage scheduler, as shown in Fig.5.5, increases the space by $2^{n-1}$. Cascading one additional stage has roughly the same effect as increasing the virtual time field by half a bit. However, all of the increased space is in the Accessible Region. The trade-off is that more hardware is needed and that one element-clock delay is added, which occurs in the Gate Controller.

In short, cascading schedulers increases the virtual finish time space (each additional stage increases the space by $2^{n-1}$) and the size of the scheduler, at the cost of increased delay (one element-clock delay in each Gate Controller).

[Fig.5.5. Two cascaded schedulers (Scheduler 2 and Scheduler 1); block labels: Receive FIFO, Recycle FIFO, Wrap Sequencer. The figure lists the Accessible and Inaccessible Regions of Scheduler 1, of Scheduler 2, and of the combined scheduler. For Scheduler 1 these are $(F1_s, \ldots, 2^n-1)$ and $(0, \ldots, F1_s-1)$; the combined Accessible Region is $(F1_s, \ldots, 2^n+2^{n-1}-1)$.]

In Fig.5.5, each stage of the scheduler has its own Wrap Sequencer. New elements enter Scheduler 2. After one element-clock delay, accessible elements enter its Wrap Sequencer. If an element is also accessible to Scheduler 1 and Scheduler 1 is not full, the element enters Scheduler 1. After another element-clock delay in the Gate Controller of Scheduler 1, the element enters the Wrap Sequencer of Scheduler 1. There is a total of two element-clock delays in the whole scheduler path. Since each scheduler has its own Gate Controller, the schedulers operate in different, non-overlapping virtual time spaces. The virtual time space of the whole scheduler is the union of the two individual spaces, as shown in Fig.5.5.

Each stage of a scheduler increases the storage space by the sum of $N_{FIFO}$ and $N_{seq}$. There is no Receive FIFO in Scheduler 1, but the size of its Recycle FIFO can be as large as $N_{FIFO}$ of Scheduler 2. However, increased storage space should be viewed not as the objective of cascading schedulers, but as a useful by-product. If increasing storage space is the sole objective, a better way of achieving that is to increase the size of the Receive FIFO, Recycle FIFO and Wrap Sequencer proportionally.
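The effect of cascading on virtual time space and delay can be tabulated with a few lines of code. This is a rough sketch under the assumptions stated in the text (each additional stage adds $2^{n-1}$ to the space and each Gate Controller adds one element-clock delay); the function names are hypothetical.

```python
import math


def virtual_time_space(n_bits, stages=1):
    """Size of the virtual time space of a cascade of `stages` schedulers.

    A single scheduler spans 2**n_bits; each additional stage is assumed
    to add 2**(n_bits - 1), as stated in the text.
    """
    return 2 ** n_bits + (stages - 1) * 2 ** (n_bits - 1)


def gate_delay(stages):
    """Element-clock delays on the path: one per Gate Controller."""
    return stages


for stages in (1, 2, 3):
    space = virtual_time_space(20, stages)
    # math.log2(space) expresses the space as an equivalent number of bits
    print(stages, space, round(math.log2(space), 3), gate_delay(stages))
```

For a two-stage cascade the space is 1.5 times that of a single scheduler, which is the sense in which one extra stage is worth about half a bit of virtual time.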

5.5. Summary

The DFQ Scheduler is a hardware implementation of a linked list. The high priority elements are held in the embedded Wrap Sequencer, while other elements are stored in FIFOs and pass through the sequencer repeatedly until their priority is high enough for them to be held in the sequencer. With an appropriate configuration, this algorithm ensures that the output element of the scheduler is the highest priority one among all elements in the scheduler. The most important number affecting the configuration is the size-speed product of a Wrap Sequencer. A large size-speed product allows more storage space in the scheduler and is desirable. Finally, it has been shown that cascading schedulers increases both virtual time space and element storage, at the cost of increased delay.

Wireless ATM Bridge

The first stage WATM LAN will consist of two nodes only, one base station and one remote station. It will be an isolated LAN without interconnection with any other LAN or backbone. As far as hardware design and implementation are concerned, the base station and remote station are almost identical and, therefore, only one design is needed. By the end of this stage, the two nodes will be able to communicate. This will mark the end of hardware development. At the next stage, more remote stations will be added to the isolated LAN. The major work at this stage will be developing software/firmware drivers to activate the LAN and coding the MAC functions. Consequently, a lot of things can be done. Further development will connect the base station to wireline networks. This chapter considers issues for first stage development.

6.1. System Model

Two approaches to introducing WATM functionality to the end nodes/hosts (remote stations and base stations) have been considered. I call them the "compact" model and the "decomposed" model. The "compact" model is shown in Fig.6.1. Each host is equipped with a WATM card performing all functions of AAL, ATM, MAC and W-PHY. The card interfaces directly with the host via standard interface protocols such as PCI. This model is "compact" because one card does it all.

Fig.6.1. WATM implementation -- the "Compact" model

[Fig.6.2. The "decomposed" model; figure labels: Remote Station, Base Station, WATM Bridge, ATM, PHY, UNI.]

Fig.6.2 shows the "decomposed" model. The WATM card of the "compact" model is decomposed into two components: one standard ATM card in the host and one WATM bridge. The ATM card is commercially available, while the WATM bridge will be developed as part of the project. The bridge can be further decomposed into two parts: the ATM interface card, which extracts ATM cells from the incoming bit stream, and the MAC board, which schedules the cells and sends them through the radio link. Therefore, the bridge performs the functions of PHY, ATM, MAC and W-PHY. The "decomposed" model is the favored one because of its advantages, discussed in the following section.

6.2. Advantages of the "Decomposed" Model

The goal of the WATM LAN is to provide a testbed for evaluating and studying multimedia services over a wireless link and wireless/wireline interworking issues. Therefore, when considering the system model, we are looking for features valuable to research activities. The "decomposed" model is the favored one because it has those features, including simplicity and reliability, portability, upgradability, and expandability.

Simplicity and Reliability: The device will be designed and developed by students or researchers who have limited expertise and resources for development. Simplicity of the design reduces the development time and increases reliability. The "decomposed" model uses a commercial ATM card in the host, which simplifies the MAC board design in two respects. First, it eliminates the development of AAL functions which [...] the host, which eliminates problems relating to the interface. This approach increases not only hardware reliability, but also the reliability of subsequently developed applications and drivers. This is mainly due to the modularity and loose coupling between different layer functions. Furthermore, the commercial ATM card comes with drivers and API libraries which are believed to be more reliable than ones we would develop ourselves.

Portability: The goal is to have one universal device for all. The "decomposed" model is universal in the sense that it is independent of the host equipment. Regardless of whether the host is a PC or a workstation, as long as it is equipped with an ATM card the same WATM bridge can be used. It is also universal in the sense that it is independent of the functionality of the host. Both base station and remote station can share the same bridge with slight modification or no modification at all. Furthermore, the base station can be an end terminal, an ATM multiplexer, or an ATM switch.

Upgradability: ATM is still an emerging technology in many respects. The "decomposed" bridge will enable the wireless system to keep pace with new developments in ATM. For example, when newer ATM cards are available, the same bridge can be used. Furthermore, the bridge can interface with any vendor's ATM card as long as the UNI standard does not change.

Expandability: Due to the "universal" feature of the MAC board, it will be easy to expand the size of the LAN by adding more remote stations, or to create more LANs by adding more base stations.

6.3. WATM Bridge Components

Fig.6.3 shows the components of the WATM bridge. It consists of an ATM interface card and a MAC board.

6.3.1. ATM Interface Card

The ATM interface card has two data I/O ports. The serial port communicates with the ATM card in the host via a UTP5 cable. The parallel port transfers ATM cells to/from the MAC board. The four major parts of the ATM adapter are the Connector, Transceiver, Network Termination Controller (NTC) and Address Translation Controller (ATC).

Connector: This is an RJ-45 connector providing a port for the UTP5 cable, whose maximum length is 100 m.

Transceiver: [...] recovery/generation, and serial/parallel conversion. The receiver receives serial data and demultiplexes single-channel serial data into eight-channel parallel data. The transmitter does the reverse: it multiplexes eight-channel parallel data into single-channel serial data.

Network Termination Controller: This device implements PHY-TC functions including cell delineation, scrambling, framing, and cell extraction/insertion. It includes a generic 8-bit parallel interface to an external transceiver which performs serial/parallel conversion. Another eight data bits connect directly to an Address Translation Controller in order to provide real-time address translation. It may also maintain statistics for active virtual circuits, including cell and error counts. A microprocessor interface allows the MAC board to get information from its internal buffer.

Address Translation Controller: It provides high-speed translation of the ATM cell header information in real time. It replaces the ATM VPI and VCI and can append additional tags. This feature is useful for DFQ scheduling because the tag can be the virtual time increment of a connection. This means that the Virtual Time Calculator does not need to look up the virtual time increment for every incoming cell; the relevant information comes with the cell.

6.3.2. MAC Board

As the WATM LAN will be used as a testbed for research and experience, it is important that the system be easily adapted to modifications. Those modifications may range from operating frequency band, to encapsulation formats, to different MAC protocols. Therefore, the main component of the MAC board is a general purpose evaluation board with an embedded microprocessor, which offers flexibility. Software code that implements functions of the MAC and/or W-PHY resides [...] board controls data flow in the WATM bridge. It fetches the ATM cell from the ATM adapter, buffers the cell if necessary, determines the virtual finish time of a cell, schedules all buffered cells, composes the MAC PDU and sends it to a digital radio transceiver. Except for scheduling and radio receiving/transmitting, all other functions are implemented in software. Therefore, the board has another port for downloading the code from a development platform or monitor. Scheduling will be performed by the DFQ scheduler discussed in the previous chapter. The high speed radio transceiver is not yet available, but is being developed.

Fig.6.3. WATM bridge structure and components (blocks: ATM Interface, Monitor). Legend: ATC: Address translation controller; Sch: Scheduler; Conn: RJ-45 Connector; Tm: Transceiver; NTC: Network termination controller.

6.3.3. Part List

We have found some commercial parts for building the WATM bridge. Table 6.1 lists the set of required parts.

Table 6.1. WATM bridge part list

Function            Part No.                Manufacturer
Connector           PE68532                 Pulse
Transceiver         MB582A, MB583A          Fujitsu
NTC                 MB86683B                Fujitsu
ATC                 MB86689A                Fujitsu
MAC Board           i960 Evaluation Board   Intel
Scheduler           DFQ Scheduler           University of Toronto
Radio transceiver   --                      University of Toronto

Conclusion and Future Work

The two major items discussed in this thesis are the DFQ scheduler and the WATM bridge. More work is needed to make these two devices functional. To have a working WATM LAN, firmware coding is needed to implement the DFQ protocol, and drivers are needed to operate the bridge.

7.1. DFQ Scheduler

This thesis studies the wrap around problem associated with virtual time and solves the problem by rotating the virtual clock whenever necessary. Based on the virtual clock rotation algorithm, a wrap sequencer is built. It sorts elements according to their true relative priority even if wrapping occurs. The sequencer is used as a component for building a DFQ scheduler. The scheduler has a standard I/O interface using the three-way handshaking protocol and can be easily integrated with a general purpose evaluation board. Its operation is transparent to all other system components, and the number of virtual time bits is flexible and configurable.

For a 25 Mbps physical data rate, the minimum operating clock rate is 1.65 MHz. However, the actual clock rate must be increased if the buffer size is larger than the sequencer size. This is the major shortcoming of the scheduler, since a large buffer is needed and available sequencers are small. The second shortcoming is that when schedulers are cascaded, there is a one-element-clock delay for every additional cascaded scheduler.

As far as scheduler design is concerned, no further work is necessary. However, the implementation of the wrap sequencer in FPGA is only an initial trial. To obtain better performance and a more compact sequencer, it is desirable to fabricate the design as an ASIC. When wrap sequencers are available, some effort is required to construct the scheduler by interconnecting all of the components.

The sequencer has been simulated with Powerview tools, including Viewdraw, Viewsim and Viewtrace, to verify its functionality. Implementing the design in FPGA has been done by Vineet in the VLSI group. He used an Altera FPGA and found that each FPGA chip can implement a Wrap Sequencer of size [...] and the maximum clock rate is 20 MHz. This size-speed product is too small for a scheduler with $N_{FIFO}$ equal to 500. With the current design, this number cannot be improved.

There are two ways to improve the performance. One is to implement the design as an ASIC. The other is to modify the current design. Currently the Wrap Sequencer design uses single-bit serial data I/O, so each element-clock period is 33 operating clock cycles. Modifying the design to have n-bit data I/O ports will reduce the element-clock period to 1+32/n operating clock cycles. This will improve the performance by a factor on the order of n. For example, a 20 MHz 4-bit Wrap Sequencer needs 9 clock cycles per element and will have approximately the same performance as an 80 MHz single-bit sequencer (the exact speed-up is 33/9, about 3.7). Therefore, there is considerable room for improving the sequencer's performance.
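The gain from widening the data I/O can be estimated directly from the 1+32/n formula. The sketch below is illustrative only: the function names are hypothetical, and rounding 32/n up for widths that do not divide 32 is an assumption not made in the text.

```python
import math


def element_clock_cycles(io_bits):
    """Operating clock cycles per element for an n-bit I/O port: 1 + 32/n.

    One cycle carries the valid bit and the remaining 32 bits move n at a
    time; rounding up for widths that do not divide 32 is an assumption.
    """
    return 1 + math.ceil(32 / io_bits)


def equivalent_single_bit_clock(clock_hz, io_bits):
    """Clock a single-bit sequencer would need for the same element rate."""
    return clock_hz * element_clock_cycles(1) / element_clock_cycles(io_bits)


for n in (1, 2, 4, 8):
    print(n, element_clock_cycles(n), equivalent_single_bit_clock(20e6, n))
```

For n = 4 the element-clock shrinks from 33 to 9 cycles, so a 20 MHz 4-bit sequencer matches a single-bit sequencer running at roughly 73 MHz, which is close to the 80 MHz figure quoted above.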

7.2. WATM Bridge

The WATM bridge architecture is presented in Chapter 6, together with a list of commercially available parts for building it. The only necessary but unavailable component for building a WATM bridge is a 5 GHz high speed digital transceiver, which is now being developed at the University of Toronto. Both hardware and software work are required to make the WATM bridge functional. The hardware work includes construction of the WATM interface card and MAC board. This part of the work should not be difficult. As soon as resources are available, the process of building the WATM bridge can start immediately. A lot of software/firmware development is required before a WATM LAN can be set up. Coding is needed for driving the bridge, implementing the DFQ protocol, and collecting statistics. This will take much longer than the hardware work.

Wrap Sequencer Design and Schematics

Document Organization

This document describes the design and schematics of a Wrap Sequencer. It is organized in a way similar to the hierarchical circuit design of a sequencer. Each sub-section under section 1 and section 2 describes a circuit. For example, sub-section A.2.1 describes a Fullness Detector circuit in Fig.A.2.1.

Mini-Contents and List of Figures of Appendix A

A.0. Introduction
A.1. Wrap Sequencer (Fig.A.0, Fig.A.1)
A.2. Wrap Sequencer Components (Fig.A.2)
A.2.1. Fullness Detector (Fig.A.2.1)
A.2.2. RI Generator (Fig.A.2.2)
A.2.3. Master Entity (Fig.A.2.3)
A.2.3.1. 33-bit Shift Register (Fig.A.2.3.1)
A.2.3.2. Block Generator (Fig.A.2.3.2)
A.2.3.3. Switch (Fig.A.2.3.3)
A.2.3.3.1. Comparator (Fig.A.2.3.3.1)
A.2.3.4. Selector (Fig.A.2.3.4)
A.2.4. Slave Entity (Fig.A.2.4)
A.3. Implementation Notes
A.4. Circuit Design Tools

A.0. Introduction

A sequencer is a sorting device that sorts elements, in either ascending or descending order, according to a key field of the elements. A sequencer element is a data unit which consists of three fields: the valid field, the key field and the data field. A wrap sequencer is a sequencer which is able to handle key wrapping. Wrapping happens when the key is a k-bit field and its value increases with time. Eventually, the key will reach its maximum value $2^k-1$. A further increase of the key value will wrap it, and its face value $K'$ becomes $K \bmod 2^k$, which is $2^k$ less than its actual value $K$. A generic sequencer cannot be used to sort this kind of element because its sorting algorithm is based on the face value of the key and does not consider the actual value.

This appendix describes the detailed design and schematics of a specific Wrap Sequencer. The wrap around problem, the Wrap Sequencer's algorithm and its application in a DFQ scheduler are discussed in Chapter 4 and Chapter 5. Chapter 4 also presents a wrap sequencer architecture without discussing implementation details.

Each sequencer provides $N_{seq}$ element storage registers. Each element sent to the sequencer is 33 bits long, as shown in Fig.A.0. It consists of a 1-bit valid field, a 20-bit key field and a 12-bit data field (we choose k=20 for this case as an example). Internally, each element is stored in a 33-bit register whose most significant bit is a valid bit. It indicates whether the register stores a valid element or is empty. In this design, elements are transferred serially. Therefore, 33 clock cycles are required to process an element.

Fig.A.0. Format of user and sequencer element.

Sequencer Element layout (bit 32 is the MSB):
bit 32: Valid bit
bits 31 to 32-k: k-bit Key
bits 31-k to 0: (32-k)-bit Data

The element is 33 bits long. The valid bit indicates whether a sequencer storage register holds a user element or is empty. Input: the most significant bit (MSB) enters the sequencer first. Output: the MSB exits the sequencer first. k can be any value ranging from 2 to 32.
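The element format can be illustrated with a small packing routine. This is a sketch for illustration, not the hardware design: the function names are hypothetical, and the element is modelled as a Python integer with the valid bit at bit 32, the k-bit key below it, and the (32-k)-bit data field in the low bits.

```python
def pack_element(valid, key, data, k=20):
    """Pack (valid, key, data) into a 33-bit word: valid | k-bit key | data."""
    if not 2 <= k <= 32:
        raise ValueError("k must be between 2 and 32")
    key &= (1 << k) - 1             # face value K' = K mod 2**k (key wrapping)
    data &= (1 << (32 - k)) - 1     # keep only the (32-k) data bits
    return ((valid & 1) << 32) | (key << (32 - k)) | data


def unpack_element(word, k=20):
    """Inverse of pack_element: return (valid, key, data)."""
    valid = (word >> 32) & 1
    key = (word >> (32 - k)) & ((1 << k) - 1)
    data = word & ((1 << (32 - k)) - 1)
    return valid, key, data


def serialize(word):
    """Bits in transfer order: the MSB (valid bit) enters the sequencer first."""
    return [(word >> i) & 1 for i in range(32, -1, -1)]


word = pack_element(valid=1, key=2 ** 20 + 3, data=7)   # the key has wrapped
print(unpack_element(word), len(serialize(word)))
```

The example packs an element whose actual key is $2^{20}+3$; the stored face value is 3, which is the wrapped key that a generic sequencer would mis-order.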

The 33-clock-cycle element-clock can be divided into four states, namely T0, T1, T2 and T3, represented by two bits SC1 and SC0 (State Count bits). The tasks in each of the periods are briefly described in Table A.1. They will be elaborated when the detailed design is discussed in later sections.

Table A.1. The Four States of the Element-Clock

State  SC1 SC0  Clock Period    Wrap Sequencer Operation
T0     0   0    CLK0            Examining the valid bits of all elements.
T1     0   1    CLK1            Comparing the first key bit of elements. Handling key wrap around.
T2     1   0    CLK2 - CLK20    Comparing the remaining 19 key bits.
T3     1   1    CLK21 - CLK32   Transferring the remaining 12 data bits.
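The mapping from a clock index within the element-clock to its state can be written as a small function. This is a sketch only: `state_of` is a hypothetical helper, and the code (1,1) for T3 is inferred as the one remaining 2-bit value after T0, T1 and T2.

```python
def state_of(clk, k=20):
    """Return (state, SC1, SC0) for clock index clk (0..32) and key length k.

    T0 covers the valid bit, T1 the first key bit, T2 the remaining k-1 key
    bits and T3 the (32-k) data bits. The T3 code (1, 1) is inferred.
    """
    if not 0 <= clk <= 32:
        raise ValueError("clk must be in 0..32")
    if clk == 0:
        return ("T0", 0, 0)
    if clk == 1:
        return ("T1", 0, 1)
    if clk <= k:
        return ("T2", 1, 0)
    return ("T3", 1, 1)


states = [state_of(c)[0] for c in range(33)]
print({s: states.count(s) for s in ("T0", "T1", "T2", "T3")})
```

For k=20 this yields one clock each for T0 and T1, 19 clocks for T2 and 12 clocks for T3, matching the table; the key length is the duration of T1 plus T2.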

The Wrap Sequencer operates in synchronous mode and is synchronized by the State Count bits, SC1 and SC0, which are generated by an external device. This provides flexibility in determining the length of the key field, which is equal to the period of T1+T2. This is valuable because different applications have different requirements. In this document, we choose k=20. However, it can be any value ranging from 2 to 32.

[...] cascaded sequencers and a State Counter which generates State Count bits for synchronization.

Table A.2. Wrap Sequencer Interface Signals

Symbol: Name and Description

D: 1-bit element input. The input element is transferred into the sequencer serially in 33 clock cycles.
Q: 1-bit element output. Output elements are transferred from the sequencer serially in 33 clock cycles.
DB: D Block. When the sequencer cannot accept elements, it sets DB=1 at the beginning of T0.
QB: Q Block. When external devices are NOT ready to receive elements from the sequencer, they shall set QB=1 before T0.
FU, EM: Sequencer Full, Sequencer Empty. These two signals indicate whether the sequencer is full or empty. FU=1 when the storage registers in all entities (Section 4.6) in the sequencer are used and CSF=1. EM=1 when there are no elements in the sequencer and CSE=1.
RIO: Rotation Indicator Output. RIO=1 tells other devices that RI should be set to 1. It is set to the most significant bit of the key of Q.
RI: Rotation Indicator. When RI=1, it tells the sequencer to rotate the key value forward by $2^{k-1}$, where k is the key field length, when key bits are compared for sorting (key rotation algorithm in Section 4.3).
SC1, SC0: State Count. At T0, (SC1,SC0)=(0,0); at T1, (SC1,SC0)=(0,1); at T2, (SC1,SC0)=(1,0); at T3, (SC1,SC0)=(1,1).
CD, CQ, CDB, CQB: These are for sequencer cascading. Same as D, Q, DB, QB. The last cascaded sequencer should set CD=CQ and CQB=CDB.
HB, EB: Head Block (HB), End Block (EB). These two signals are global to all entities of all cascaded sequencers. They indicate to the entities whether the head or end of a sequencer is blocked (details in Section 4.6.4). Note that (HB,EB)=(0,1) should be avoided.
CSF, CSE: Cascaded Sequencer Full (CSF), Cascaded Sequencer Empty (CSE). They indicate to the previous sequencer that the cascaded one is full or empty. The last cascaded sequencer should set CSE=CSF=1.

A.2. Wrap Sequencer Components (Fig.A.2)

The Wrap Sequencer consists of a series of cascaded Master and Slave Entities for sorting elements, a Fullness Detector for determining whether this sequencer is full, and an RI Generator for generating the RIO signal. These building components are described in the following sub-sections.

Fig.A.1. Wrap Sequencers in Operation

A.2.1. Fullness Detector

The Fullness Detector counts the number of elements N in the sequencer to determine whether the sequencer is full or empty. If CSE=1 and N=0, it is empty. If CSF=1 and N=$N_{seq}$, it is full.

A.2.1a. Fullness Detector Circuit Description

It is implemented by an up/down counter and two decoders. When a new element enters a sequencer, the counter counts up. This is the case when DB=0 and D=1 (valid element) at T0, or when CDB=0 and CD=1 at T0. When an element leaves the sequencer, the counter counts down. This is the case when QB=0 and Q=1 (valid element) at T0, or when CQB=0 and CQ=1 at T0. Elements that enter from D or exit from Q are counted at T0, while elements that enter from CD or exit from CQ are counted at T1. Therefore, the information about CD and CQ at T0 has to be stored in a flip-flop for one clock period. EMPTY is set when the cascaded sequencers are empty (CSE=1) and the sequencer has zero elements in it. FULL is set when the cascaded sequencers are full (CSF=1) and the sequencer counts 4 (assuming $N_{seq}$=4 storage spaces in each sequencer). This value is just for the purpose of simulation. When the sequencer is built, it can be changed to match the number of storage spaces $N_{seq}$.

Fig.A.2.1. Fullness Detector

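The counting rule of the Fullness Detector can be modelled behaviourally. The sketch below is an assumption-laden illustration rather than the circuit itself: the class and method names are hypothetical, each call represents one element-clock, and the one-clock offset between counting D/Q at T0 and CD/CQ at T1 is abstracted away.

```python
class FullnessDetector:
    """Behavioural model of the up/down counter and its two decoders."""

    def __init__(self, n_seq=4):
        self.n_seq = n_seq   # number of storage spaces in this sequencer
        self.count = 0       # N, the number of elements currently stored

    def element_clock(self, d=0, db=0, cd=0, cdb=0, q=0, qb=0, cq=0, cqb=0):
        """Update N from the valid bits and block signals sampled at T0."""
        entering = int(d == 1 and db == 0) + int(cd == 1 and cdb == 0)
        leaving = int(q == 1 and qb == 0) + int(cq == 1 and cqb == 0)
        self.count += entering - leaving
        return self.count

    def flags(self, cse=1, csf=1):
        """Return (EMPTY, FULL) given the cascaded-sequencer status bits."""
        empty = cse == 1 and self.count == 0
        full = csf == 1 and self.count == self.n_seq
        return empty, full


fd = FullnessDetector(n_seq=4)
for _ in range(4):
    fd.element_clock(d=1)            # four valid elements enter through D
print(fd.count, fd.flags())
fd.element_clock(q=1)                # one element leaves through Q
print(fd.count, fd.flags())
```

With CSE=CSF=1 (the setting for the last cascaded sequencer), the model reports full after four arrivals and neither full nor empty once one element has left.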

A.2.2. RI Generator

The RI Generator generates the Rotation Indicator Output (RIO) signal, which is the most significant bit (MSB) of the key of the element being served. From the viewpoint of a Wrap Sequencer, it is the MSB of the key of the element which goes through the sequencer's port Q. Note that RI is useful only at T0 and T1 of an element-clock period.

A.2.2a. RI Generator Description

The circuit mainly consists of three D-type flip-flops (FF). D is the element that goes to port Q of a sequencer. At T0, when port Q is not blocked and the element is ready to go to port Q, the first (left) FF stores the valid bit of the element. This value determines whether the second (middle) FF should store the MSB of the key of the element at T1. Whatever the value in the second FF is, it is stored in the third (right) FF at T3. This is because the MSB of the current element's key will be used at T0 and T1 of the next element-clock, and updating the value at T3 ensures its availability in the next element-clock period.

A.2.3. Master Entity

The Master Entity consists of a Block Generator for blocking signal generation, two registers Q-REG and CQ-REG for storing two elements, five 2-port Switches for sorting four elements in order, and a 4-port Selector for source-sink matching. Each Master Entity has four sources (D, CD, QREG-OUT and CQREG-OUT) and four sinks (Q, CQ, QREG-IN and CQREG-IN). The core task of an entity is to assign the sources to the sinks according to the keys of the source elements and the blocking condition of its neighbor Slave Entities. This is done by the combination of the five 2x2 Switches and a Selector, which is referred to as a 4x4 Switch. The Master Entity also generates blocking signals for the corresponding incoming paths. This is done by the Block Generator.

The five Switches ensure that key(X0) ≤ key(X1) ≤ key(X2) ≤ key(X3), where key(X) means the key of element stream X. The Selector then assigns the highest priority source to the highest priority available sink. When sink Q is not blocked (QB=0), the source-sink relationship is (Q, QREG-IN, CQREG-IN, CQ) = (X0, X1, X2, X3), respectively. Otherwise (QB=1), (Q, QREG-IN, CQREG-IN, CQ) = (NULL, X0, X1, X2), respectively. Therefore, the highest priority element always goes to Q when it is not blocked, and the lowest priority one goes to CQ. At the end of each element-clock period, key(Q-REG) is always less than or equal to key(CQ-REG). When Q is blocked, Q-REG and CQ-REG always hold the two highest priority elements.

Fig.A.2.3. Master Entity

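The source-sink matching of a Master Entity can be sketched at the element level. This is an illustrative model, not the schematic: the wiring of the five Switches is not given in the text, so a standard five-comparator sorting network is assumed; elements are modelled as hypothetical (valid, key, data) tuples; and key rotation (RI) is ignored.

```python
NULL = (0, 0, 0)   # an invalid (empty) element: (valid, key, data)


def priority(element):
    """Sort key: valid elements first, then smaller key values."""
    valid, key, _ = element
    return (0 if valid else 1, key)


def switch(d0, d1):
    """2-port Switch: return (higher priority, lower priority).

    LESS is set only when D0 is strictly smaller, so on a tie D1 is
    treated as the higher priority stream.
    """
    less = priority(d0) < priority(d1)
    return (d0, d1) if less else (d1, d0)


def sort4(sources):
    """Sort four streams with five Switches (assumed network layout)."""
    x = list(sources)
    for i, j in ((0, 1), (2, 3), (0, 2), (1, 3), (1, 2)):
        x[i], x[j] = switch(x[i], x[j])
    return x


def master_entity(d, cd, qreg_out, cqreg_out, qb):
    """Selector: return (Q, QREG-IN, CQREG-IN, CQ) for the given QB."""
    x0, x1, x2, x3 = sort4([d, cd, qreg_out, cqreg_out])
    if qb == 0:
        return x0, x1, x2, x3
    # Q is blocked: X3 has no sink, so the Block Generator is expected to
    # have blocked enough sources for X3 to be invalid here.
    return NULL, x0, x1, x2


sources = [(1, 30, 0), (1, 10, 0), (1, 20, 0), NULL]
print(master_entity(*sources, qb=0))
print(master_entity(*sources, qb=1))
```

With three valid sources and Q unblocked, the element with key 10 leaves through Q; with Q blocked, the same three elements occupy QREG-IN, CQREG-IN and CQ in priority order.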

A.2.3.1. 33-bit Shift Register

Q-REG and CQ-REG in the entities are 33-bit Shift Registers. Each register provides one element storage space.

A.2.3.1a. 33-bit Shift Register Circuit Description

Each such register contains 33 D-type flip-flops (FDCR), with Reset and Clock Enable, in series. Data shifts from D to Q serially.

Fig.A.2.3.1. 33-bit Shift Register

A.2.3.2. Block Generator

The Block Generator determines which sources should be blocked. The blocking signals DB and CDB for incoming data paths D and CD, respectively, indicate to the neighboring sources that data cannot be accepted. If DB and CDB are set, the neighboring sources should not send data to this Master Entity because the element will be ignored. The Block Generator also determines which of the four sources are valid. This information is used by the 4x4 Switch.

The purpose of blocking incoming data paths is to make sure that all non-blocked sources have a sink. Blocking signal generation is based on the blocking signals of the outgoing ports and the storage status of the two registers. For example, if both outgoing paths are blocked and both registers are full, then both incoming data paths should be blocked. Table A.3 describes the conditions for blocking signal generation.

In cases 1, 2, 3, 5, 6 and 8, there are enough sinks to accept data from both external sources. Therefore, no source needs to be blocked.

In case 7, the entity can only accept one incoming element. Blocking either path will maintain the correct order. The fact that CQ is not blocked implies that there are storage spaces ("holes") behind the current Master Entity. If D is blocked and CD is not, one more "hole" will be created in the entities behind the current Master Entity. This will reduce the utilization of the sequencer. Our objective is to allow a new accessible element to get into the sequencer whenever there is a "hole" there. Therefore, CD is blocked and D is not.

In case 4, the Master Entity has enough sinks to accept one external source. However, assigning the available sink to either D or CD will result in disordering. This is because if the Master Entity stays in case 4 for more than two consecutive element-clock periods, D and CD will not have a chance to be compared with each other. To avoid disordering, the Master Entity blocks both external sources and waits until it returns to case 1, 2 or 3.

The situation in case 10 is similar to case 4. However, the entity is waiting to return to case 1, 2, 3, 6 or 8.

Table A.3. Master Entity source blocking signals (when head of path is blocked)

Case  QB CQB  QRF CQRF  DBL CDBL  Note
1     0  0    x   x     0   0     None of the data paths are blocked.
2     0  1    0   x     0   0     QREG is empty and QREG-OUT is not a valid source. Therefore, there are three sinks and incoming data paths are not blocked.
3     0  1    x   0     0   0     CQREG is empty and CQREG-OUT is not a valid source [...]
4     0  1    1   1     1   1     Both D and CD are blocked. Note that blocking either one will cause disordering. Therefore, both ways are blocked for the entity to restore to case 1, 2 or 3.
5     1  0    0   x     0   0     Same as case 2.
6     1  0    x   0     0   0     Same as case 3.
7     1  0    1   1     0   1     CD is blocked in order to utilize the sequencer more efficiently. Blocking either path will result in correct ordering.
8     1  1    0   0     0   0     Both registers are empty. Therefore, there are two sinks and both incoming data paths are not blocked.
9     1  1    0   1     x   x     This case will not happen. If there is only one element in the Master Entity, it will always be in QREG.
10    1  1    1   0     1   1     Wait for the Master Entity to restore to case 1, 2, 3, 6 or 8. The reason is the same as that in case 4.
11    1  1    1   1     1   1     All registers are full. No extra sinks. Block all incoming data paths.

Note:
QRF -- QREG is full.
CQRF -- CQ-REG is full.
DBL -- DB due to local factors.
CDBL -- CDB due to local factors.

Functions:
DB = EB * DBL
CDB = HB * CDBL
DBL = CQB*QRF*CQRF + QB*CQB*QRF
CDBL = DBL + QB*QRF*CQRF

(CDB,DB)=(0,1) should never occur. Therefore, (HB,EB)=(0,1) must be avoided.

When Q of the head sequencer is not blocked, the head element is leaving the sequencer. In this case, CDB=0. When CQ of the end sequencer is not blocked, there are empty spaces for CQ. In this case, DB=0. Note that if (HB,EB)=(0,1), CDB in case 4 cannot be set to 1 since HB=0. This is exactly the reason for the constraint on the global blocking signals that (HB,EB)=(0,1) should be avoided.
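The blocking functions noted under Table A.3 (DBL = CQB*QRF*CQRF + QB*CQB*QRF, CDBL = DBL + QB*QRF*CQRF, DB = EB*DBL, CDB = HB*CDBL) are easy to check against the individual cases. The sketch below is a software restatement for that purpose; the function names are hypothetical, and product and sum are read as logical AND and OR.

```python
def local_blocks(qb, cqb, qrf, cqrf):
    """Return (DBL, CDBL), the blocking signals due to local factors."""
    dbl = (cqb & qrf & cqrf) | (qb & cqb & qrf)
    cdbl = dbl | (qb & qrf & cqrf)
    return dbl, cdbl


def blocks(qb, cqb, qrf, cqrf, hb, eb):
    """Return (DB, CDB) after gating with the global EB and HB signals."""
    dbl, cdbl = local_blocks(qb, cqb, qrf, cqrf)
    return eb & dbl, hb & cdbl


# (QB, CQB, QRF, CQRF) for a few rows of Table A.3
cases = {4: (0, 1, 1, 1), 7: (1, 0, 1, 1), 8: (1, 1, 0, 0),
         10: (1, 1, 1, 0), 11: (1, 1, 1, 1)}
for case, inputs in cases.items():
    print(case, local_blocks(*inputs))

# With (HB, EB) = (0, 1), case 4 yields the forbidden (CDB, DB) = (0, 1)
print(blocks(*cases[4], hb=0, eb=1))
```

Evaluating case 4 with (HB,EB)=(0,1) gives DB=1 and CDB=0, which is the combination the text says must never occur.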

A.2.3.2a. Block Generator Circuit Description

The Block Generator implements the DB and CDB functions in Table A.3. QRF and CQRF are the first bits of QREG-OUT (QRO) and CQREG-OUT (CQRO), respectively. Recall that each register stores an element and a valid bit.

The combination of a D-type flip-flop (FDCR) and a 2-to-1 multiplexer (M2-1) is used repeatedly throughout the sequencer component schematics and deserves a more detailed description. In many cases, a value evaluated at some state, for example at T0, has to remain constant for the whole element-clock. For example, in the Block Generator, the valid bit at T0 determines whether a source is valid, and this information should remain constant for the whole element-clock to maintain consistency. Therefore, at T0, the evaluated value (such as CQRO) is used and stored. From T1 to T3, the stored value is used.

The circuit directly implements the two blocking functions derived from Table A.3 at T0, and the values are maintained for the whole element-clock period. The validity of D and CD is determined by DB and CDB, respectively. When a source is blocked, the Block Generator declares it invalid. Therefore, when DB=1, it sets DV=0; when CDB=1, it sets CDV=0. At T0, if the valid bit of a register (CQRO or QRO) is 0, the register contains no valid element and the Block Generator declares it invalid. Therefore, CQV=CQRO at T0 and QRV=QRO at T0.

A.2.3.3. Switch

The main function of a sequencer entity is to match sources to the right sinks. The five Switches and the Selector should be viewed as one functional group for source-sink matching, which is performed in two stages. First, the sources are sorted in ascending order so that key(X0) ≤ key(X1) ≤ key(X2) ≤ key(X3). This is done by the five Switches. Each such two-port Switch relays the higher priority (smaller key value) bit stream to Q1 and the lower priority one to Q0. The Selector then assigns the sorted streams to the sinks according to the function described in Table A.3 in section 2.3.2.

A.2.3.3a. Switch Circuit Description

A Switch consists of a Comparator, which compares the keys of two elements, and two multiplexers, which assign the input elements to the output ports according to the Comparator's result. The Comparator compares the keys of D0 and D1 to determine their relative priority. When key(D0) < key(D1), LESS=1. LESS is then used as the select signal (SE) of the two 2-to-1 multiplexers. When LESS=0, Q0=D0 and Q1=D1. Otherwise, Q0=D1 and Q1=D0. Therefore, the element from Q1 always has higher priority than that from Q0.

The Comparator compares two element bit streams D0 and D1. If key(D0) < key(D1), LESS=1. Otherwise, LESS=0. The first bit of each element stream is a valid bit and the next 20 bits are key bits. These 21 bits determine which stream is LESS. The simple rules of comparison are listed below:

1. A valid element always has a higher priority than an invalid one. That is, key(valid element) < key(invalid element).
2. If a more significant bit has already determined the result, ignore the current bit comparison result.
3. By the beginning of T3, if key(D0)=key(D1), set key(D0)>key(D1). That is, set LESS=0.

A.2.3.3.1a. Comparator Circuit Description

At T0 (SC1=SC0=0), the Comparator handles the valid bit, which it negates before comparing. The reason is that a valid stream (valid bit is 1) always has higher priority than an invalid one (valid bit is 0), and a smaller bit value has higher priority. After the negation, a valid stream is numerically less than an invalid one and thus has higher priority. When both streams have the same validity, negation does not affect the result of the comparison.

RI is the Rotation Indication. When RI=1, the Comparator rotates the key value by 2^(n-1), n being the number of key bits (see Section 4.3 for the Key Rotation algorithm), before comparing. This rotation is implemented by negating the most significant bit of the key. Therefore, at T1 (SC1=0, SC0=1), if RI=1, X0 = not D0 and X1 = not D1. Otherwise X0=D0 and X1=D1. This operation ensures that elements with wrapped keys are ordered correctly.

In the circuit, PLESS and PGREATER are the result of the previous bit comparison. At T0, PLESS and PGREATER are meaningless and set to 0. At the same time, the value of LESS depends only on the current bit values of X0 and X1. After that, LESS is a function of both the previous and the current bit values. When either PLESS or PGREATER is set, it retains its value until the next T0. If X0 and X1 have identical key values, PLESS and PGREATER will both be 0 at the end of T2. In this case, the Comparator assumes X0 is larger and sets PGREATER. Therefore, all T3 bits are ignored.

Comparator operation is summarized below:

1. T0: the current bit is the first (valid) bit of a new element.
2. T1: the current bit is the first key bit.
3. T2: the current bit is one of the remaining key bits.
4. T3: the current bit is not part of the key bits.

For the first 21 bit times (from T0 to T2), the valid bit and key bits are compared bit by bit, more significant bits first. At T0, the comparison does not depend on the previous bit value, i.e. PLESS and PGREATER are ignored. In T1 and T2, the previous bit comparison result, as well as the current bit value, determines the value of LESS. When previous bits have already determined the result, the value of the current bit is ignored. At the beginning of T3, if all previous bits are the same, then assume that X0 is greater, and set PGREATER=1 and LESS=0.

Fig. A.2.3.3.1. Comparator
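The comparison rules above can be checked with a bit-serial software model. This is a behavioural sketch under stated assumptions, not the circuit: the helper names `element_bits` and `compare_serial` are hypothetical, only the 21 valid and key bits are modelled (T3 bits are omitted because they are ignored), and the final value of PLESS is taken as LESS.

```python
KEY_BITS = 20  # key width stated in the text


def element_bits(valid, key):
    """Valid bit followed by the key bits, most significant bit first."""
    return [valid] + [(key >> i) & 1 for i in range(KEY_BITS - 1, -1, -1)]


def compare_serial(d0, d1, ri=0):
    """Return LESS after the 21 compared bits: 1 if key(D0) < key(D1)."""
    pless = pgreater = 0                 # cleared at T0
    for t, (b0, b1) in enumerate(zip(d0, d1)):
        if t == 0:                       # T0: negate the valid bit
            x0, x1 = 1 - b0, 1 - b1
        elif t == 1 and ri:              # T1: negate the key MSB when RI=1
            x0, x1 = 1 - b0, 1 - b1
        else:                            # T2: remaining key bits unchanged
            x0, x1 = b0, b1
        if not (pless or pgreater):      # earlier bits have not decided yet
            pless = int(x0 < x1)
            pgreater = int(x0 > x1)
    if not (pless or pgreater):          # start of T3: tie, assume X0 greater
        pgreater = 1
    return pless
```

As a usage example, take an old key `0xFFFFE` and a wrapped key `0x00001`. Without rotation the old key compares as larger; with `ri=1` the most significant bits are negated and the old key compares as smaller, so it keeps its higher priority.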


A.2.3.4. Selector (Fig.A.2.3.4)

The Selector inputs D0, D1, D2 and D3 are arranged in ascending order of their key values. Each of the outputs Q, QREG-IN, CQREG-IN and CQ is assigned one of the inputs. There is only one simple rule for the assignment. If QB=0, assign (Q, QREG-IN, CQREG-IN, CQ) = (D0, D1, D2, D3). Otherwise, assign (Q, QREG-IN, CQREG-IN, CQ) = (NULL, D0, D1, D2). Note that the Selector matches sources with sinks, while the Block Generator determines which sources are valid and which sinks are available.

A.2.3.4a. Selector Circuit Description

Depending on QB at T0, Q is either NULL or D0, QREG (which goes to QREG-IN) is either D0 or D1, CQREG is either D1 or D2, and CQ is either D2 or D3. All of these functions are implemented by 2-to-1 multiplexers using QB as SE.
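The Selector assignment rule can be written as four 2-to-1 multiplexers sharing QB as the select input. The sketch below is illustrative only: `mux2`, `selector` and the `NULL` placeholder are hypothetical names, and whole elements are passed instead of bit streams.

```python
NULL = None  # stands in for the NULL (invalid) element


def mux2(a, b, se):
    """2-to-1 multiplexer: pass a when se=0, b when se=1."""
    return b if se else a


def selector(d0, d1, d2, d3, qb):
    """Return (Q, QREG_IN, CQREG_IN, CQ) for inputs sorted in ascending key order."""
    q = mux2(d0, NULL, qb)       # Q is blocked: send NULL instead of D0
    qreg_in = mux2(d1, d0, qb)   # every remaining sink shifts by one input
    cqreg_in = mux2(d2, d1, qb)
    cq = mux2(d3, d2, qb)        # D3 is not forwarded when QB=1
    return q, qreg_in, cqreg_in, cq
```

With QB=0 the four sorted inputs map straight onto the four sinks; with QB=1 the highest-priority element is held back in QREG-IN.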

A Slave Entity is simpler than a Master Entity: it eliminates the source-sink matching function and simplifies the block signal generation. For source-sink matching, D always goes to CQ-REG if it is not full, and CQ-REG goes to CQ if it is not blocked. A similar operation applies to CD: CD always goes to Q-REG if it is not full, and Q-REG goes to Q if it is not blocked. If a sink is full or blocked, the corresponding source will be blocked.

A.2.4a. Slave Entity Circuit Description

The Slave Entity has two 33-Shift Registers, Q-REG and CQ-REG. Input stream D always goes to CQ-REG, then CQ; CD always goes to Q-REG, then Q. When CQ=1 at T0, CQ-REG is full. If at the same time EB=1, DB will be set. Note that when EB=0, the backward path (D to CQ) will not be blocked. When DB=1, D is immediately set as a NULL source. CQ-REG is enabled only when either DB=0 or CQB=0, that is, when either source D or sink CQ is not blocked. A similar operation applies to the forward path (CD to Q).

DB is set when both EB and CQ are set, and CDB is set when both HB and Q are set. If EB=1 and HB=1, the values of DB and CDB depend on whether CQ-REG and Q-REG are full, respectively. This is not efficient because at least half of the time the two registers will be empty. A "better" design is to allow D to enter CQ-REG when the element in CQ-REG goes to CQ. This improves the efficiency. However, the trade-off is that it limits the depth (size) of a sequencer. The "better" design can be implemented by setting DB=EB*CQ*CQB and CDB=HB*Q*QB.

The depth of the sequencer increases as more Wrap Sequencers are cascaded. If DB is a function of CQB, propagation delay will limit its depth. This is because DB of a Slave Entity is a function of CQB, which is DB of its neighbor Master Entity; and DB of a Master Entity is a function of CQB, which is DB of its neighbor Slave Entity. Therefore, when DB of the head entity changes, it has to propagate to all entities since it affects all of them. The deeper the path, the longer the propagation delay will be. Currently, the longest path for the blocking signal goes from a Slave Entity to its neighbor Master Entity. This propagation time is almost fixed regardless of the depth of the sequencer.

Fig. A.2.4. Slave Entity
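The two blocking schemes for the Slave Entity differ only in whether the downstream block signals enter the equations. The following sketch compares them as Boolean functions; the function names are hypothetical, all signals are modelled as 0/1 integers sampled at T0, and `*` in the text is read as logical AND.

```python
def slave_block_current(eb, cq, hb, q):
    """Current design: DB = EB*CQ and CDB = HB*Q."""
    db = eb & cq      # block D when blocking is enabled and CQ-REG is full
    cdb = hb & q      # block CD when blocking is enabled and Q-REG is full
    return db, cdb


def slave_block_better(eb, cq, cqb, hb, q, qb):
    """'Better' design: DB = EB*CQ*CQB and CDB = HB*Q*QB."""
    db = eb & cq & cqb   # a full CQ-REG blocks D only if CQ itself is blocked
    cdb = hb & q & qb    # a full Q-REG blocks CD only if Q itself is blocked
    return db, cdb
```

When CQ-REG is full but its sink is free (CQ=1, CQB=0), the current design still blocks D while the "better" design lets D enter. The cost, as noted above, is that DB then depends on CQB from the neighboring entity, so the blocking signal has to ripple through the cascade.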

This circuit has been simulated by ViewSim using the command file ~hungw/powerview/sequencer/sim/sch.cmd. However, this simulation only verifies its functionality. When implemented in an FPGA, delay may be the major concern. The critical path, which has the longest delay, is in the Master Entity. This is the data path from the 2x2 Switch input to the Selector output. If the delay in this path is unacceptable, it can be reduced by inserting D-type flip-flops after each Switch. Doing so should reduce the length of the critical path by 2/3. However, the synchronization would then have to be redesigned.

The tool used in this circuit implementation is Powerview, which includes a set of specialized tools. Three of the tools used are: ViewDraw for schematic drawing; ViewSim for simulation; and ViewTrace for viewing signals. This project was created in a Unix environment, with Powerview installed on the EECG machines (e.g. halfdome.eecg).

Notes about this project (Wrap Sequencer)

Powerview:                Version 5.3.0 (~cad/powerview)
Libraries:                x4000 (~pga/view2xilinx/x4000)
                          xm4000 (~pga/view2xilinx/xm4000)
                          built-in (/jayar/a2/workview.4.1.2/lib/builtin)
Project:                  sequencer (~hungw/powerview/sequencer)
Simulation command file:  ~hungw/powerview/sequencer/sim/sch.cmd
Contact:                  [email protected]

Notes about using Powerview

Shell environment variables: In the .cshrc file, set the path and the following parameters as:

    set path=( ~cad/powerview ~pga/workview ~pga/xilinx $path)
    setenv WDIR /guest/grads/hungw/powerview
    setenv SYSPLT /guest/grads/hungw/powerview
    setenv FONT1 6x10
    setenv FONT2 8x13
    setenv FONT3 9x15
    setenv VIPC-KILL-PRE-VNSD TRUE

Viewdraw initial file (viewdraw.ini): Every time a new project is created by Powerview, a viewdraw.ini file is created in the project directory. At the end of the .ini file, add the library paths. For this project, the following lines are added:

    DIR [pw] /guest/grads/hungw/powerview/sequencer
    DIR [r] /jayar/a0/pga/view2xilinx/x4000 (x4000)
    DIR [r] /jayar/a0/pga/view2xilinx/xm4000 (xm4000)
    DIR [r] /jayar/a2/workview.4.1.2/lib/builtin (builtin)

Move a project to a different directory: The project list of Powerview is stored in the following file:

    ~hungw/powerview/vf/project.lst

Every time Powerview starts, it checks this file. If a project in the file cannot be found, it is deleted from the list and Powerview can no longer access the project. Therefore, when a project is moved to a different location, this file has to be updated.