Cross MAC-PHY Layer Channel Access Mechanism for Enterprise LANs

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Wenjie Zhou, M.S.

Graduate Program in Computer Science and Engineering

The Ohio State University

2015

Dissertation Committee:

Prasun Sinha, Co-Advisor

Kannan Srinivasan, Co-Advisor

Dong Xuan

Chunyi Peng

© Copyright by

Wenjie Zhou

2015

Abstract

Enterprise Wireless LANs (EWLANs) have become increasingly prevalent in recent years. They are widely deployed in public places, such as universities, malls, airports and office buildings, providing connections. At the same time, demand for wireless service is growing explosively as mobile devices such as smartphones and tablets have gained immense popularity. The widely deployed 802.11 standard has been shown to have low channel efficiency and suffers from well-known problems such as the hidden terminal and exposed terminal problems. To meet the growing demand for wireless data, it is time to move away from the age-old paradigm of prohibiting interfering nodes from transmitting.

We first present Rapid Concurrent Transmission Coordination (RCTC), a fast and low overhead signaling mechanism based on Pseudo-random Noise (PN) sequences to enable multi-modal operation of wireless links in a distributed channel access setting to support concurrent transmissions with full duplex radios. We look beyond a pair and explore how a network can best utilize the full duplex capability.

In enterprise networks, access points (APs) are connected to the backbone network, which allows APs to communicate with each other. We take advantage of the backbone network and seek opportunities for concurrent transmissions to increase the performance of EWLANs with three different techniques: DOMINO, BBN and BASIC.

DOMINO is a centralized relative scheme for channel access in enterprise networks. Existing centralized packet scheduling algorithms suffer from the requirement of tight time synchronization and are thus difficult to use in practice. We introduce Relative Scheduling: a technique for triggering wireless transmissions through other wireless transmissions in a domino-like fashion, thus making tight time synchronization unnecessary.

Today’s EWLANs comprise densely deployed APs. To take advantage of this, we present Blind Beamforming and Nulling (BBN), an interference nulling scheme that leverages the high density of access points to enable multiple mobile devices to transmit simultaneously to multiple APs, all within a single collision domain. BBN scales the uplink throughput linearly with the number of clients.

Through proactive management of interference among multiple colliding packets, high throughput wireless systems can be designed. We propose BASIC, a lightweight multi-user uplink transmission strategy that does not require tight synchronization. In BASIC, every client transmits at a data rate that can tolerate interference from other transmissions. Decoded packets are shared through the backbone network so that their interference can be cancelled.

RCTC and DOMINO are suitable for networks where the backbone network capacity is limited, since they generate the least amount of traffic. BASIC is suitable for networks with a small number of APs in a single collision domain, while BBN is best in networks with a high density of APs. This thesis discusses these schemes in detail and presents techniques to address the challenges in their practical implementation.

Dedicated to my family.

Acknowledgments

I really appreciate the guidance, support and encouragement from so many great people during my Ph.D. study. Without them, I would not have been able to finish this dissertation. First, I want to thank my advisors, Prof. Prasun Sinha and Prof. Kannan Srinivasan. It is my honor to have been brought to OSU by Prof. Prasun Sinha. I thank him for encouraging me to explore new techniques and teaching me how to tackle difficult problems. I benefited greatly from the discussions with Prof. Kannan Srinivasan. He enriched my way of thinking. His tremendous support helped me build self-confidence in my research.

I am very grateful for the support and advice from Dr. Matthew Hopcroft, Dr. Dola Saha and Dr. Sampath Rangarajan during my summer internships. Their unique perspectives on research problems and insightful suggestions helped me a lot in my research career. I would also like to thank my candidacy and dissertation committee members, Prof. Feng Qin, Prof. Chunyi Peng, Prof. Dong Xuan for their helpful comments and suggestions.

During my Ph.D. study, I learned a lot from the collaborations with my colleagues Sam Cooler, Dong Li, Tarun Bansal, Tanmoy Das, and Lu Chen. This dissertation would not have been done without their help. I really appreciate the help I received from Bo Chen. We worked in the same lab for five years and had endless discussions about ideas and new techniques. Moreover, he saved my life during our Hawaii trip. I would also like to thank my lab colleagues: Shengbo Chen, Zhixue Lu, Yousi Zheng, Xiaofeng Wu, Bo Chen, Vivek Yenanmandra, Yue Qiao, Arjun Bakshi, Ouyang Zhang, Gopi Krishna Tummala, Jiashang Liu, Rupam Kundu, and Wuwei Lan. I enjoyed those valuable moments with them from discussing research problems to having fun together.

My life would be totally different without my host family: Ray James and Tracie James, Bobo Shi, Ran An, Xiaolei Guo and Linjun Tang. I will always remember the time we spent together, golfing, watching movies, celebrating birthdays and traditional American festivals.

Finally, I want to thank my father Hongyao Zhou, my mother Chunhua Hu, and my sister Lifen Zhou for their endless support and countless encouragement. Especially, I want to thank my wife, Qi Li. She has taken care of almost all of the family responsibilities, allowing me to focus more on my research. I would also like to thank my daughter, Isabella, who came into our life during my preparation of this thesis. Although she still cries a lot every day, the happiness and pleasure brought by her will always be remembered. My family has always stood beside me, loving me, believing in me and giving me the strength to get out of difficulties. I am really fortunate to have them in my life.

Vita

2010 ...... B.S. Information Security, University of Science and Technology of China, Hefei, China

2014 ...... M.S. Department of Computer Science and Engineering, Ohio State University, Columbus, OH

2010-present ...... Ph.D. Department of Computer Science and Engineering, Ohio State University, Columbus, OH

Publications

Research Publications

Wenjie Zhou, Dola Saha and Sampath Rangarajan, A System Architecture to Aggregate Video Surveillance Data in Smart Cities, In Proc. of IEEE GLOBECOM, Dec. 2015.

Wenjie Zhou (co-primary), Tarun Bansal (co-primary), Kannan Srinivasan and Prasun Sinha, BBN: Throughput Scaling in Dense Enterprise WLANs with Blind Beamforming and Nulling, In Proc. of ACM MobiCom, Sept. 2014.

Wenjie Zhou and Kannan Srinivasan, SIM+: A Simulator for Full Duplex Communications, In Proc. of SPCOM, July 2014.

Tarun Bansal (co-primary), Wenjie Zhou (co-primary), Kannan Srinivasan and Prasun Sinha, RobinHood: Sharing the Happiness in a Wireless Jungle, In Proc. of ACM HotMobile, Feb. 2014.

Wenjie Zhou (co-primary), Dong Li (co-primary), Kannan Srinivasan and Prasun Sinha, DOMINO: Relative Scheduling in Enterprise Wireless LANs, In Proc. of ACM CoNEXT, Dec. 2013.

Wenjie Zhou, Kannan Srinivasan and Prasun Sinha, RCTC: Rapid Concurrent Transmission Coordination in Full Duplex Wireless Networks, In Proc. of IEEE ICNP, Oct. 2013.

Fields of Study

Major Field: Computer Science and Engineering

Specialization: Networking

Table of Contents

Abstract

Dedication

Acknowledgments

Vita

List of Tables

List of Figures

1. Introduction
    1.1 Concurrent Transmission in Full Duplex Wireless Networks
    1.2 Relative Scheduling in EWLANs
    1.3 Blind Beamforming and Nulling in Dense EWLANs
    1.4 Backbone-Assisted Successive Interference Cancellation
    1.5 Structure of this Thesis

2. RCTC: Rapid Concurrent Transmission Coordination in Full Duplex Wireless Networks
    2.1 Introduction
    2.2 Design Challenges
    2.3 Design of RCTC Protocol and its Components
        2.3.1 Transmission Modes
        2.3.2 Transmission Mode Identification
        2.3.3 Exposed Terminal Identification
        2.3.4 Selecting Exposed and Secondary Receiver
        2.3.5 Transmitter Suppressing
        2.3.6 Putting It All Together
    2.4 Implementation
        2.4.1 USRP Prototype
        2.4.2 Exposed Transmission
        2.4.3 Secondary Transmission
    2.5 Evaluations
        2.5.1 AP Network
        2.5.2 Ad Hoc Network
    2.6 Discussion
    2.7 Related Work
    2.8 Conclusion

3. DOMINO: Relative Scheduling in Enterprise Wireless LANs
    3.1 Introduction
    3.2 Motivation
    3.3 DOMINO Design
        3.3.1 ROP: Rapid OFDM Polling
        3.3.2 Relative Scheduling
        3.3.3 Schedule Converter
        3.3.4 DOMINO Under Microscope
        3.3.5 Practical Issues
    3.4 Evaluation
        3.4.1 Experimentation
        3.4.2 Trace-Driven Simulation
    3.5 Discussion
    3.6 Related Work
    3.7 Conclusion

4. BBN: Throughput Scaling in Dense Enterprise WLANs with Blind Beamforming and Nulling
    4.1 Introduction
    4.2 Illustration
    4.3 Challenges
    4.4 Physical Layer Design
        4.4.1 Phase I: Client transmission
        4.4.2 Phase II: Blind-beamforming
        4.4.3 Phase III: Decoding Packets
        4.4.4 Computing the Packet Decoding Order
    4.5 MAC Design
        4.5.1 Multi-Collision Domain
        4.5.2 Computing the set of transmitting clients
        4.5.3 Robustness
    4.6 Experiments
        4.6.1 Setup
        4.6.2 Micro-Benchmarks
        4.6.3 Throughput
    4.7 Trace-Driven Simulation
        4.7.1 Simulation Setup
        4.7.2 Results
    4.8 Discussion
    4.9 Related Work
    4.10 Conclusions

5. BASIC: Backbone-Assisted Successive Interference Cancellation
    5.1 Introduction
    5.2 Motivation: Gains from Exploiting Diversity
        5.2.1 An Example
        5.2.2 Trace-driven Analysis
    5.3 Challenges in Practice
    5.4 The Design of BASIC
        5.4.1 BASIC Overview
        5.4.2 Channel Estimation Phase
        5.4.3 Data Rate Selection Phase
        5.4.4 Data Transmission Phase
        5.4.5 Decoding Phase
        5.4.6 Communication Overhead
    5.5 BASIC in Multiple Collision Domains
    5.6 Experiments
        5.6.1 Microbenchmarks
        5.6.2 Testbed Results
    5.7 Trace-Driven Simulation
        5.7.1 Simulation Setup
        5.7.2 Single Collision Domain
        5.7.3 Multiple Collision Domains
    5.8 Related Work
    5.9 Discussion and Conclusion

6. Conclusions and Future Work

Appendices

A. RCTC
    A.1 Signature Detection

B. BBN
    B.1 Algorithm Satisfiable

Bibliography

List of Tables

3.1 Parameters used for the OFDM symbol to convey the queue length of clients

3.2 Aggregate throughput in 3 different scenarios with USRP prototype

3.3 Aggregate throughput with 4 pairs of exposed links

List of Figures

1.1 An example with exposed and secondary transmission opportunities. Three flows (TX→RX, RX→RX′, and EX→RX′′) exist in the network. The dotted line between TX and EX denotes these two nodes interfere with each other. Node pairs with no line between them are out of interference range. TX→RX and EX→RX′′ are exposed links while RX→RX′ forms a secondary transmission for the link TX→RX.

1.2 A network with three AP-client pairs. Dashed lines between nodes indicate that the nodes can hear each other. Solid arrows denote flow directions.

1.3 The throughput on different links. The overall throughput of omniscient scheme is 76% higher than DCF and 61% higher than CENTAUR. DOMINO performs close to the omniscient scheme.

1.4 Illustration of BBN over a topology of 3 clients and 4 APs. All devices belong to the same collision domain and can hear each other.

1.5 A 2×2 network with 2 clients and 2 APs which are connected through the backbone network. The weight (sij) of the dotted line is the received signal strength (RSS) at APj for transmission from Ci in Watts.

2.1 An example with exposed and secondary transmission opportunities. Three flows (TX→RX, RX→RX′, and EX→RX′′) exist in the network. The dotted line between TX and EX denotes these two nodes interfere with each other. Node pairs with no line between them are out of interference range. TX→RX and EX→RX′′ are exposed links while RX→RX′ forms a secondary transmission for the link TX→RX.

2.2 Timing diagrams for various transmission arrangements. S(Tp), S(Rp), SH, SF are the signatures of Primary Tx and Primary Rx, a reserved signature for half-duplex transmission and another reserved signature for full duplex transmission.

2.3 The USRP testbed topology and throughput of different scenarios. All nodes can hear each other.

2.4 Aggregate throughput and Jain’s fairness index for different number of flows per client.

2.5 Aggregate throughput and Jain’s fairness index with different value of α introduced in Section 2.3.5.

2.6 Aggregate throughput and Jain’s fairness index for different number of APs

2.7 Throughput of primary, exposed and secondary transmissions with different number of APs.

2.8 Aggregate throughput and Jain’s fairness index of different ratio of downlink traffic.

2.9 Aggregate throughput for different number of nodes and flows per node in an ad hoc network.

3.1 A network with three AP-client pairs. Dashed lines between nodes indicate that the nodes can hear each other. Solid arrows denote flow directions.

3.2 The throughput on different links. The overall throughput of omniscient scheme is 76% higher than DCF and 61% higher than CENTAUR. DOMINO performs close to the omniscient scheme.

3.3 The construction of one OFDM symbol

3.4 The process of obtaining queue status from clients.

3.5 Received OFDM samples from two clients. The system can tolerate 30 dB signal strength difference when using three subcarriers as guard interval.

3.6 The relationship between number of guard subcarriers and the difference in RSS.

3.7 A network with four AP-client pairs.

3.8 The timeline of the transmission between AP and client. S1 is the signature that should be sent by the client, S2 is the signature that should be sent by the AP, and S′ is the START signature.

3.9 Detection ratio of multiple signatures.

3.10 The timeline of the network in Figure 3.7. The link APi → APi stands for polling operation. The arrow between different links indicates the triggers between different slots.

3.11 Maximum transmission misalignment at the start of transmissions

3.12 TCP and UDP throughput, delay and fairness for T(10, 2). The downlink data rate is fixed to 10 Mbps and the uplink data rate varies from 0 to 10 Mbps.

3.13 Exposed links example. Dashed links indicate nodes are interfering with each other and solid links denote AP-client pair.

3.14 The CDF of throughput gain of DOMINO over DCF with 50 runs

3.15 DOMINO consists of contention free period (CFP) and contention period (CoP). The CFP is divided into different slots and each slot supports multiple concurrent transmissions.

4.1 Illustration of BBN over a topology of 3 clients and 4 APs. All devices belong to the same collision domain and can hear each other.

4.2 Received Signal Strength (RSS) in an office environment. The channel between APs is relatively stationary compared to channel between AP and mobile client.

4.3 CDF of number of APs observed across different locations. The data was collected at multiple places including a hospital, a large university library and an apartment complex.

4.4 Phase I time-line: ACi and AAj represent the access codes for Ci and APj, respectively.

4.5 Phase II time-line: vi denotes the precoding vector of APi.

4.6 Timeline of data transmission in a large network. The data sent by clients during contention phase are transmitted using the Rapid OFDM Polling (ROP) as discussed in DOMINO to decrease overhead. Phase III is executed in the background over the wired backbone allowing wireless channel to be used for other purposes.

4.7 Experiment results collected over USRP testbed.

4.8 Trace-Driven Simulation Results for Multi-Collision Domain.

5.1 A 2×2 network with 2 clients and 2 APs which are connected through the backbone network. The weight (sij) of the dotted line is the received signal strength (RSS) at APj for transmission from Ci in Watts.

5.2 The throughput gain of BASIC-OPT and SIC over TDMA with ideal data rates.

5.3 The throughput gain of BASIC and SIC over TDMA for different network sizes with discrete data rates.

5.4 The timeline of the transmissions of APs and clients in the network shown in Figure 5.1.

5.5 Throughput Comparison of MaxSINR and Exhaustive Search. Network contains 50 clients.

5.6 The SNR and Packet Reception Ratio of different modulation and coding schemes used in our evaluations.

5.7 The residual interference to noise ratio (RINR) under different SNR conditions.

5.8 The RINR for different part of the packet under different schemes.

5.9 The throughput gain over TDMA distribution.

5.10 The performance in single collision domain.

5.11 The performance in multiple collision domains.

A.1 Self correlation and cross correlation for Gold codes

Chapter 1: Introduction

With the rapidly increasing number of WiFi-capable devices, Enterprise Wireless LANs (EWLANs) are becoming prevalent in office environments, campuses, airports and malls. In addition, a number of cellular providers are deploying EWLANs in congested areas at scale to offload cellular traffic [5]. However, in EWLANs, the explosive growth in the number of mobile devices and the data generated by these devices has led to a decrease in the channel resources available to each individual device. Network administrators have tried to tackle this problem by densely deploying access points so that users can almost always find a close-by AP with good signal strength. However, dense deployment of APs simply does not scale well with the throughput demands [66].

To meet the rapidly increasing demand for wireless capacity, we need to go beyond traditional strategies that prohibit interfering transmissions from being simultaneously active. When multiple interfering transmissions are simultaneously active, proactive management of interference becomes essential for successful decoding of these packets. To avoid hidden and exposed terminal problems, hybrid scheduling solutions, e.g., CENTAUR [88] and OmniVoice [10], have been proposed. However, they only focus on downlink traffic. Several schemes, from the perspective of Information Theory, have been proposed to scale the throughput with the number of wireless devices [24,36,74]. However, Interference Alignment (IA) [24] requires clients to participate in a schedule with an exponential number of slots. Multi-User MIMO (MU-MIMO) [36] requires APs to exchange raw samples over the backbone. Joint beamforming based algorithms [74] work only for downlink traffic.

This thesis presents four new practical techniques to improve the performance of EWLANs: RCTC, DOMINO, BBN, and BASIC. The recently proposed full duplex technique allows a wireless device to transmit and receive on the same channel simultaneously. We first explore a distributed channel access algorithm, RCTC, which takes advantage of full duplex capability and encourages concurrent transmissions whenever possible. Next, a key observation in typical EWLANs is that all access points (APs) are connected through a wired backbone, which naturally provides a side channel for centralized channel access scheduling. Algorithms for computing centralized schedules have been proven to outperform distributed scheduling approaches, although they are difficult to implement in practice. DOMINO introduces relative scheduling to realize centralized scheduling with small traffic overhead on the backbone network. Finally, compared to channel access scheduling, multiplexing gain boosts the wireless throughput linearly. Downlink traffic has been shown to enjoy this gain with the multi-user multiple-input multiple-output (MU-MIMO) technique, while uplink traffic incurs large overhead on the backbone with MU-MIMO. Both BASIC and BBN are proposed to achieve multiplexing gains for uplink transmissions and only require APs to exchange decoded packets. BASIC is suitable for networks with a small number of APs in a single collision domain, while BBN is best in networks with a high density of APs.

1.1 Concurrent Transmission in Full Duplex Wireless Networks

In 2010, the feasibility of in-band full duplex communication over a wireless link was shown independently by two research groups [27,32] using various self-interference cancellation techniques. In full duplex communication, two nodes can simultaneously transmit packets to each other in the same channel, which we refer to as the bi-directional mode. In reality, however, traffic is often asymmetric. To make full use of the full duplex technique, Chapter 2 of this thesis presents RCTC, a rapid concurrent transmission alignment scheme for full duplex networks. First, we define two other communication modes. Secondary transmission is a mode in which a full duplex receiver, while receiving a packet, simultaneously transmits a packet to a node other than its transmitter, called a secondary receiver [89]. Unidirectional mode occurs when the receiver does not have any packets to transmit. RCTC tries to identify the transmission mode and align as many transmissions as possible in a full duplex network.

Figure 1.1 shows how RCTC works. Let us focus only on the case when TX is transmitting a packet to RX. Since node pairs without any line between them are out of interference range, nodes TX and EX form a pair of exposed terminals, and node RX can transmit a secondary packet to node RX′ when node TX is transmitting. Let us assume that RX has packets for RX′ and EX has packets for RX′′. In the unidirectional mode, the total throughput is 1 unit, as RX and EX are not transmitting. In the bi-directional mode, as RX does not have any packets for TX, the total throughput remains 1 unit. In the secondary transmission mode, TX and RX can simultaneously send packets to their receivers, achieving a throughput of 2 units. Thus, for a node-pair based strategy that tries to best utilize full duplex capability for its transmissions, the net throughput is at most 2 units. However, using a network-centric strategy, EX can concurrently transmit, achieving a net throughput of 3 units. If more exposed non-interfering links are present, the throughput can be even higher.

Figure 1.1: An example with exposed and secondary transmission opportunities. Three flows (TX→RX, RX→RX′, and EX→RX′′) exist in the network. The dotted line between TX and EX denotes these two nodes interfere with each other. Node pairs with no line between them are out of interference range. TX→RX and EX→RX′′ are exposed links while RX→RX′ forms a secondary transmission for the link TX→RX.
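The throughput accounting in this example can be reproduced with a small sketch. The Python below is illustrative only and is not part of RCTC: the function names and strategy labels are hypothetical, each active link is assumed to contribute exactly one unit of throughput, and the non-interference relations of Figure 1.1 are hard-coded rather than derived from a channel model.

```python
# Illustrative model of the Figure 1.1 example (not the RCTC protocol itself).
# Assumption: every active, non-colliding link contributes one unit of throughput.

# Backlogged flows (sender, receiver) stated in the example.
FLOWS = {("TX", "RX"), ("RX", "RX'"), ("EX", "RX''")}
PRIMARY = ("TX", "RX")


def concurrent_links(strategy):
    """Return the links active while the primary transmission TX -> RX is ongoing."""
    active = [PRIMARY]
    if strategy == "unidirectional":
        return active  # RX and EX stay silent
    if strategy == "bidirectional":
        # RX may only reply to TX, and in the example it has no packet for TX.
        if ("RX", "TX") in FLOWS:
            active.append(("RX", "TX"))
        return active
    if strategy in ("secondary", "network_centric"):
        # A full duplex RX can send to a secondary receiver while receiving.
        if ("RX", "RX'") in FLOWS:
            active.append(("RX", "RX'"))
        # Only the network-centric strategy also lets the exposed node EX transmit.
        if strategy == "network_centric" and ("EX", "RX''") in FLOWS:
            active.append(("EX", "RX''"))
        return active
    raise ValueError(f"unknown strategy: {strategy}")


def throughput_units(strategy):
    """Total throughput in units, one per concurrently active link."""
    return len(concurrent_links(strategy))


if __name__ == "__main__":
    for name in ("unidirectional", "bidirectional", "secondary", "network_centric"):
        print(name, throughput_units(name))
```

Under these assumptions the four strategies yield 1, 1, 2 and 3 units respectively, matching the values in the text.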

The network-centric design outlined above requires addressing the following challenges. First, it requires knowing whether the receiver (RX) has any packet to send back to the primary transmitter (TX) or has packets to send to a secondary receiver (RX′). Second, it requires that neighboring nodes of TX and RX identify the mode of operation of RX and accordingly decide whether they can transmit simultaneously. This decision needs to be made almost instantaneously at EX so that the start times of these transmissions are not far apart. Additionally, when more than one exposed transmission is feasible for a primary node pair, a further requirement is to identify which of the exposed nodes can transmit concurrently with the primary transmission such that the exposed transmissions do not collide with each other.

1.2 Relative Scheduling in EWLANs

The centralized structure of EWLANs has been leveraged for developing efficient solutions to various challenging problems such as channel assignment [23, 58, 67], client association [19,23,67], and power management [23,63]. On the other hand, it could also be used for implementing centralized scheduling of channel access. Centralized scheduling algorithms do not suffer from the performance limitations of distributed approaches, but are non-trivial to implement due to their tight time synchronization requirement.

We use the network shown in Figure 1.2 as an example to show the potential of a centralized scheme in achieving high throughput performance. There are three AP-client pairs and three flows. AP1 is a hidden terminal to AP3 while C2 and AP1 are exposed terminals to each other. DCF performs poorly on this network. Because of the hidden terminal problem, the link AP3→C3 achieves little throughput if all of the transmitters are backlogged. On the other hand, CENTAUR [88], a hybrid solution, schedules the AP1→C1 and AP3→C3 links to avoid the hidden terminal problem, but it is unable to schedule uplink traffic in the same slot, missing out on the opportunity for the exposed transmission C2→AP2. In an omniscient centralized scheduling scheme, the link C2→AP2 can always access the channel while the links AP1→C1 and AP3→C3 occupy the channel alternately. Figure 1.3 presents the throughput of different links using different scheduling algorithms.

In Chapter 3, this thesis presents DOMINO¹, which can achieve the optimality of centralized schemes without depending on time synchronization among APs. Toward meeting this objective, we propose Relative Scheduling, which is the core component of DOMINO. In sharp contrast to strict scheduling, Relative Scheduling uses wireless triggers, which are created by a set of PN-sequences transmitted by the sender and receiver at the end of a data packet transmission. The transmission events are triggered by previous events, akin to a domino effect. In any given slot, a carefully chosen set of transmissions triggers the transmissions in the next slot. Triggering using wireless transmissions is a new concept and a complete paradigm shift from time synchronization-based protocol designs.

¹ Joint work with Dong Li.

Figure 1.2: A network with three AP-client pairs. Dashed lines between nodes indicate that the nodes can hear each other. Solid arrows denote flow directions.

[Bar chart: throughput (Mbps) of DCF, CENTAUR, DOMINO and Omniscient on AP1→C1, C2→AP2, AP3→C3 and overall.]

Figure 1.3: The throughput on different links. The overall throughput of omniscient scheme is 76% higher than DCF and 61% higher than CENTAUR. DOMINO performs close to the omniscient scheme.
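To make the domino-like timing concrete, the following Python sketch computes when each transmission starts if it is launched purely by the end of a transmission in the previous slot. It is a simplified illustration under stated assumptions, not the DOMINO implementation: the link names, the trigger map and the airtimes are hypothetical, and trigger detection is assumed to be instantaneous and always successful.

```python
# Illustrative sketch of trigger-based (relative) timing; not the DOMINO implementation.
# Assumptions: link names, the trigger map and the airtimes are hypothetical, and
# trigger detection is instantaneous and always succeeds.


def relative_start_times(slots, triggers, airtime_us):
    """Compute per-slot start times (microseconds) without a shared clock.

    slots      -- list of slots; each slot is a list of link names
    triggers   -- triggers[k][link] names the link in slot k-1 whose end-of-packet
                  signature starts `link` in slot k (entry 0 is unused)
    airtime_us -- airtime_us[k][link] is the duration of `link` in slot k
    """
    starts = [{link: 0 for link in slots[0]}]  # slot 0 begins at time 0
    for k in range(1, len(slots)):
        previous = starts[k - 1]
        current = {}
        for link in slots[k]:
            source = triggers[k][link]
            if source not in previous:
                raise ValueError(f"slot {k}: trigger {source!r} not in slot {k - 1}")
            # The link starts as soon as its trigger's transmission ends.
            current[link] = previous[source] + airtime_us[k - 1][source]
        starts.append(current)
    return starts


# Links "a" and "c" alternate while "b" is active in every slot.
SLOTS = [["a", "b"], ["c", "b"], ["a", "b"]]
TRIGGERS = [{}, {"c": "a", "b": "b"}, {"a": "c", "b": "b"}]
AIRTIME_US = [{"a": 1000, "b": 1200}, {"c": 800, "b": 1000}, {"a": 1000, "b": 1000}]
STARTS = relative_start_times(SLOTS, TRIGGERS, AIRTIME_US)
```

In this sketch no node needs a global clock: every start time is derived from the end of an earlier transmission. Links in the same slot may start at slightly different times because each follows its own trigger, and handling that misalignment is outside the scope of the sketch.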

1.3 Blind Beamforming and Nulling in Dense EWLANs

To improve uplink communication efficiency, coordinated multipoint (CoMP) [47] has been proposed for LTE networks. In CoMP, base stations exchange received samples with each other to decode the uplink packets in a MIMO fashion. However, base stations in LTE networks are connected through dedicated high-speed fiber, which provides much higher capacity than the backhaul in typical enterprise networks. Researchers have shown that exchanging raw samples can lead to unreasonable traffic on the Ethernet [39,41,107].

In Chapter 4, this thesis proposes BBN², the first implementation of a Blind Beamforming and Nulling scheme that enables multiple nearby access points to concurrently receive uplink packets from multiple mobile clients, all within a single collision domain, without overwhelming the backbone. BBN does not increase energy consumption on the clients and executes over exactly two time slots. BBN leverages three properties that are unique to EWLANs: (i) dense deployment of APs (see [66]); (ii) the capability of these APs to exchange packets with each other over the underutilized wired backbone; and (iii) the immobility of APs, resulting in relatively stationary channels. When one AP is receiving uplink data, existing algorithms [66], including IEEE 802.11 WiFi, prevent nearby APs from transmitting or receiving data. In contrast, BBN makes use of the energy-rich access points to assist their clients (mobile devices) in decoding their packets at their respective access points. In BBN, the clients only participate in the first slot and the access points participate on behalf of the clients in the second slot.

² Joint work with Tarun Bansal.

Consider the example enterprise WLAN shown in Fig. 1.4(a) where all the APs and the three clients are in a single collision domain. Assume that the three users want to upload one packet each to the backbone. An omniscient TDMA scheduling algorithm with global knowledge would require three time slots to complete this upload. In BBN, in the first slot as shown in Fig. 1.4(a), all users transmit at the same time. All 4 APs receive a combination of the three transmitted packets. In the second slot, AP3 and AP4 retransmit the received signals by first precoding [48] them such that the following condition is satisfied, as shown in Fig. 1.4(b): at AP1, samples corresponding to x2 and x3 in the second slot align with the samples corresponding to x2 and x3 in the first slot.

Figure 1.4: Illustration of BBN over a topology of 3 clients and 4 APs. All devices belong to the same collision domain and can hear each other.

(a) First slot. $x_1$, $x_2$ and $x_3$ are the three packets transmitted by C1, C2 and C3, respectively. $h^{(1)}_{ij}$ is the channel from client i to APj during time slot 1. AP1 receives $h^{(1)}_{11} x_1 + h^{(1)}_{21} x_2 + h^{(1)}_{31} x_3$ and AP2 receives $h^{(1)}_{12} x_1 + h^{(1)}_{22} x_2 + h^{(1)}_{32} x_3$.

(b) Second slot. A subset of APs transmit in the second slot while the rest of the APs receive. $a_{ij}$ are the final channel coefficients after the transmission of the second slot. $s_i$ is the scaling coefficient at APi. AP1 receives $a_{11} x_1 + s_1 h^{(1)}_{21} x_2 + s_1 h^{(1)}_{31} x_3$ and AP2 receives $a_{12} x_1 + a_{22} x_2 + a_{32} x_3$.

Decoding happens in multiple steps as follows:

1. At the end of the second slot, AP1 scales the samples received by AP1 in the second slot and subtracts them from the samples received in the first slot. This scaling is done such that samples corresponding to x2 and x3 are nulled. Afterwards, it is left with only the samples corresponding to x1. AP1 decodes the samples to obtain the packet transmitted by C1. Next, it transmits the decoded packet over the backbone to AP2.

2. AP2 recreates the samples corresponding to x1 and subtracts them from the samples received in the first slot and the second slot.

3. After subtraction, AP2 is left with two equations (one from each slot), and two variables (x2 and x3). AP2 solves the two equations to obtain x2 and x3.

4. Afterwards, AP1 and AP2 forward x1, x2 and x3 towards their destinations.

BBN enables the three single-antenna transmitters to upload three packets in two slots, improving the throughput by 50% compared to omniscient TDMA.
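The decoding steps above can be sketched numerically. The Python below is a noise-free, single-symbol illustration and not the BBN implementation: all coefficient values are made up, each packet is reduced to one complex symbol, and the precoding at AP3 and AP4 is assumed to have already produced the alignment at AP1, so the slot-2 coefficients of x2 and x3 at AP1 are simply set to the scaling coefficient times the slot-1 channels. The names `observe` and `bbn_decode` are hypothetical.

```python
# Noise-free, single-symbol sketch of the four decoding steps (not the BBN implementation).
# Assumptions: all coefficient values are made up, each packet is one complex symbol,
# and the precoding by AP3/AP4 has already produced the alignment at AP1.


def observe(x, h, a):
    """Return ((AP1 slot 1, AP1 slot 2), (AP2 slot 1, AP2 slot 2)).

    h[i][j] -- slot-1 channel from client i+1 to AP j+1
    a[i][j] -- effective slot-2 coefficient of packet i+1 at AP j+1
    """
    ap1 = (sum(h[i][0] * x[i] for i in range(3)), sum(a[i][0] * x[i] for i in range(3)))
    ap2 = (sum(h[i][1] * x[i] for i in range(3)), sum(a[i][1] * x[i] for i in range(3)))
    return ap1, ap2


def bbn_decode(y_ap1, y_ap2, h, a, s1):
    """Recover (x1, x2, x3) from the two-slot observations at AP1 and AP2."""
    # Step 1: AP1 scales its slot-2 samples and subtracts them, nulling x2 and x3.
    x1 = (y_ap1[0] - y_ap1[1] / s1) / (h[0][0] - a[0][0] / s1)
    # Step 2: AP2 receives x1 over the backbone and removes it from both slots.
    r1 = y_ap2[0] - h[0][1] * x1
    r2 = y_ap2[1] - a[0][1] * x1
    # Step 3: AP2 solves two equations in x2 and x3 (Cramer's rule).
    det = h[1][1] * a[2][1] - h[2][1] * a[1][1]
    x2 = (r1 * a[2][1] - h[2][1] * r2) / det
    x3 = (h[1][1] * r2 - a[1][1] * r1) / det
    return x1, x2, x3


S1 = 0.5 + 0.2j  # scaling coefficient at AP1 (hypothetical value)
H = [[0.9 + 0.1j, 0.3 - 0.4j], [0.2 + 0.7j, 0.8 + 0.2j], [-0.5 + 0.3j, 0.1 + 0.6j]]
A = [
    [0.1 - 0.3j, 0.4 + 0.4j],     # x1: arbitrary effective coefficients
    [S1 * H[1][0], -0.6 + 0.2j],  # x2 at AP1 is aligned: s1 * h21
    [S1 * H[2][0], 0.7 - 0.1j],   # x3 at AP1 is aligned: s1 * h31
]
SENT = (1 + 1j, -1 + 1j, 1 - 1j)
Y_AP1, Y_AP2 = observe(SENT, H, A)
DECODED = bbn_decode(Y_AP1, Y_AP2, H, A, S1)
```

With the alignment in place, the subtraction in step 1 leaves only a scaled copy of x1 at AP1, and step 3 reduces to a 2×2 linear system at AP2. A real receiver would additionally have to cope with noise, channel estimation error and multi-sample packets, which this sketch ignores.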

1.4 Backbone-Assisted Successive Interference Cancellation

Although BBN removes the synchronization requirement for clients, it still needs the

APs to maintain sample-level synchronization. Such a requirement is still a hindrance to rapid deployment of this technology, as it is non-trivial to meet. It also requires a large number of APs (O(N^2) to support N uplink transmissions), which puts an additional requirement on the network density. So a pressing question is: can we enable uplink multi-user transmissions in practical systems, i.e., without requiring tight synchronization among APs or clients, without overwhelming the backbone network, and without requiring a high AP density?

In a realistic environment, the added dimension of diversity offered by multiple receivers (or base stations) can be leveraged to apply the interference cancellation technique in a distributed fashion. Based on this observation, this thesis presents BASIC, a novel lightweight multi-user uplink transmission technique that does not require tight synchronization and does not impose any restrictions on the AP density. BASIC exploits the inherent receiver diversity and takes advantage of the Ethernet backbone connection


Figure 1.5: A 2×2 network with 2 clients and 2 APs which are connected through the backbone network. The weight (sij) of the dotted line is the received signal strength (RSS) at APj for transmission from Ci in Watts.

between APs, which allows them to exchange decoded packets with each other. It decodes multiple simultaneously transmitted uplink packets according to a chosen sequence but, in contrast to SIC, each of these packets can be decoded at a different AP. Each decoded packet is forwarded to the succeeding APs, where its interference can be removed so that the desired packets can be decoded. Thus, a group of APs collaborate to decode a group of simultaneously transmitted uplink packets while leveraging the backbone. A greedy heuristic is proposed to determine the transmission and decoding plan.

To show how BASIC works, we use Figure 1.5 as an example. Assume each client has a packet to send to the associated AP. BASIC allows both clients to transmit simultaneously. To achieve correct decoding of both packets, the data rate for C1 is carefully selected such that the packet can be decoded with a signal-to-noise ratio (SNR)³ of s11/s21. With this requirement, AP1 could receive the packet sent by C1 correctly with the interference from

C2. The decoded packet is then delivered to AP2 over the backbone. AP2 then subtracts

³ For simplicity, the channel noise is ignored here.

this packet from the received samples and decodes the packet from C2 without any interference. To quantify the gains of BASIC, we choose s11 and s22 to be 20 dB higher than the noise floor, while picking s12 and s21 to be 10 dB higher than the noise floor. For this example, TDMA schedules C1 and C2 alternately, each with 20 dB SNR. SIC has no gain over TDMA: it allows C1 to transmit with 10 dB SINR to AP1 and, after decoding the packet from C1, subtracts its interference from the received samples and decodes C2’s packet at AP1 with 10 dB SNR. BASIC also allows C1 to transmit with 10 dB SINR to AP1. The decoded packet is forwarded to AP2. After interference cancellation, the packet from C2 can be decoded with an SNR of 20 dB at AP2. So BASIC achieves 1.5× the throughput of both TDMA and SIC.
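The arithmetic behind the 1.5× figure can be reproduced with a short sketch. The text does not state which rate model is used; the sketch assumes, as a simplification of ours, that the achievable rate is proportional to the SNR in dB, and also evaluates the Shannon formula for comparison. Function names are hypothetical.

```python
import math

def per_slot_throughput(rate):
    """Per-slot sum rate for the 2x2 example (s11 = s22 = 20 dB, s12 = s21 = 10 dB)."""
    return {
        "TDMA": rate(20),              # one client per slot at 20 dB SNR
        "SIC": rate(10) + rate(10),    # C1 at 10 dB SINR, then C2 at 10 dB SNR, both at AP1
        "BASIC": rate(10) + rate(20),  # C1 at 10 dB SINR at AP1, then C2 at 20 dB SNR at AP2
    }

def rate_db(snr_db):
    """Assumed model: rate proportional to SNR in dB (arbitrary units)."""
    return snr_db

def rate_shannon(snr_db):
    """Shannon capacity in bit/s/Hz for a given SNR in dB."""
    return math.log2(1 + 10 ** (snr_db / 10))

linear = per_slot_throughput(rate_db)
shannon = per_slot_throughput(rate_shannon)
gain_linear = linear["BASIC"] / linear["TDMA"]
gain_shannon = shannon["BASIC"] / shannon["TDMA"]
print(linear, gain_linear)
print(shannon, gain_shannon)
```

Under the dB-proportional model SIC and TDMA tie and BASIC gains exactly 1.5×; under the Shannon formula the gain over TDMA stays close to that value.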

1.5 Structure of this Thesis

The rest of this thesis is organized as follows. In Chapter 2, we present how RCTC uses PN sequences to enable multiple concurrent transmissions in full duplex communications. Chapter 3 discusses the design of DOMINO and the techniques to make Relative Scheduling work in practice. In Chapter 4, we present how BBN scales the uplink throughput linearly in enterprise networks. Chapter 5 describes BASIC, a lightweight scheme that supports multiple uplink transmissions based on receiver diversity. Chapter 6 concludes this thesis with pointers to open research problems.

Chapter 2: RCTC: Rapid Concurrent Transmission Coordination in Full Duplex Wireless Networks

2.1 Introduction

In 2010, the feasibility of in-band full duplex communication over a wireless link was shown independently by two research groups [27, 32] using various self-interference cancellation techniques. In full duplex communication, two nodes can simultaneously transmit packets to each other in the same channel, denoted as bi-directional mode in this chapter.

These designs were further improved to use fewer antennas and achieve better performance [30, 49]. But in reality, traffic is often asymmetric. For example, mobile operators and broadband network providers often provision their systems considering the download-heavy traffic pattern that is most prevalent in such networks. As a result, it is difficult to reap the benefits of full duplex in such scenarios. Similarly, secondary transmission is another mode of communication available when a node is full duplex capable, i.e., capable of self-interference cancellation. In this mode, a full duplex receiver, while receiving a packet, can simultaneously transmit a packet to a node other than its transmitter, called a secondary receiver [89]. In an access point (AP) network, where every node connects to the Internet through an AP, a client cannot use this mode as each client is connected to one AP at any given time. Even the AP may have only a few opportunities to use this mode due to the asymmetry in downlink and uplink traffic. Furthermore, when the receiver does not have any packets to transmit, the transmitter-receiver pair operates in the unidirectional mode. In this mode, the receiver sends a busy tone or an artificially created packet with no useful content to protect the reception from hidden terminals. Thus, these current modes of transmission using full duplex may not always be the best way to utilize the potential of full duplex in practical scenarios.

The three identified modes are based on the ability of a pair of nodes in a network to best utilize the full duplex capability. In this chapter, we go beyond a pair of nodes and investigate how a network can best utilize the full duplex capability. To this end, we propose a medium access control (MAC) protocol for full duplex networks called RCTC that seeks to maximize the overall network throughput: it attempts to identify other transmissions in the vicinity that can be executed concurrently. In other words, RCTC tries to align as many transmissions as possible in a full duplex network. Note that, depending on the mode of operation, the set of concurrent links can be different. For instance, in the bi-directional mode, typically the nodes in the vicinity of the transmitter and receiver cannot concurrently transmit, while in the unidirectional mode, nodes close to the transmitter and farther from the receiver could transmit concurrently. The mode of operation, in turn, depends on whether the receiver has a packet to send to its transmitter (bi-directional) or to another node (secondary transmission) or has nothing to send (unidirectional). Therefore, RCTC should be able to identify the mode of operation and the concurrent links on-the-fly. We first motivate the need for a network-centric full duplex protocol design as opposed to a node pair-centric design. Then, we introduce the novel mechanisms in RCTC.

An Example: Figure 2.1 shows an example scenario. Let us focus only on the case when

TX is transmitting a packet to RX. Since node pairs without any line between them are


Figure 2.1: An example with exposed and secondary transmission opportunities. Three flows (TX→RX, RX→RX′, and EX→RX′′) exist in the network. The dotted line between TX and EX denotes that these two nodes interfere with each other. Node pairs with no line between them are out of interference range. TX→RX and EX→RX′′ are exposed links while RX→RX′ forms a secondary transmission for the link TX→RX.

out of interference range, nodes TX and EX form a pair of exposed terminals, and node

RX can transmit a secondary packet to node RX′ when node TX is transmitting. Let us

assume that RX has packets for RX′ and EX has packets for RX′′. The packets are all

of equal size. Let us ignore control and backoff overheads. In the unidirectional mode,

the total throughput is 1 unit, as RX and EX are not transmitting. In the bi-directional

mode, as RX does not have any packets for TX, the total throughput still remains 1 unit.

In the secondary transmission mode, TX and RX can simultaneously send packets to their receivers achieving a throughput of 2 units. Thus, for a node-pair based strategy that tries to best utilize full duplex capability for its transmissions, the net throughput is at most 2 units.

However, using a network-centric strategy, EX can concurrently transmit, achieving a net throughput of 3 units. If more exposed non-interfering links are present, the throughput can be even higher.
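The counting argument in this example can be checked with a small sketch. It assumes a simple model of ours: a set of links can run concurrently if no receiver is within range of an active transmitter other than its own, and a full duplex node cancels its own transmission. The in-range pairs are read off the description of Figure 2.1; all names are hypothetical.

```python
from itertools import combinations

# Node pairs within range of each other, as described for Figure 2.1 (assumed model).
IN_RANGE = {frozenset(p) for p in [("TX", "RX"), ("RX", "RX'"), ("EX", "RX''"), ("TX", "EX")]}

# The three flows in the example: name -> (transmitter, receiver).
LINKS = {"primary": ("TX", "RX"), "secondary": ("RX", "RX'"), "exposed": ("EX", "RX''")}

def feasible(active):
    """True if every receiver hears no active transmitter other than its own."""
    transmitters = {LINKS[name][0] for name in active}
    for name in active:
        tx, rx = LINKS[name]
        for other in transmitters - {tx}:
            # A full duplex node cancels its own transmission (other == rx).
            if other != rx and frozenset((other, rx)) in IN_RANGE:
                return False
    return True

def best_throughput(allowed):
    """Size of the largest feasible link set among `allowed` that contains the primary link."""
    best = 0
    for k in range(1, len(allowed) + 1):
        for subset in combinations(allowed, k):
            if "primary" in subset and feasible(subset):
                best = max(best, len(subset))
    return best

node_pair = best_throughput(["primary", "secondary"])          # node-pair strategy
network = best_throughput(["primary", "secondary", "exposed"])  # network-centric strategy
print(node_pair, network)
```

With equal-size packets and overheads ignored, the size of the feasible link set is the throughput in units.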

Challenges: The network-centric design outlined above requires addressing the following challenges. First, it requires the knowledge of whether the receiver (RX) has any packet to send back to the primary transmitter (TX) or has packets to send to a secondary receiver (RX′). Second, it requires that neighboring nodes of TX and RX identify the mode of operation of RX and accordingly decide whether they could transmit simultaneously. These two requirements together imply that the exposed node (EX) should identify whether RX is sending a packet back to TX, and align its transmission only if there is no return transmission. This decision needs to be made almost instantaneously at EX so that the start times of these transmissions are not too far apart. Additionally, when more than one exposed transmission is feasible for a primary node pair, one more requirement is to identify which of the exposed nodes could transmit concurrently with the primary transmission such that the exposed transmissions do not collide with each other.

Solution Overview: To address the above challenges, RCTC uses pseudo-random noise (PN) sequence-based signatures carried in packets to quickly identify the different modes of full duplex and to help exposed nodes identify when they should transmit. Recently, such signatures have been demonstrated to be usable for conveying packet corruption information from the receiver to the sender [83], for identifying the receiver of a packet [105] and for encoding control packets such as RTS, CTS and ACK [62]. Our solution works as follows. Each node uses a random backoff to contend for the channel if it has a packet to transmit. Upon obtaining access to the channel, it starts transmitting two signatures which indicate the IDs of the primary receiver and the transmitter. IDs are locally unique and various mechanisms could be used to select them [62]. Upon receiving these signatures, the receiver searches its queue to determine whether other transmission modes are feasible. It then selects an appropriate mode and indicates its selection to the transmitter, also using signatures. The transmit power of signatures from the primary receiver is controlled to indicate how much interference the receiver can tolerate. Upon receiving the mode selection from the primary receiver, the primary transmitter continues to transmit the data packet. The exposed terminals, on the other hand, identify themselves through the signal strength of the signature sent by the receiver.

This chapter makes the following contributions:

• We develop a fast and low overhead signaling mechanism to enable multi-modal

operation of wireless links in a distributed channel access setting.

• We extend the signaling mechanism to support concurrent transmissions in the neighborhood.

• Using a USRP testbed and simulations we show that our solution RCTC significantly

outperforms the state-of-the-art full duplex schemes in practical scenarios. A 79%

throughput gain is achieved in our USRP testbed compared with native full duplex

solution and as high as 131% average throughput gain without hurting fairness in our

simulations for large networks.

The rest of this chapter is organized as follows. Section 2.2 describes the design challenges. Section 2.3 presents the various components of RCTC and describes how they work together. Results from our USRP prototype and simulations are presented in Sections 2.4 and 2.5, followed by discussion, related work and conclusion sections.

2.2 Design Challenges

Considering the recently proposed full duplex techniques at the physical layer, a practical and efficient distributed channel access scheme has to address the following challenges:

• Transmission Mode Identification: The potential receiver of a primary transmission

(the first transmission resulting from a successful contention) needs to determine the

best option among the three modes based on the available packets in its queue. It

then needs to inform the primary transmitter as well as other surrounding nodes of

its decision with minimal delay and overhead.

• Exposed Terminal Identification: A potential exposed terminal should be able to identify its role based on the signal it receives. Its transmission should affect neither the reception at the primary receiver nor that at the primary transmitter. Once identified, it must quickly align its own transmission with the primary transmission.

• Picking Exposed and Secondary Receivers: The potential receiver of an exposed or secondary transmission may suffer interference from the primary transmitter. Randomly picking a receiver could end up in a collision, which does not help to increase the system throughput.

These design challenges need to be addressed with a low overhead solution that can rapidly coordinate the transmission activities of nodes without prior knowledge of the packets queued at neighboring nodes and with little information about the channel state between different nodes.

2.3 Design of RCTC Protocol and its Components

In this section, we explain the design details of RCTC. We first define the three transmission modes and the process to identify different modes. Then the technique to identify exposed terminals is elaborated. We next discuss how to pick receivers for secondary and exposed transmissions. To protect bi-directional transmission, a technique called transmitter suppressing is introduced. Finally all the components are combined together.

2.3.1 Transmission Modes

When a node accesses the channel and transmits a packet, the receiver responds based on the contents of its queue. There are three possible modes of transmission:

• Bi-directional Transmission Mode: The primary receiver uses this mode if a return

packet is available for the primary transmitter.

• Secondary Transmission Mode: If a bi-directional transmission is not feasible, the

primary receiver searches its queue for potential transmissions to other nodes. This

secondary transmission receiver should be able to correctly receive the packet under

the interference of the primary transmitter.

• Unidirectional Mode: This mode is similar to half duplex, except that the receiver

sends a busy tone to protect the packet being received from hidden terminals. This

mode is used if the above two modes are not feasible.

In all three modes, exposed terminal transmissions may be feasible in the neighborhood of the transmitter(s). But proper checks must be performed to ensure that the ongoing primary and secondary transmissions are not excessively interfered with by the exposed transmissions and vice versa.

2.3.2 Transmission Mode Identification

RCTC makes use of physical layer signatures based on PN sequences that are used for rapid and low-overhead communication and coordination. More details about signature detection are in Appendix A.1. The coordination process is as follows. The primary transmitter first sends two signatures that encode the locally unique IDs of the receiver and

Primary Transmitter: SF S(Rp) S(Tp) Packet ACK
Primary Receiver: S(Tp) Packet ACK

(a) Bi-directional mode (Primary Tx↔Primary Rx). The signature SF is sent only when needed and will be introduced in Section 2.3.5.

Primary Transmitter: S(Rp) S(Tp) Packet
Primary Receiver: SH Packet ACK
Secondary Receiver: ACK

(b) Secondary transmission mode (Primary Tx→Primary Rx→Secondary Rx)

Primary Transmitter: S(Rp) S(Tp) Packet
Primary Receiver: SH Busy tone ACK
Exposed Transmitter: Packet
Exposed Receiver: ACK

(c) Unidirectional transmission mode with exposed transmission (Primary Rx←Primary Tx, Exposed Tx→Exposed Rx)

Figure 2.2: Timing diagrams for various transmission arrangements. S(Tp), S(Rp), SH, SF are the signatures of Primary Tx and Primary Rx, a reserved signature for half-duplex transmission and another reserved signature for full duplex transmission.

transmitter as shown in Figure 2.2. The receiver identifies the transmitter using its signature and determines the best mode based on the packets in its queue and their expected physical layer data rates to their next-hop destinations. The sender is notified of the selected mode using different return signatures. The transmitter’s signature S(Tp) indicates a bi-directional transmission mode while a special signature SH is used for unidirectional or secondary transmission mode. We call this process rapid handshaking.
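Why PN sequences suit this kind of rapid detection can be seen from a generic illustration. The sketch below is not the detector of Appendix A.1; it builds one standard length-127 maximal-length sequence (from the primitive polynomial x^7 + x + 1, our choice) and shows the sharp correlation peak that a receiver can threshold on. The function names and the 0.8 threshold are assumptions.

```python
def m_sequence():
    """Length-127 maximal-length sequence from a[n] = a[n-7] XOR a[n-6]
    (primitive polynomial x^7 + x + 1), mapped to +1/-1 chips."""
    bits = [1, 0, 0, 0, 0, 0, 0]
    for n in range(7, 127):
        bits.append(bits[n - 7] ^ bits[n - 6])
    return [1 - 2 * b for b in bits]

def circular_correlation(received, signature, shift):
    """Correlate `signature` against `received` read from offset `shift` (cyclically)."""
    n = len(signature)
    return sum(received[(k + shift) % n] * signature[k] for k in range(n))

def detect(received, signature, threshold=0.8):
    """Return the best-matching shift if the normalized peak reaches the threshold, else None."""
    n = len(signature)
    best = max(range(n), key=lambda s: circular_correlation(received, signature, s))
    peak = circular_correlation(received, signature, best) / n
    return best if peak >= threshold else None

sig = m_sequence()
received = sig[-10:] + sig[:-10]  # the signature cyclically delayed by 10 chips, no noise
print(detect(received, sig))
print(circular_correlation(sig, sig, 0), circular_correlation(sig, sig, 1))
```

The autocorrelation is 127 at zero shift and -1 at every other shift, so a single threshold separates a present signature from an absent one without decoding any bits.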

Each node has a unique ID (used as a signature) within its one-hop neighborhood. The details of how to create and assign the signatures are beyond the scope of this chapter.

A naive scheme is as follows. Each node randomly selects a signature from a code-book and exchanges this information periodically with its neighbors. Conflicting signatures are resolved by repeating the process. Alternatively, the signatures may be generated using a hashing function of a node’s MAC address [62].
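Both options can be sketched in a few lines. This is only an illustration under our own assumptions: signatures are represented by indices into a code-book, the periodic exchange with neighbors is collapsed into synchronous rounds, and SHA-256 stands in for the unspecified hashing function. All names are hypothetical.

```python
import hashlib
import random

def assign_signatures(neighbors, codebook_size, rng, max_rounds=1000):
    """Naive scheme: every node draws a random code-book index; nodes that clash
    with a one-hop neighbor draw again until no clash remains.
    `neighbors` maps each node to the set of its one-hop neighbors (symmetric)."""
    sig = {node: rng.randrange(codebook_size) for node in sorted(neighbors)}
    for _ in range(max_rounds):
        conflicted = sorted(n for n in neighbors
                            if any(sig[n] == sig[m] for m in neighbors[n]))
        if not conflicted:
            return sig
        for n in conflicted:
            sig[n] = rng.randrange(codebook_size)
    raise RuntimeError("no conflict-free assignment found")

def signature_from_mac(mac, codebook_size):
    """Alternative: derive the index from a hash of the MAC address (SHA-256 assumed)."""
    digest = hashlib.sha256(mac.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % codebook_size

topology = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
assignment = assign_signatures(topology, codebook_size=16, rng=random.Random(1))
print(assignment)
print(signature_from_mac("00:11:22:33:44:55", 16))
```

The hash-based variant needs no message exchange but can still collide between neighbors, so some conflict resolution would remain necessary.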

2.3.3 Exposed Terminal Identification

The definition of the exposed terminal suggests that its transmission should not affect packet reception at the primary receiver. Because the packet decoding success probability is positively correlated with the signal-to-interference ratio (SIR), we can use SIR to identify exposed terminals. When the potential SIR at the primary receiver is higher than the minimum requirement to decode the packet correctly, the exposed terminals are allowed to transmit. Assume that the Wi-Fi transmit power is P_0, the channel coefficient between the primary transmitter (node T) and the primary receiver (node R) is h_TR, and the coefficient between another node E and node R is h_ER. Let us denote the received signal strength at node R as P_TR from node T and P_ER from node E. Node E is categorized as an exposed terminal when Equation 2.1 is satisfied.

\[ \mathrm{SIR}_R = \frac{P_{TR}}{P_{ER}} = \frac{P_0 |h_{TR}|^2}{P_0 |h_{ER}|^2} = \frac{|h_{TR}|^2}{|h_{ER}|^2} > \Delta_d \tag{2.1} \]

∆_d is a predefined constant related to the transmission data rate d. A large ∆_d results in a higher packet reception ratio for the primary transmissions, while a small ∆_d can support a larger number of simultaneous exposed terminal transmissions. In RCTC, we pick ∆_d for each data rate such that the packet reception ratio (from node T to node R under interference from node E) is above 95%.

Equation 2.1 requires the values of both h_ER and h_TR, but the latter is not available to the potential exposed terminals. RCTC uses a simple mechanism to ensure that any potential exposed terminal can estimate whether it is safe to transmit based on the observed power level of the signature from the primary receiver. The primary receiver dynamically adjusts the transmit power of its signature according to h_TR. The received signal strength of the signature at the potential exposed terminal then reflects the relationship between h_ER and h_TR. Specifically, when the primary receiver transmits SH as shown in Figure 2.2, it sets the transmit power to

\[ P_r = \frac{C_d}{P_{TR}}, \tag{2.2} \]

where Cd is a constant related to the data rate d. Assume that the wireless channels in the

two directions between any pair of nodes are symmetric. Then the received signal power at

node E is

\[ P_{RE} = P_r |h_{ER}|^2 = \frac{C_d}{P_{TR}} |h_{ER}|^2 = \frac{C_d |h_{ER}|^2}{P_0 |h_{TR}|^2}. \tag{2.3} \]

From Equations 2.1 and 2.3, the following equation is satisfied at the exposed terminal

\[ P_{RE} = \frac{C_d |h_{ER}|^2}{P_0 |h_{TR}|^2} < \frac{C_d}{P_0 \Delta_d}. \tag{2.4} \]

Since it is difficult for the exposed terminal to know the transmission data rate between the primary transmitter and the primary receiver ahead of time, we choose C_d = C · P_0 · ∆_d, where C is a pre-defined constant known to every node. So Equation 2.4 becomes

\[ P_{RE} < \frac{C_d}{P_0 \Delta_d} = C. \tag{2.5} \]

Then node E is safe to transmit if the received signal strength of the signature is lower than

C regardless of the data rate between node T and node R.
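The equivalence between the threshold test on the signature power (Equation 2.5) and the SIR condition (Equation 2.1) can be checked numerically. The sketch below uses made-up values for P_0, C and the channel gains, assumes reciprocal channels as in the derivation, and uses hypothetical function names.

```python
def signature_tx_power(c, p0, delta_d, p_tr):
    """Transmit power of SH at the primary receiver: P_r = C_d / P_TR with C_d = C * P_0 * delta_d."""
    return (c * p0 * delta_d) / p_tr

def safe_to_transmit(p_re, c):
    """Test run at a potential exposed terminal (Equation 2.5): received SH power below C."""
    return p_re < c

# Illustrative values only (not taken from the thesis).
P0 = 0.1                  # Wi-Fi transmit power in watts
DELTA_D = 10 ** (5 / 10)  # a 5 dB SIR requirement expressed as a linear ratio
C = 1e-9                  # network-wide constant known to every node
G_TR = 1e-6               # |h_TR|^2, gain between primary transmitter and receiver

def check(g_er):
    """Return (decision from Equation 2.5, decision from Equation 2.1) for a gain |h_ER|^2."""
    p_tr = P0 * G_TR                                 # power received at R from T
    p_r = signature_tx_power(C, P0, DELTA_D, p_tr)   # power R uses to send SH
    p_re = p_r * g_er                                # SH power seen at E (reciprocal channel)
    sir = (P0 * G_TR) / (P0 * g_er)                  # SIR at R if E transmits at P0
    return safe_to_transmit(p_re, C), sir > DELTA_D

print(check(1e-8))   # E far from R
print(check(5e-7))   # E close to R
```

Node E needs only its own measurement of the SH power and the shared constant C; it never has to learn h_TR or the data rate of the primary link.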

2.3.4 Selecting Exposed and Secondary Receiver

In the previous section, we introduced the scheme to identify the exposed terminal. However, it only guarantees that the signal from the exposed terminal does not hurt the reception at the primary receiver. It is not clear whether a potential receiver of the exposed terminal could receive correctly, as it could be close to the primary transmitter. Similarly, in the secondary transmission mode, the reception at the secondary receiver may be affected.

One recent work on exploiting exposed terminal transmissions, CMAP [97], maintains a probability map which stores the successful reception ratio when two links are transmitting simultaneously. Another work, CMAC [86], uses the relationship between the SIR and packet reception ratio (PRR) to check the feasibility of concurrent transmissions. These schemes, however, only characterize the relationship between a pair of links. The combined interference from two or more exposed terminals is not considered. Our simulation results in Section 2.5 show that the performance of CMAP degrades as the number of exposed terminals increases.

RCTC also utilizes history information to pick the receiver of the exposed transmission and the secondary receiver, but in a different way from prior work. Each node keeps two lists: an ExMap for exposed transmissions and a SecMap for secondary transmissions. Each entry in either list is a triplet {Tx, Rx, p}, where Tx is the primary transmitter, Rx is the potential exposed or secondary receiver, and p is the probability that the reception at Rx is successful when Tx is the primary transmitter. The probability p is maintained according to the following rules:

• Set p to 1 upon a successful transmission.

• Halve p upon a failed transmission (i.e. ACK timeout).

When an exposed or secondary transmission opportunity appears, the Rx with the highest p is selected as the receiver. The exposed transmission, however, is not always carried out. Instead, it happens with probability p. This scheme helps to resolve collisions between multiple exposed terminals.
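The bookkeeping described above is small enough to sketch directly. The class below is a hypothetical illustration that serves as either an ExMap or a SecMap; the initial value of p for a pair that has never been tried is not stated in the text and is assumed to be 1 here.

```python
import random

class ReceptionMap:
    """ExMap or SecMap: maps (primary Tx, candidate Rx) to the success estimate p."""

    def __init__(self, initial_p=1.0):
        self.p = {}
        self.initial_p = initial_p  # assumed value for untried pairs

    def get(self, tx, rx):
        return self.p.get((tx, rx), self.initial_p)

    def on_success(self, tx, rx):
        self.p[(tx, rx)] = 1.0                  # set p to 1 upon a successful transmission

    def on_failure(self, tx, rx):
        self.p[(tx, rx)] = self.get(tx, rx) / 2  # halve p upon an ACK timeout

    def pick_receiver(self, tx, candidates):
        """Candidate with the highest p, or None if there is no candidate."""
        if not candidates:
            return None
        return max(candidates, key=lambda rx: self.get(tx, rx))

    def should_transmit(self, tx, rx, rng):
        """Exposed transmissions are carried out only with probability p."""
        return rng.random() < self.get(tx, rx)

ex_map = ReceptionMap()
ex_map.on_failure("T", "A")
ex_map.on_failure("T", "A")   # p(T, A) is now 0.25
ex_map.on_failure("T", "B")   # p(T, B) is now 0.5
print(ex_map.pick_receiver("T", ["A", "B"]), ex_map.get("T", "A"))
```

Halving on failure makes a colliding exposed terminal back off geometrically, while a single success restores full confidence in the pair.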

2.3.5 Transmitter Suppressing

In Section 2.3.3, exposed terminals identify themselves using Equation 2.5. This, however, raises a question: what should the exposed terminals do if they do not receive the return signature from the primary receiver? On one hand, the absence of a return signature means that the exposed terminal is beyond the communication range of the primary receiver, which suggests that its transmission will not hurt the reception at the primary receiver. On the other hand, the primary receiver may choose to send a return transmission, and the exposed transmission may hurt the reception at the primary transmitter.

RCTC uses the following approach to suppress exposed terminal transmissions. The transmitter broadcasts a reserved signature SF (as shown in Figure 2.2(a); F denotes full duplex transmissions) when its reception of the return transmission is severely affected by the exposed terminals. Specifically, each primary transmitter Tx keeps a list of entries {Rx, P_lost}, where Rx is a potential receiver of Tx and P_lost is the ratio of unsuccessful return transmissions. Tx maintains P_lost using the history in a given time window t_w and sets it to 0 when no return transmission happens in t_w. Then, if the ratio P_lost is above a threshold α, the transmitter broadcasts the suppressing signature SF before it sends the signature of the primary receiver. Exposed terminals should not start a concurrent transmission if SF is received. Otherwise they could start transmitting when the return

S(Tp) ← primary transmitter’s signature;
S(Rp) ← primary receiver’s signature;
SH ← half duplex signature;
SF ← full duplex signature;
treturn ← the expected time a return signal should appear;
Plost(Rp) ← the return packet loss ratio from Rp;
while has packet to send do
    if channel access granted then
        if Plost(Rp) > α then
            Send SF;
        end
        Send ⟨S(Rp), S(Tp)⟩ in sequence;
        if return signal detected in treturn then
            Use S(Tp), SH to correlate the received signal;
            if signature detected then
                Send the packet;
                Wait for ACK and update Plost(Rp);
            else
                Abort this transmission;
            end
        else
            Abort this transmission;
        end
    end
end
Algorithm 1: Operation of a primary transmitter

signature from the primary receiver is not detected. We study the value of α in Section 2.5 and pick α = 0.1 as a tradeoff between network throughput and fairness.
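One way to maintain P_lost over a sliding time window is sketched below. The data structure, the class name and the timestamps are our assumptions; only the window t_w, the reset to 0 when the window is empty, and the comparison against α come from the text. One tracker would be kept per potential receiver Rx.

```python
from collections import deque

class LossTracker:
    """Tracks the ratio of unsuccessful return transmissions from one receiver
    within the last `window` time units."""

    def __init__(self, window, alpha=0.1):
        self.window = window
        self.alpha = alpha
        self.events = deque()  # (timestamp, lost) pairs, oldest first

    def record(self, now, lost):
        self.events.append((now, bool(lost)))

    def p_lost(self, now):
        # Drop return transmissions that fell out of the time window.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        if not self.events:
            return 0.0  # no return transmission in the window
        return sum(lost for _, lost in self.events) / len(self.events)

    def should_send_sf(self, now):
        """Broadcast SF before the receiver's signature only if P_lost exceeds alpha."""
        return self.p_lost(now) > self.alpha

tracker = LossTracker(window=10, alpha=0.1)
for t in range(9):
    tracker.record(t, lost=False)
tracker.record(9, lost=True)
print(tracker.p_lost(9), tracker.should_send_sf(9))
```

With one loss in ten return transmissions the ratio equals α and SF is not sent; a second loss pushes it above the threshold.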

2.3.6 Putting It All Together

Algorithms 1, 2 and 3 outline the channel access logic of the primary transmitter, receiver and the exposed terminal. Note that these three algorithms run concurrently at all of the nodes and there is no pre-assigned role for any node. TP, the primary transmitter, has a packet for RP, the primary receiver. TP performs carrier sensing and backoff, and begins transmission upon successful channel contention. Its transmission consists of

S(Tp) ← primary transmitter’s signature;
S(Rp) ← primary receiver’s signature;
SH ← half duplex signature;
while S(Rp) detected do
    Correlate with its neighbors’ signatures;
    if S(Tp) detected then
        if packet for Tp exists then
            Send S(Tp) and a return packet in sequence;
        else
            Send SH with power Pr (Equation 2.2);
            if a secondary receiver exists then
                Send a packet to the secondary receiver;
                Wait for ACK and update SecMap;
            else
                Send busy tone signal;
            end
        end
    end
end
Algorithm 2: Operation of a primary receiver

⟨S(RP), S(TP)⟩, where S(RP) and S(TP) are the signatures of the primary receiver and transmitter, respectively. While waiting for an incoming transmission, RP continuously correlates the channel with S(RP). Upon detecting S(RP), RP starts correlation with the signatures of all neighbors that it communicates with. After detecting S(TP), RP identifies the transmitter and searches its outgoing packet queue for packets destined for TP. RP has the following options based on the result of searching its queue.

1. RP finds a packet for TP: RP has a return packet, and transmits ⟨S(TP)⟩, indicating a bi-directional transmission.

2. RP finds no packet for TP: RP searches its queue for packets to other neighboring nodes, and uses the SecMap to check the feasibility of such a transmission. Upon finding a suitable secondary receiver, RP transmits back SH and the packet. Otherwise, RP transmits SH followed by a busy tone which lasts until the end of the incoming packet.
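The receiver's choice among these options can be written as a small decision function. This is a sketch under our own assumptions: the queue is reduced to a list of destination IDs, the SecMap is a plain dictionary, an untried pair defaults to p = 1, and the feasibility check is a hypothetical threshold `min_p` that the text does not specify.

```python
def select_mode(queue, primary_tx, secmap, min_p=0.5):
    """Return (mode, return signature, destination) for the primary receiver.

    queue      -- destination IDs of queued packets, in queue order
    primary_tx -- ID of the primary transmitter TP identified from S(TP)
    secmap     -- dict mapping (primary_tx, rx) to the success estimate p
    min_p      -- assumed feasibility threshold on p (not specified in the text)
    """
    if primary_tx in queue:
        # Option 1: a return packet exists, so reply with S(TP).
        return ("bidirectional", "S(Tp)", primary_tx)
    # Option 2: look for a secondary receiver that is likely to decode the packet.
    feasible = [(secmap.get((primary_tx, dst), 1.0), dst) for dst in queue]
    feasible = [(p, dst) for p, dst in feasible if p >= min_p]
    if feasible:
        _, dst = max(feasible, key=lambda item: item[0])
        return ("secondary", "SH", dst)
    # Otherwise: SH followed by a busy tone.
    return ("unidirectional", "SH", None)

secmap = {("T", "A"): 0.25, ("T", "B"): 1.0}
print(select_mode(["A", "T"], "T", secmap))
print(select_mode(["A", "B"], "T", secmap))
print(select_mode(["A"], "T", secmap))
```

In the last two cases the reply signature is the same (SH), which matches the text: neighbors only need to know that no return transmission to TP follows.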

The exposed terminal keeps track of the transmitter’s ID and then keeps correlating the received signal with SH. Upon detection of SH, if the signal strength of SH satisfies Equation 2.5, it selects a suitable receiver and transmits the packet with the probability p described in Section 2.3.4.

SH ← half duplex signature;
SF ← full duplex signature;
treturn ← the expected time a return signal should appear;
while correlating with neighbors’ signatures and SF do
    if SF detected then
        Wait for the end of this transmission;
        continue;
    end
    Store the transmitter’s ID (Tp);
    Keep correlating the received signal with SH;
    if SH detected in treturn then
        if signal strength of SH

[Figure 2.3 panels; the bar charts plot throughput (Kbps) for RCTC, FDNative and CF. (a) The USRP placement in our experiment. The solid arrows indicate the traffic direction. (b) The throughput when only link N3→N4 is enabled. (c) The throughput when links N4→N5 and N2→N1 are enabled. RCTC achieves 59.1% higher throughput than FDNative. (d) The throughput when links N3→N4 and N4→N5 are enabled. RCTC achieves 78% higher throughput than FDNative.]

Figure 2.3: The USRP testbed topology and throughput of different scenarios. All nodes can hear each other.

2.4 Implementation

In this section, we implement RCTC in a testbed with 5 USRPs. We compare the throughput of three different schemes: RCTC, FDNative, and CF [89]. FDNative is the baseline full duplex solution where a primary receiver sends a packet back to the transmitter if a packet is available. If the receiver has no return packet in the queue, a busy tone signal is sent out. In CF, both return and secondary transmissions are enabled.

2.4.1 USRP Prototype

Each USRP in our prototype consists of two antennas working in the same channel, one

for transmitting and the other for receiving. Digital cancellation is implemented to cancel

self-interference and provides about 27 dB cancellation.

The experiment is conducted in an office environment with 5 USRPs as shown in Figure 2.3(a). Although all of the nodes are in one collision domain (they can hear each other’s transmissions), the different distances between nodes and the metallic tables help to create exposed scenarios (links N4→N5 and N2→N1) and secondary transmission opportunities⁴ (links N3→N4 and N4→N5). Because of the large latency between the host computer and the USRP, we use a small sampling rate (250 kHz in our experiment) and a large packet length (3000 bytes). GMSK with 4 samples per symbol is used as the modulation scheme, which results in a data rate of 62.5 Kbps. However, due to the control latency and backoff overhead, FDNative achieves only 30.5 Kbps when there is only one transmitter-receiver pair, as shown in Figure 2.3(b). RCTC performs slightly worse, with a throughput of 29.35 Kbps, because of the signature overhead.

2.4.2 Exposed Transmission

Figure 2.3(c) shows the throughput of two exposed links N4→N5 and N2→N1. The aggregate throughput of RCTC is 59.1% higher than that of FDNative, whereas in theory it should be 100% higher. There are two reasons behind this result. First, the control signatures increase the overhead of RCTC as explained in the previous section. Second, these two exposed links also have opportunities to transmit simultaneously and successfully in FDNative if they select the same backoff value. The latter contributes the most because the sum throughput for FDNative is 37 Kbps, which is 21% higher than the throughput when only link N3→N4 is enabled. Since CF only enables secondary transmission, its performance is the same as FDNative.

⁴ Note that the rule for exposed transmission (Equation 2.1) only requires that the interference at the receiver can be tolerated. So the exposed terminal and primary receiver can be in one collision domain. However, we do need to enable the capture effect [98], which allows a receiver to re-lock to a stronger signal while it is receiving a weaker one. So the receivers in our experiment could re-lock to the packet from their own transmitter (stronger packet) even if they are already locked to the packet from exposed terminals (weaker packet).

2.4.3 Secondary Transmission

Figure 2.3(d) shows the throughput of links N3→N4 and N4→N5. In this scenario, N3→N4→N5 naturally forms a secondary transmission chain. In addition, when link N4→N5 wins the contention, link N3→N4 behaves as an exposed link since its transmission does not hurt the reception at node N5. So both links can transmit concurrently all the time. Compared with FDNative, RCTC achieves 78% higher aggregate throughput. CF, on the other hand, only allows link N4→N5 to always transmit, resulting in a 41% throughput gain.

2.5 Evaluations

To study the performance of RCTC in larger networks, we present ns-3 based simulation results in this section. The following algorithms are evaluated in this section:

• Half-duplex: This represents the Distributed Coordination Function (DCF) mode in the IEEE 802.11 protocol without RTS-CTS control packets.

• CMAP: The scheme proposed in [97], which deals with exposed terminals in half duplex.

• FDNative: As discussed in Section 2.4.

• CF [89]: As discussed in Section 2.4.

• RCTC: Our proposed solution.

Throughput and fairness are the two metrics used to evaluate the performance of the different schemes. Jain’s fairness index [50] is used to quantify the throughput fairness across all of the flows created. We present results from AP based networks and ad hoc networks.
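For reference, Jain’s fairness index over n flow throughputs x_1, ..., x_n is (sum of x_i)^2 / (n · sum of x_i^2); it equals 1 when all flows obtain the same throughput and 1/n when a single flow obtains everything. A minimal sketch follows; the function name and the zero-traffic convention are illustrative choices, not taken from the simulator.

```python
def jain_fairness_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range (0, 1]."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        # Convention chosen for this sketch: no traffic at all is reported as 0.
        return 0.0
    total = sum(throughputs)
    return total * total / (n * sum_sq)


# Equal shares are perfectly fair; one flow taking everything yields 1/n.
print(jain_fairness_index([3.0, 3.0, 3.0, 3.0]))   # 1.0
print(jain_fairness_index([12.0, 0.0, 0.0, 0.0]))  # 0.25
```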

Figure 2.4: Aggregate throughput and Jain’s fairness index for different numbers of flows per client. Panels: (a) aggregate throughput (Mbps) and (b) Jain’s fairness index, each versus the number of flows per client; schemes plotted: RCTC, CF, FDNative, CMAP, Half-duplex.

The data rate is fixed at 12 Mbps and the packet size is 1500 bytes. We pick 5 dB as the threshold for ∆d (Equation 2.1) because it provides a packet success ratio close to 100% with the default error model in ns-3. All flows created are saturated UDP flows.

2.5.1 AP Network

The APs are distributed uniformly at random in an 800 × 800 m² area. First, the area is divided into small sections, one for each AP. Then each AP is randomly placed within its section. Several clients are randomly placed around each AP. Finally, we randomly create a number of possible flows (uplink and downlink). The default setting is 30 APs with 3 clients per AP and 0.5 flows per client. For each setting, the simulation result is averaged over 20 randomly generated scenarios.
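The scenario generation described above can be sketched as follows. This is only an illustration of the procedure, not the ns-3 code used in the evaluation: the grid shape, the client placement radius (`client_radius_m`), the clamping of clients to the area, and all function and parameter names are assumptions.

```python
import math
import random


def generate_ap_network(num_aps=30, clients_per_ap=3, flows_per_client=0.5,
                        side_m=800.0, client_radius_m=30.0, seed=0):
    """Return (aps, clients, flows) for one random AP-network scenario."""
    rng = random.Random(seed)

    # Divide the square area into a grid with at least one section per AP
    # (assumed grid shape; spare sections stay empty if the grid is larger).
    cols = math.ceil(math.sqrt(num_aps))
    rows = math.ceil(num_aps / cols)
    cell_w, cell_h = side_m / cols, side_m / rows
    sections = [(r, c) for r in range(rows) for c in range(cols)][:num_aps]

    aps, clients = [], []
    for ap_id, (r, c) in enumerate(sections):
        # Place the AP uniformly at random inside its own section.
        ax = (c + rng.random()) * cell_w
        ay = (r + rng.random()) * cell_h
        aps.append((ax, ay))
        for _ in range(clients_per_ap):
            # Place each client uniformly in a disc around its AP
            # (assumed radius), clamped to the simulation area.
            dist = client_radius_m * math.sqrt(rng.random())
            angle = rng.uniform(0.0, 2.0 * math.pi)
            cx = min(max(ax + dist * math.cos(angle), 0.0), side_m)
            cy = min(max(ay + dist * math.sin(angle), 0.0), side_m)
            clients.append((ap_id, cx, cy))

    # Every client offers one possible uplink and one possible downlink flow;
    # the requested number of flows is drawn from these without replacement.
    candidates = [(cid, direction) for cid in range(len(clients))
                  for direction in ("uplink", "downlink")]
    num_flows = round(flows_per_client * len(clients))
    flows = rng.sample(candidates, num_flows)
    return aps, clients, flows


aps, clients, flows = generate_ap_network()
print(len(aps), len(clients), len(flows))
```

With the default setting this yields 30 APs, 90 clients and 45 flows; setting `flows_per_client=2` selects every uplink and downlink candidate, so all flows are bi-directional.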

Different Number of Flows

Figure 2.4(a) plots the average aggregate throughput with a varying number of flows. The standard deviation is plotted as the vertical bar around the average value. Since return transmission has higher priority than secondary transmission in RCTC, the gain from exposed transmissions decreases as the number of flows increases, because more bi-directional transmissions occur. Note that 2 flows per client indicates that all of the flows are bi-directional. However, RCTC can still achieve 12% higher throughput than FDNative because exposed transmissions are still allowed if the primary transmitter has a small Plost, as discussed in Section 2.3.5. At 0.75 flows per client, the average throughput gain of RCTC over FDNative peaks at 79.4%. The fairness of all the full duplex schemes is comparable, with RCTC performing the best, indicating that the throughput gain of RCTC does not come at the cost of fairness. CMAP experiences low fairness and, with more than 1.25 flows per client, even lower throughput than half duplex. We believe this is related to the windowed ACK and backoff scheme in CMAP. In CMAP, when a node has a total of 8 unacknowledged packets, it keeps quiet for a random duration between 40 and 80 ms. In the experiment, we found that the total number of occurrences of this waiting increases from 1000 to 7000 as the number of flows per client changes from 0.5 (the peak throughput point) to 2 (the lowest throughput point) during a 2-second simulation. The high number of ACK timeouts originates from the fact that CMAP uses pairwise relationships to determine exposed links and is unable to deal with two or more exposed links.

Figure 2.5: Aggregate throughput and Jain’s fairness index for different values of α introduced in Section 2.3.5. Panels: (a) aggregate throughput (Mbps) and (b) Jain’s fairness index, each versus the number of flows per client; schemes plotted: RCTC-1 and RCTC with α = 0.1, 0.5 and 1.

Influence of Design Parameter

We keep the same simulation setting as in the previous section and vary the parameter α discussed in Section 2.3.5. The results are shown in Figure 2.5. The scheme “RCTC-1” refers to a variant of RCTC in which exposed terminals are not allowed to transmit if they do not receive the return signature from the primary receiver. With α = 1, transmitter suppression is turned off and the exposed terminals are free to transmit in the absence of a return signature. This setting achieves higher throughput, but at the cost of a sharp decrease in fairness. We pick α = 0.1 as the default parameter in our design because it balances throughput and fairness.

Figure 2.6: Aggregate throughput and Jain’s fairness index for different numbers of APs. Panels: (a) aggregate throughput (Mbps) and (b) Jain’s fairness index, each versus the number of APs; schemes plotted: RCTC, CF, FDNative, CMAP, Half-duplex.

Different Number of APs

Figure 2.6 shows the results with different numbers of APs. All the schemes show an increase in aggregate throughput as the density increases. The throughput of RCTC increases from 1.41× to 2.31× that of FDNative, as higher density offers more exposed transmission opportunities. CF performs only 5% to 9% better than FDNative, and RCTC achieves 35% to 111% higher throughput than CF. Full duplex is supposed to double the throughput of half duplex. In this experiment, RCTC achieves on average 1.53× to 2.86× the throughput of Half-duplex and 1.43× to 1.67× the throughput of CMAP. Figure 2.6(b) shows the fairness index for all flows in the network. The results of all full duplex schemes are comparable. Although CMAP achieves higher throughput with an increasing number of APs, its fairness is much lower than that of the other schemes.

Figure 2.7: Throughput of primary, exposed and secondary transmissions with different numbers of APs. Panels: (a) 30 APs, (b) 40 APs, (c) 50 APs; each shows the aggregate throughput (Mbps) of RCTC, CF and FDNative.

Analyzing the Throughput Gains

To understand the gains obtained by RCTC, we investigate the contributions of primary, exposed and secondary transmissions in Figure 2.7 with a varying number of APs. In all of the results, the throughput of primary transmissions in RCTC decreases compared with FDNative due to the increase in contention resulting from exposed terminal transmissions. Observe that with increasing node density (more APs), the opportunities for exposed terminal transmissions increase in RCTC. The low throughput contribution from secondary transmissions also confirms our argument that the opportunities for secondary transmission in AP networks are few.

Different Ratio of Downlink Traffic

In AP networks, there are two types of traffic. Uplink traffic flows from the clients to the AP, and downlink traffic flows from the AP to the clients. AP networks are generally considered downlink heavy. Figure 2.8 plots the results with different downlink traffic ratios. As shown in the figure, RCTC outperforms FDNative in terms of fairness in all cases. The throughput gain varies from 71.2% to 90.9%, indicating that the performance of RCTC is robust to different traffic patterns.

Figure 2.8: Aggregate throughput and Jain’s fairness index for different ratios of downlink traffic. Panels: (a) aggregate throughput (Mbps) and (b) Jain’s fairness index, each versus the ratio of downlink flows; schemes plotted: RCTC, CF, FDNative, CMAP, Half-duplex.

2.5.2 Ad Hoc Network

Nodes are randomly placed in an 800 × 800 m² area. One-hop flows are chosen randomly on links that have an SNR higher than 10 dB. Figure 2.9(a) presents the aggregate throughput with a varying number of nodes in the network. The throughput gain of RCTC over FDNative increases from 9.1% to 54.2%, while the throughput gain of CF changes from 1% to 14%.

Figure 2.9(b) shows the performance for a varying number of flows in the network. The total number of nodes is 60. As the number of flows per node changes, the throughput gain of RCTC varies from 0.6% to 28.9%.

Figure 2.9: Aggregate throughput for different numbers of nodes and flows per node in an ad hoc network. Panels: (a) aggregate throughput (Mbps) versus the number of nodes and (b) aggregate throughput (Mbps) versus the number of flows per node; schemes plotted: RCTC, CF, FDNative, CMAP, Half-duplex.

2.6 Discussion

Our design and evaluation of RCTC assume that all packets in the queue are of the same size and that all nodes use the same data transmission rate. The reason is that the transmissions should all be aligned, as shown in Figure 2.2, to protect the ACKs. This assumption raises questions about the performance of RCTC in practice. In this section, we briefly discuss the impact of varying packet sizes, multiple data rates and bursty traffic, and outline potential approaches to address these issues.

• Varying packet size: In a real deployment, packets may not have the same size. The ACK packets in TCP traffic are significantly smaller than the data packets: data packets are as long as 1500 bytes, while TCP ACKs are only 40 bytes. However, we can combine three techniques to address such scenarios. First, we can perform packet aggregation, as in 802.11n, and packet splitting. This allows us to create virtual packets that are of the same length. Second, busy tone signals can be used as padding bits. Finally, the exposed terminals examine the length of the primary transmission and refrain from sending bigger packets.

• Multiple data rates: Our solution could be easily extended to multiple data rates by varying the parameters ∆d and Cd used in Equation 2.5 for different values of data rate d. Packets of the same length require different durations to transmit at different data rates. However, the solutions for varying packet length could be applied to address this issue.

• Bursty traffic: The traffic flows in our evaluation of RCTC are saturated UDP flows. In reality, flows tend to be bursty. However, this could be handled by tuning the design parameters tw and α, as discussed in Section 2.3.5. A smaller tw allows the primary transmitter to quickly stop suppressing exposed terminals when the return flow finishes, while a larger tw makes suppression more robust to fluctuating traffic.
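To make the idea of equal-length virtual packets for varying packet sizes more concrete, the following sketch treats the queue as a byte stream, splits and aggregates packets into fixed-length virtual packets, and pads only the last one. It is an illustration under simplifying assumptions rather than part of RCTC: per-subframe headers are ignored, the padding merely stands in for the busy tone, and the function name and the 1500-byte default are hypothetical.

```python
def build_virtual_packets(packet_sizes, virtual_len=1500):
    """Pack queued packets (sizes in bytes) into equal-length virtual packets.

    Returns a list of (payload_bytes, padding_bytes) tuples, one per virtual
    packet; payload_bytes + padding_bytes always equals virtual_len.
    """
    frames = []
    used = 0  # payload bytes already placed in the virtual packet being filled
    for size in packet_sizes:
        remaining = size
        while remaining > 0:
            # Aggregate into the current virtual packet, splitting the queued
            # packet if it does not fit completely.
            take = min(remaining, virtual_len - used)
            used += take
            remaining -= take
            if used == virtual_len:
                frames.append((virtual_len, 0))
                used = 0
    if used > 0:
        # The tail is padded up to the common length (busy tone stand-in).
        frames.append((used, virtual_len - used))
    return frames


# Two 40-byte TCP ACKs followed by a 3000-byte burst of data.
print(build_virtual_packets([40, 40, 3000]))
```

For this queue the sketch produces two full virtual packets and a third one carrying 80 payload bytes plus 1420 bytes of padding.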

2.7 Related Work

Several categories of schemes have been proposed to solve the hidden terminal and the exposed terminal problems.

Busy Tone: In Busy Tone Multiple Access (BTMA) [93], the receiver sends out a busy tone signal using a secondary wireless channel. All of the nodes in the surrounding area of the receiver hear this busy tone and are prevented from transmitting. A drawback is that the second channel and the corresponding guard band impose a significant spectrum overhead.

Request-to-Send/Clear-to-Send (RTS/CTS) [21,51]: This scheme was proposed to solve the hidden terminal problem and can be extended to the exposed terminal problem. Exposed nodes can send upon hearing the RTS while not receiving the corresponding CTS. To align exposed transmissions, a control time gap is inserted between RTS/CTS and DATA/ACK in [8]. In [26], an exposed terminal uses the SINR in the CTS message to estimate whether it can start a concurrent transmission. To protect the RTS packets from collision, the authors of DBTMA [43] used a second busy tone, which increases the spectrum overhead. In [64], a modification is proposed in which a node desiring to share the channel transmits a Request-To-Send-Simultaneously (RTSS) packet. When an exposed link gets access to the channel, it returns a Clear-To-Send-Simultaneously (CTSS) packet that allows the simultaneous transmission. The RTS/CTS messages are transmitted at the lowest data rate, which creates a significant time overhead. FlashLinQ [102] uses multiple rounds of RTS/CTS OFDM slots to schedule concurrent transmissions, which requires tight synchronization.

Interference Mapping: E-CSMA [33] builds a channel state map using the received signal strength indicator (RSSI) at the transmitter and the transmission success ratio reported by the receiver. Because the observed RSSI is maintained at the transmitter instead of the receiver, there is no real-time knowledge of the interference at the receiver. In CMAP [97], the authors proposed a scheme that uses an online conflict map. Two links that cannot transmit simultaneously form an entry in the conflict map. Potential transmitters monitor ongoing transmissions and check the conflict map to make transmission decisions. However, the time required for creating and updating the map at each node grows exponentially with the number of links, and nodes are unable to monitor multiple overlapping concurrent transmissions. In [86], the authors developed two empirical models based on the relationship between received signal strength, transmit power, SNR, and packet reception ratio. When a potential transmitter overhears a packet, it uses the models to check whether it can transmit and to select the appropriate transmission power level. An RTS/CTS scheme is used between the exposed node and its receiver. However, the exposed node may not be able to receive the CTS message due to the interference from the ongoing primary transmission, which significantly reduces the exposed transmission opportunities that can be realized.

Full Duplex MAC: In [82], the transmitters initially send a half duplex packet. Subsequent transmissions between the same transmitter and receiver are full duplex. A scheme called shared random backoff is used to align the full duplex transmissions in time. The authors assume that a node cannot start a transmission while it is in the receiving state, due to the limited coherence time of the self-interference channel state estimate. In [89], the receiver of a primary transmission can send a packet to a secondary receiver when it does not have a return transmission for the transmitter. It also uses a busy tone signal to fill the time gap between the transmissions of the transmitter and the receiver, to prevent neighboring transmissions. However, it does not take advantage of exposed terminals.

2.8 Conclusion

We have taken the first step in uncovering the ability of full duplex systems to mitigate several practical issues and improve their throughput performance. We believe that primitives such as signatures, together with the ability to always overhear, can enable a rethinking of MAC strategies. In this chapter, we presented RCTC, a scheme that uses signatures to allow fast handshaking for coordinating transmissions in the neighborhood and to enable exposed and secondary transmissions. Compared with the native full duplex MAC, our prototype shows as high as 78% throughput gain, and evaluation results with larger networks show up to 131% gain in throughput. In addition, RCTC achieves better fairness.

Chapter 3: DOMINO: Relative Scheduling in Enterprise Wireless LANs

3.1 Introduction

Owing to the increasing number of WiFi-capable devices, enterprise WiFi networks are becoming more prevalent in office environments, campuses, airports and malls. In addition, a number of cellular providers are deploying enterprise WiFi networks at scale in congested areas to offload cellular traffic and thereby increase the capacity of the increasingly congested cellular networks [5]. The centralized structure of these networks has been leveraged to develop efficient solutions to various challenging problems such as channel assignment [23,58,67], client association [19,23,67], and power management [23,63]. Another important problem in enterprise networks is channel access, which directly impacts critical parameters such as throughput and delay. Existing work on channel access can be broadly classified as follows:

• Distributed Schemes: The Distributed Coordination Function (DCF) [1] defined in the IEEE 802.11 standard is the most widely used distributed channel access technique. Wireless devices pick a random back-off time and access the channel when the back-off timer expires. DCF has several advantages such as low implementation complexity, high scalability and robustness to failures. However, as each WiFi device makes distributed channel access decisions based on local carrier sensing, it is well known that DCF suffers from hidden and exposed terminal problems. Prior works [64,88,97] have shown that both of these problems are prevalent in real deployments. Some sub-optimal schemes [13,31,73] and throughput-optimal schemes [80] have been proposed. However, they either achieve only asymptotic optimality or suffer from high overhead and a requirement for stringent time synchronization.

• Centralized scheduling schemes (strict scheduling): Centralized schemes are based on a central controller [18] that collects the interference relationships among the nodes in the network and makes channel access decisions based on the status of all the queues in the system and the channel conditions. Although centralized schemes can achieve better performance than distributed schemes, there are two practical limitations. First, the central controller does not have information on the state of the queues at the clients. Thus, it either assumes that the client always has data to transmit or estimates the traffic load at the clients. As discussed later in Section 3.2, a naive solution based on piggybacking queue information has a starvation problem. Second, the central scheduler assumes that all nodes can follow the schedules strictly. In practice, jitter over the wired network limits the achievable time synchronization accuracy between the APs.

• Hybrid schemes: Hybrid solutions like CENTAUR [88] and OmniVoice [10] schedule the downlink traffic from the AP to the clients to avoid hidden terminals and utilize exposed terminal opportunities. The uplink traffic still uses DCF to access the channel and thus suffers from the same problems as discussed for distributed schemes. Moreover, the disturbance created by uplink traffic to the downlink schedule can significantly diminish the performance gains [88].

Figure 3.1: A network with three AP-client pairs (AP1-C1, AP2-C2 and AP3-C3). Dashed lines between nodes indicate that the nodes can hear each other. Solid arrows denote flow directions.

Figure 3.2: The throughput (Mbps) on different links (AP1→C1, C2→AP2, AP3→C3 and overall) under DCF, CENTAUR, DOMINO and the omniscient scheme. The overall throughput of the omniscient scheme is 76% higher than DCF and 61% higher than CENTAUR. DOMINO performs close to the omniscient scheme.

Although centralized schemes are difficult to implement in practice for the reasons described above, they have the potential for very high throughput. We use the network shown in Figure 3.1 as an example. There are three AP-client pairs and three flows. AP1 is a hidden terminal to AP3, while C2 and AP1 are exposed terminals to each other. Therefore, DCF performs poorly on this network. Because of the hidden terminal problem, the link AP3→C3 achieves little throughput if all of the transmitters are backlogged. On the other hand, CENTAUR, a hybrid solution, schedules the AP1→C1 and AP3→C3 links to avoid the hidden terminal problem, but it is unable to schedule uplink traffic in the same slot, missing the opportunity for the exposed transmission C2→AP2. In an omniscient centralized scheduling scheme, the link C2→AP2 can always access the channel while the links AP1→C1 and AP3→C3 occupy the channel alternately. Figure 3.2 presents the throughput of the different links under the different scheduling algorithms.
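The omniscient schedule for this example can be reproduced with a few lines of code. The sketch below encodes the single conflict described in the text (AP1→C1 versus AP3→C3), enumerates the maximal sets of links that can share a slot, and cycles through them. The round-robin policy and all names are assumptions made for illustration; they are not the scheduler used in the evaluation.

```python
from itertools import combinations

# Links and the one conflicting pair read off the example (hidden terminal).
LINKS = ["AP1->C1", "C2->AP2", "AP3->C3"]
CONFLICTS = {frozenset(("AP1->C1", "AP3->C3"))}


def is_independent(links, conflicts):
    """True if no two links in the collection conflict with each other."""
    return all(frozenset(pair) not in conflicts
               for pair in combinations(links, 2))


def maximal_independent_sets(links, conflicts):
    """Brute-force enumeration, adequate only for very small examples."""
    independent = [set(c) for r in range(1, len(links) + 1)
                   for c in combinations(links, r)
                   if is_independent(c, conflicts)]
    return [s for s in independent
            if not any(s < other for other in independent)]


def round_robin_schedule(links, conflicts, num_slots):
    """Assumed policy: rotate through the maximal independent sets."""
    sets_ = maximal_independent_sets(links, conflicts)
    return [sorted(sets_[slot % len(sets_)]) for slot in range(num_slots)]


for slot, active in enumerate(round_robin_schedule(LINKS, CONFLICTS, 4)):
    print(slot, active)
```

In every slot C2→AP2 is active, while AP1→C1 and AP3→C3 alternate, which matches the behavior described for the omniscient scheme.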

A mechanism that can enable centralized schemes to work in practical networks can reap the benefits shown above. In this chapter, we present a framework for channel access in enterprise WLANs, called DOMINO, that can achieve the optimality of centralized schemes without depending on time synchronization among APs. Toward meeting this objective, we propose Relative Scheduling. In sharp contrast to strict scheduling, Relative Scheduling uses wireless triggers, which are created by a set of PN sequences transmitted by the sender and receiver at the end of a data packet transmission. The transmission events are triggered by previous events, akin to a domino effect. In any given slot, a carefully chosen set of transmissions triggers the transmissions in the next slot. Triggering using wireless transmissions is a new concept and a paradigm shift from time synchronization-based protocol designs. In addition, we extend the technique used in PAMAC [81] and introduce Rapid OFDM Polling (ROP), in which the clients use different OFDM subchannels to send their queue backlog information after receiving a polling request from the AP.

The contributions in this chapter are summarized as follows:

• Relative Scheduling: This new paradigm of triggering wireless transmissions by other wireless transmissions has the following features: 1) it is able to work with any arbitrary centralized scheduling algorithm in a real network; 2) it does not require tight time synchronization; 3) clients do not need to know the schedule in advance; 4) multiple triggers are used to increase robustness.

• Experiments and Extensive Trace-driven Simulations: Experimental results from our USRP testbed are used to derive simulation parameters. We use our USRP testbed to verify the ability of DOMINO to achieve better throughput. The simulation results show that DOMINO achieves up to 96% higher throughput compared to DCF.

The remaining part of this chapter is organized as follows. Section 3.2 presents the motivation for relative scheduling in wireless networks. The design details of DOMINO are described in Section 3.3. We implement DOMINO in both a USRP testbed and ns-3 and present the evaluation results in Section 3.4. Section 3.5 presents the discussion, followed by the related work and conclusion sections.

3.2 Motivation

The basic assumption of a strict scheduling scheme is that nodes in the network behave exactly according to the time schedule. However, this assumption does not hold without microsecond-level time synchronization. Several time synchronization schemes have been proposed for wired and wireless networks. In wired networks, existing work shows that the Network Time Protocol (NTP) [3] can achieve a time accuracy of about 1000 µs in a quiet Ethernet network. Given that a WiFi slot time is only 9 µs, this coarse synchronization is inadequate. The Precision Time Protocol [2] is designed to achieve microsecond-level accuracy. However, it requires specialized and expensive hardware. In the wireless domain, a recent work [76] achieves nanosecond-level synchronization, which is sufficient for a strict scheduling scheme. However, the synchronization has to be done for every transmission and is limited to a single collision domain. To the best of our knowledge, the most accurate time synchronization scheme over multiple collision domains is RBS [34]. It uses reference broadcasts on the wireless channel to achieve microsecond-level synchronization. Although it achieves around 10 µs accuracy in a 4-hop network, its accuracy degrades as the number of hops and the number of nodes in one hop increase, making it unsuitable for large and dense networks. In addition, since clock skew is influenced by the environment (e.g., temperature and supply voltage), this synchronization scheme has to be executed frequently to update the time. Extra hardware such as cellular networks [14] or GPS [71] can also be used to realize synchronization, but this increases the system cost and complexity. In addition, GPS is not accurate in indoor environments. Besides precise time synchronization, the strict scheduling scheme requires collecting the queue status of all the nodes and distributing the schedule to the clients.

Instead of using an absolute time tag with the scheduling decision, DOMINO uses a relative transmission tag. It notifies the next transmitter to start at the end of the current transmission. Decoding the ongoing packet and estimating the stop point provides a naive way to implement the relative tag. However, hidden and exposed transmissions present challenges in decoding of the tag. To address this issue, we use physical layer signatures to enable our relative triggering scheme.

3.3 DOMINO Design

This section presents the key components of DOMINO: Rapid OFDM Polling, Relative Scheduling and the mechanism for converting any schedule produced by an arbitrary scheduler to a schedule suitable for DOMINO. First, we discuss the method to identify hidden and exposed links.

Figure 3.3: The construction of one OFDM symbol. Subcarrier indices run from −128 to 127: a guard band at each edge, subchannels 0 to 11 below the DC subcarrier (index 0), subchannels 12 to 23 above it, and guard subcarriers separating adjacent subchannels.

Identifying hidden and exposed links: The central server requires the interference information between different links to calculate the schedule. Exposed links can transmit simultaneously, while hidden links should be scheduled in different slots. In DOMINO, a central interference map consisting of the received signal strength between all node pairs is maintained at the server. This map can be used to calculate the interference between different links and to create a conflict graph G(V, E) [52,79]. The method used in [52] requires O(N) steps to build the map, where N is the number of nodes in the network. Each vertex in V stands for a link (AP-client or client-AP pair). An edge in E indicates that the two links interfere with each other according to the interference map and should not transmit together. Thus, all links that form an independent set in G(V, E) can transmit simultaneously.
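One way to turn such an interference map into a conflict graph is sketched below. The pairwise rule used here (two links conflict if they share a node, or if the signal-to-interference ratio at either receiver falls below a threshold) is an assumption made for illustration, as are the 5 dB threshold, the RSS values and all names; ACK traffic in the reverse direction is ignored.

```python
from itertools import combinations


def build_conflict_graph(links, rss_dbm, sir_threshold_db=5.0):
    """Return the edge set E of the conflict graph.

    links:   list of (transmitter, receiver) tuples, the vertices V.
    rss_dbm: dict mapping (sender, listener) to received signal strength in
             dBm; a missing entry means the listener cannot hear the sender.
    """
    edges = set()
    for (tx1, rx1), (tx2, rx2) in combinations(links, 2):
        # Links that share a node cannot be active together (assumed rule).
        shares_node = len({tx1, rx1, tx2, rx2}) < 4
        # Signal-to-interference ratio (dB) at each receiver when both send.
        sir1 = rss_dbm[(tx1, rx1)] - rss_dbm.get((tx2, rx1), float("-inf"))
        sir2 = rss_dbm[(tx2, rx2)] - rss_dbm.get((tx1, rx2), float("-inf"))
        if shares_node or sir1 < sir_threshold_db or sir2 < sir_threshold_db:
            edges.add(frozenset(((tx1, rx1), (tx2, rx2))))
    return edges


def can_transmit_together(link_set, edges):
    """True if the given links form an independent set in the conflict graph."""
    return all(frozenset(pair) not in edges
               for pair in combinations(link_set, 2))


# Hypothetical RSS values for three links; AP1 is heard strongly at C3.
links = [("AP1", "C1"), ("C2", "AP2"), ("AP3", "C3")]
rss = {("AP1", "C1"): -50.0, ("C2", "AP2"): -50.0, ("AP3", "C3"): -55.0,
       ("AP1", "C3"): -57.0, ("C2", "C1"): -80.0, ("AP1", "AP2"): -85.0}
edges = build_conflict_graph(links, rss)
print(can_transmit_together([("AP1", "C1"), ("C2", "AP2")], edges))
print(can_transmit_together([("AP1", "C1"), ("AP3", "C3")], edges))
```

With these values only the pair AP1→C1 and AP3→C3 is connected by an edge, so either of them can be scheduled together with C2→AP2 but not with each other.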

3.3.1 ROP: Rapid OFDM Polling

The intervening wired network makes it challenging for a central controller to maintain up-to-date queue status of the clients. One indirect solution is to use the AP with which the client is associated as a relay node and piggyback the queue status in packets sent to the AP [106]. The disadvantage is that if the client stays silent for a period and then enqueues a new packet, the AP does not learn about this new packet and the client may get starved. By taking advantage of orthogonal frequency-division multiplexing (OFDM), PAMAC [81] obtains the queue status of all the clients through one polling action. However, PAMAC only obtains a coarse status of the queues, which is not sufficient for the central controller to compute the schedule. In addition, it does not consider the difference in the received signal strength from different clients. We introduce ROP in DOMINO to address these issues. Back2F [84] has a similar scheme that uses different OFDM subcarriers to convey channel contention information from different devices. However, Back2F studied the effect of self-interference (one subcarrier) on subcarrier detection, while we study the interference between subchannels (multiple subcarriers) from different clients. In addition, we design a complete AP-client polling system in ROP, while Back2F focused on channel contention.

Table 3.1: Parameters used for the OFDM symbol to convey the queue length of clients

Parameter                     WiFi      ROP
number of subcarriers         64        256
subcarriers per subchannel    -         6
guard subcarriers             -         3
number of subchannels         -         24
CP duration                   0.8 µs    3.2 µs
symbol duration               4 µs      16 µs

Figure 3.4: The process of obtaining queue status from clients. After the polling packet from the AP and a gap of one slot, clients 0 to N each send one symbol (CP followed by data) on their own subchannel, and the AP places a single FFT window over the overlapping symbols.

Figure 3.5: Received OFDM samples from two clients. The system can tolerate 30 dB signal strength difference when using three subcarriers as guard interval. Each panel plots the received signal strength per subcarrier for subchannels 1 and 2.
(a) The decoded OFDM samples at the AP with 2 clients using adjacent subchannels without any guard interval. The received signal strengths from both clients are similar.
(b) The decoded OFDM samples at the AP with 2 clients using adjacent subchannels without any guard interval. There is a 30 dB difference in the received signal strength from the 2 clients.
(c) The decoded OFDM samples at the AP with 2 clients using adjacent subchannels with guard interval. There is a 30 dB difference in the received signal strength from the 2 clients.

OFDM is a modulation scheme widely used in wireless communication. Instead of transmitting over a single wideband channel, the system can be viewed as transmitting slowly on multiple independent narrow-band channels, called subcarriers. To combat multipath fading, a cyclic prefix (CP) is attached to each symbol. In 802.11a/g, the 20 MHz channel is divided into 64 subcarriers, of which 48 are used to transmit useful data. One OFDM symbol takes 4 µs in 802.11, of which the CP occupies 0.8 µs.

To obtain the queue status in DOMINO, the channel is separated into several subchannels, each consisting of several subcarriers. When a node associates with an AP, a unique subchannel is assigned to it. In practice, there are several problems that affect this OFDM system. First, the frequency offset between clients and APs breaks the orthogonality between different subcarriers and causes inter-subcarrier interference. Second, the clients have to be synchronized and need to send at the same time. The return OFDM symbols from different clients should overlap at the AP for at least the duration of one FFT window. Third, since the analog-to-digital converter (ADC) at the AP has limited resolution, it can get saturated by the stronger signal. So the difference between the received signal strengths from different clients also affects the final decoding result.

Figure 3.6: The relationship between number of guard subcarriers and the difference in RSS. The plot shows the correct decoding ratio (%) versus the difference in RSS (dB) for 0 to 4 guard subcarriers.

Figure 3.4 shows how DOMINO obtains the queue status from the clients. First, the AP broadcasts a polling packet. This packet contains a preamble that each client can use to correct its frequency offset. It also acts as a reference broadcast to synchronize the clients. Then, after receiving this packet, each client waits for one standard WiFi slot time (9 µs) and transmits its queue size using the assigned subchannel. Because the distance between an AP and its clients varies, the signals from different clients reach the AP at slightly different times. However, with a large enough CP duration, the AP is still able to find a suitable FFT window.

Instead of using the default values in WiFi, we use a different set of values for this special control OFDM symbol as shown in Table 3.1. Considering the maximum WiFi communication range to be 300 meters, the longest turnaround propagation delay is 2 µs.

To account for this delay, the CP duration is chosen to be 3.2 µs. Each subchannel contains 6 subcarriers, which encode a maximum queue size of 63 (= 2^6 − 1) if binary phase-shift keying (BPSK) is used as the modulation scheme. When there are more than 63 packets in the queue, we can first report 63 packets and keep track of the number of unreported packets.

Moreover, to prevent interference between different subchannels, several subcarriers are used to create a guard interval. Our experiments at the end of this section show that using

3 subcarriers as the guard interval is enough to tolerate a mismatch of up to 38 dB in the received signal strength from different clients. A total of 24 subchannels are available for the AP to assign. Because of DC offset, the center subcarrier is not used. The remaining 39 subcarriers are used as guard band between different wireless channels as in 802.11 (11 out of 64 subcarriers are guard band). Figure 3.3 shows the details of how the control OFDM subcarriers are assigned.
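The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative: the total of 256 subcarriers is not stated in this section and is only what the quoted counts add up to, the MSB-first bit order is an assumption, and all names are hypothetical.

```python
SUBCHANNELS = 24
DATA_PER_SUBCHANNEL = 6        # one bit per subcarrier
GUARD_PER_SUBCHANNEL = 3
DC_SUBCARRIERS = 1
EDGE_GUARD = 39
MAX_REPORT = 2 ** DATA_PER_SUBCHANNEL - 1   # 63


def total_subcarriers():
    """Sum of the subcarrier counts quoted in the text."""
    per_subchannel = DATA_PER_SUBCHANNEL + GUARD_PER_SUBCHANNEL
    return SUBCHANNELS * per_subchannel + DC_SUBCARRIERS + EDGE_GUARD


def round_trip_delay_us(range_m, speed_of_light=3e8):
    """Propagation delay AP -> client -> AP, in microseconds."""
    return 2 * range_m / speed_of_light * 1e6


def encode_queue_report(queue_len):
    """Return (6 bits, MSB first, and the number of packets left unreported)."""
    reported = min(queue_len, MAX_REPORT)
    bits = [(reported >> i) & 1 for i in reversed(range(DATA_PER_SUBCHANNEL))]
    return bits, queue_len - reported


bits, unreported = encode_queue_report(70)   # 63 reported now, 7 carried over
```

Under these assumptions the chosen 3.2 µs CP leaves a margin over the 2 µs round-trip delay at 300 meters.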

To study the performance of ROP, we implement the mechanism shown in Figure 3.4 on GNURadio [4] and test it using a USRP testbed. Because only one OFDM symbol is sent back to the AP, it is difficult to estimate the phase offset. So, 2-amplitude-shift keying (2ASK) is used for modulation instead of BPSK since phase offset does not affect the amplitude of the samples.
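The reason for preferring 2ASK can be seen in a noise-free frequency-domain toy model. This is only a sketch and not the GNURadio implementation: the threshold, the phase value and all function names are assumptions made for illustration.

```python
import cmath


def modulate_2ask(bits, amplitude=1.0):
    """One subcarrier per bit: 1 -> carrier present, 0 -> nothing sent."""
    return [complex(amplitude * b, 0.0) for b in bits]


def modulate_bpsk(bits):
    """One subcarrier per bit: 1 -> +1, 0 -> -1."""
    return [complex(2 * b - 1, 0.0) for b in bits]


def apply_phase_offset(symbols, phase_rad):
    """Model an unknown common phase rotation between client and AP."""
    rotation = cmath.exp(1j * phase_rad)
    return [s * rotation for s in symbols]


def demodulate_2ask(symbols, threshold=0.5):
    """Decide from the magnitude only, so the phase does not matter."""
    return [1 if abs(s) > threshold else 0 for s in symbols]


def demodulate_bpsk(symbols):
    """Decide from the sign of the real part, which needs a phase reference."""
    return [1 if s.real > 0 else 0 for s in symbols]


bits = [0, 1, 1, 1, 1, 1]
ask_rx = demodulate_2ask(apply_phase_offset(modulate_2ask(bits), 2.1))
bpsk_rx = demodulate_bpsk(apply_phase_offset(modulate_bpsk(bits), 2.1))
```

With an uncorrected rotation of 2.1 rad the magnitude-based decision still recovers the bits, while the sign-based BPSK decision flips every one of them.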

Figure 3.5(a) plots the result of two clients sending on adjacent subchannels with similar received signal strength (RSS). The bits sent on subchannel 1 are 111111, while the bits on subchannel 2 are 011111, where the first bit is set to 0 to show the interference between different subchannels. This figure shows that all the bits on both subchannels are received correctly. When there is a 30 dB difference in the RSS of the two subchannels, the decoded samples are plotted in Figure 3.5(b). The sequence 111111 is sent on both subchannels. However, the first three subcarriers of subchannel 2 are affected by the strong signal on subchannel 1. Figure 3.5(c) shows the result after separating them by three subcarriers. It reveals that this separation helps in reducing the interference between different

subchannels. Figure 3.6 shows the relationship between difference in RSS and the number

of guard subcarriers. A separation of three subcarriers is shown to be sufficient as long as

the RSS difference is no more than 38 dB. We measured the RSS between different nodes

in a testbed with 40 wireless nodes. Only 0.54% of all link pairs have an RSS difference

greater than 38 dB, which indicates that separation of 3 subcarriers is sufficient for most

cases. In the extreme case where the RSS difference from two clients is indeed higher than

38 dB, the AP should assign them non-adjacent subchannels to relieve the interference.

Another interesting question is how low the signal-to-noise ratio (SNR) can be for reliable detection. Experimental results reveal that as long as the SNR is higher than 4 dB, an OFDM symbol can be decoded correctly. For comparison, studies show that the SNR should be at least 4 dB to allow reliable WiFi transmission with the lowest data rate of 6 Mbps [72]. This set of experiments shows that the design of using one OFDM symbol to extract the queue information from the clients is effective.

3.3.2 Relative Scheduling

The schedule calculated by the central server requires microsecond-level synchronization between the APs. Otherwise, the slots will overlap, leading to throughput degradation. DOMINO takes advantage of the broadcast nature of wireless communication and introduces a novel concept called relative scheduling, which removes the need for tight synchronization.

To show the basic idea of relative scheduling, we use a network with 4 AP-client pairs shown in Figure 3.7 as an example. Let’s consider downlink traffic only. Figure 3.7(c)

presents one possible schedule according to the conflict graph in Figure 3.7(b). In relative

scheduling, this scheduling decision is turned into two chains:

Chain1 : AP1→C1, AP2→C2, AP1→C1, AP2→C2

Chain2 : AP4→C4, AP3→C3, AP4→C4, AP3→C3

The transmitters in the same chain transmit in sequence. The central controller does not need to give the exact time when a link should be active. Instead, it only specifies that a link should be activated after another one.

In Figure 3.7, the signals from AP2 and AP3 collide at AP1, making it impossible for

AP1 to correctly decode the packet. Thus, AP1 is unable to detect which link is currently

active in the relative chain. To trigger the transmission of the next packet, DOMINO utilizes

node signatures, a concept which has been used in several recent works [62,83,105] for

various types of control signaling. At the end of each transmission, the signature of the

next transmitter is explicitly appended. Instead of trying to decode the ongoing packet

and listening for the end of this transmission, each node keeps on running a correlator for

its own signature and starts transmitting once the signature is detected. Signatures can be

decoded under high interference, which makes it possible for AP1 to receive the notification

from AP2, even when the packet cannot be decoded correctly. Here, we assume that only

the transmitter that has the maximum RSS at the next transmitter is in charge of sending

the signature. If the topology is changed with AP4 and C4 removed from Figure

3.7(a), AP1 has to inform both AP2 and AP3 to start sending at the end of its transmission.

In that case, AP1 sends the sum of AP2 and AP3’s signatures. Because different signatures

are nearly orthogonal to each other, they are still detectable.

Figure 3.7: A network with four AP-client pairs. (a) Network topology (dashed line indicates that the two nodes interfere with each other, while solid line denotes AP-client association). (b) Conflict graph of the downlink traffic. (c) One possible schedule of the downlink traffic: AP1→C1 and AP4→C4 in Slots 1 and 3, AP2→C2 and AP3→C3 in Slots 2 and 4.

Gold codes [37], because of their outstanding cross-correlation property, are used for node signatures. Longer Gold codes have a larger difference between self-correlation and cross-correlation values, thus increasing the robustness and providing more signatures, albeit with higher overhead. In DOMINO, we use a set of 129 Gold codes with length 127. With 20 MHz bandwidth and BPSK as the modulation scheme, it takes 6.35 µs to transmit one signature. As will be mentioned later, two codes are reserved for special use. Thus, the system can support up to 127 nodes in one collision domain. Note that the signatures can be reused across different collision domains. Since there is a central controller in the system, we assume that every wireless node will be assigned a unique signature when it joins the network.
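To make the signature numbers concrete, the following sketch builds a family of 129 length-127 Gold codes and runs an aligned correlator over the sum of two signatures. It is illustrative only: the text does not say which generator polynomials DOMINO uses, so the two tap sets below (a pair commonly tabulated as preferred for degree 7) are an assumption, as are the 0.5 detection threshold and all function names. A real receiver would also slide the correlator over every sample offset and cope with noise.

```python
def m_sequence(taps, degree=7):
    """One period (2**degree - 1 bits) of the LFSR s[n] = XOR of s[n - t], t in taps."""
    seq = [1] + [0] * (degree - 1)               # any non-zero seed works
    for n in range(degree, 2 ** degree - 1):
        bit = 0
        for t in taps:
            bit ^= seq[n - t]
        seq.append(bit)
    return seq


def gold_codes(taps_a=(7, 3), taps_b=(7, 3, 2, 1), degree=7):
    """Return 2**degree + 1 codes as lists of +1/-1 chips."""
    a, b = m_sequence(taps_a, degree), m_sequence(taps_b, degree)
    n = len(a)
    family = [a, b]
    for shift in range(n):
        family.append([a[i] ^ b[(i + shift) % n] for i in range(n)])
    # Map bits {0, 1} to BPSK chips {+1, -1}.
    return [[1 - 2 * bit for bit in code] for code in family]


def correlate(samples, signature):
    """Aligned (zero-lag) correlation between received samples and a signature."""
    return sum(x * s for x, s in zip(samples, signature))


def detected(samples, signature, threshold=0.5):
    """Declare a detection when the correlation exceeds a fraction of the peak."""
    return correlate(samples, signature) > threshold * len(signature)


codes = gold_codes()
airtime_us = len(codes[0]) / 20e6 * 1e6          # 127 chips at 20 MHz
# A trigger that carries two signatures at once is their sample-wise sum.
combined = [x + y for x, y in zip(codes[2], codes[3])]
```

In this noise-free setting both summed signatures are detected in `combined` while a third code from the family is not, which is the property that allows one transmitter to trigger several next transmitters at once.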

In the above discussions, it is assumed that the next transmitter receives the signature from the previous transmitter. However, only the AP obtains the schedule from the central controller. So if a client is the current transmitter, it will not know the next transmitter.

Another issue to note is that notifying a hidden terminal (AP3 and AP4 in Figure 3.7) to start transmission is not easy. To solve these problems, both the transmitter and the receiver send out the signatures of their surrounding nodes at the same time. The mechanism used by the AP to inform the client of the signatures to send is divided into two cases as shown in Figure 3.8.

When the AP is the transmitter (Figure 3.8(a)), it transmits the samples of the signature that the client should send (S1) at the end of the packet. When the AP is the receiver

(Figure 3.8(b)), it transmits S1 at the end of the ACK. In either case, the client stores those samples. One slot after the ACK, both the AP and the client transmit the signatures together.

Figure 3.8: The timeline of the transmission between AP and client. S1 is the signature that should be sent by the client, S2 is the signature that should be sent by the AP, and S′ is the START signature. (a) AP → Client. (b) Client → AP.

The signatures sent by the AP and the client should be different because each only needs to inform the next transmitters in its own surroundings. The one with stronger RSS is responsible for sending the signatures of the nodes in the common area. To distinguish the signatures sent from the AP to the client from the final signatures broadcast to the next transmitter, a special signature, the START signature (S′), is sent after the latter. So, the next transmitter can start transmitting only if both its own signature and the START signature are detected in sequence.
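The "own signature, then START" rule can be pictured as a two-state detector. The sketch below is a simplified model under stated assumptions: time is divided into signature-length intervals, each interval yields a set of detected signature identifiers (empty when nothing is detected), and the class and identifier names are hypothetical.

```python
class TriggerDetector:
    """Fire only when the node's own signature is immediately followed by START."""

    def __init__(self, own_signature_id, start_id="START"):
        self.own = own_signature_id
        self.start = start_id
        self.armed = False

    def on_interval(self, detected_ids):
        """Process the signature ids detected in one interval.

        Returns True when the node should begin transmitting.
        """
        if self.armed and self.start in detected_ids:
            self.armed = False
            return True
        # Arm when our own signature is present; otherwise drop stale arming.
        self.armed = self.own in detected_ids
        return False


detector = TriggerDetector("X")
# The first "X" is the AP handing the samples to its client (no START follows);
# the second "X" is the real trigger and is followed by START.
events = [{"X"}, set(), {"X"}, {"START"}]
decisions = [detector.on_interval(e) for e in events]
```

Only the last interval produces a trigger, so a node that overhears its signature while the AP passes it to the client does not start early.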

In practice, a packet received incorrectly prevents the sending of signatures. Also, signature detection sometimes fails. If either of these happens, the chain is broken and all of the following transmissions will fail. To make the system robust, we add cross-links between different chains so that one transmitter can get the notification from multiple nodes. Thus, backup notifications are created and can be used if the normal notification is ineffective.

However, it is possible that the signatures from different triggering transmitters collide at the next transmitter.

Using USRPs, we studied how many signatures can be added together and yet received correctly even in the presence of interference. Five different experimental setups are evaluated. In the first setup, there is only one transmitter and one receiver. In the second and third setups, there are two transmitters with similar RSS at the receiver. Note that DOMINO chooses the top two nodes with the highest RSS as the triggering nodes. So, our choice of picking two transmitters with similar RSS is the worst case, as they cause the highest interference to each other. Both of the transmitters send the same signatures in the second setup. We used different signatures in the third setup to study the interference of non-correlated signals. The fourth and fifth setups are similar to the second and third ones except that there are three senders. Figure 3.9 plots the signature detection ratio from 1000 runs. The signature detection ratio is nearly 100% in all experiments when the total number of combined signatures is no more than 4, and the false positive ratio is below 1% all the time (not shown in the figure). DOMINO uses 4 as the maximum number of signatures to combine.

Figure 3.9: Detection ratio of multiple signatures (x-axis: number of combined signatures; y-axis: detection ratio; curves: 1 sender; 2 senders, same signatures; 2 senders, different signatures; 3 senders, same signatures; 3 senders, different signatures)

Figure 3.10: The timeline of the network in Figure 3.7. The link APi → APi stands for polling operation. The arrow between different links indicates the triggers between different slots.

3.3.3 Schedule Converter

The third component of the central server is the converter, which is a series of procedures that convert a strict schedule made by an arbitrary scheduler to a relative schedule.

Before introducing the converter, we define a few terms. The sender of a link l is denoted

by l.sender and the receiver by l.receiver. Link l is either an uplink or a downlink. Thus,

either l.sender or l.receiver must be an AP, which is denoted by l.ap. For a node n, link l

can trigger n if and only if the signature sent by l.sender or l.receiver can be received by

node n. Link l1 can trigger link l2 if and only if l1 can trigger l2.sender. The l.inbound is

the number of signatures that l.sender receives. A larger l.inbound makes l more robust to transmission failures because it indicates that more links can trigger l. However, a larger inbound

also reduces the signature detection reliability. Therefore, the maximum inbound is set to

2. The outbound is the number of signatures that a node can broadcast, which indicates the

number of links the node can trigger. The experiment result in Section 3.3.2 suggests that

the maximum outbound for each node should be 4.

A strict schedule with k time slots can be denoted by S = [s1, s2, ···, sk], where si is a set of links that can transmit concurrently in slot i. In a strict schedule produced by an arbitrary scheduler, there is no guarantee that links in si, i ∈ [1, ···, k − 1] can trigger all the links in si+1. On the other hand, a relative schedule requires that the links in slot si+1 can be triggered by at least one link in si. To satisfy this requirement, the converter uses the following two techniques:

Fake link insertion: For each slot si ∈ S, we insert all the links that are not conflicting with the links in si to create a maximal cover in the link conflict graph. The inserted links are marked as fake links. The purpose of adding the maximal number of fake links is to have all links in the network triggered frequently so that they stay synchronized with the rest of the network. When a node is instructed to send a packet to destination d by the schedule or the received signature, but the node has no packet for d in its queue, the node will send a fake packet to d. Note that a node only needs to send the header of the fake packet, instead of sending the entire fake packet. Thus, the interference introduced by these fake packets will be very low. Also note that a WiFi radio does not consume more energy when sending fake packets than when idle listening [105].

Batch connection: Since the strict scheduler creates schedules in batches, we also need to create the triggering connection between two neighboring batches. After the current relative schedule is created, the last slot of the relative schedule is retained in the converter, and will be used as the first slot of the next batch. The exception is that in the very first batch, the first slot has no preceding slots and therefore cannot be triggered by any links.

In this case, the APs will individually start executing the schedule. If the link in the front of the schedule is a downlink, then the AP will send a packet according to the schedule.

Otherwise, the AP will send a signature to the sender of that link. Because the APs are

not synchronized, collisions could happen. However, we will show that relative scheduling

heals itself and synchronizes the schedules between different APs within a few slots in

Section 3.4.2.
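Fake link insertion amounts to greedily growing each slot into a maximal set of mutually non-conflicting links. The sketch below is one possible reading, not the converter's actual code: it assumes that inserted links must also not conflict with each other, that the conflict relation is given as a set of unordered pairs, and that candidates are tried in a fixed order. All names and the example conflict set are hypothetical.

```python
def insert_fake_links(slot, all_links, conflicts):
    """Extend one slot with non-conflicting links, marked as fake.

    slot      -- links already scheduled in this slot
    all_links -- every link in the network, in a fixed order
    conflicts -- set of frozenset({l1, l2}) pairs that cannot be active together
    Returns a list of (link, is_fake) tuples.
    """
    active = list(slot)
    result = [(link, False) for link in slot]
    for candidate in all_links:
        if candidate in active:
            continue
        if all(frozenset((candidate, l)) not in conflicts for l in active):
            active.append(candidate)
            result.append((candidate, True))
    return result


links = ["A", "B", "C", "D"]
conflicts = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("C", "D"))}
extended = insert_fake_links(["A"], links, conflicts)
```

Here link C joins the slot as a fake link, while B and D are left out because they conflict with A and C respectively.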

After inserting fake links into the strict schedule, we are ready to create the triggering relation between the links in the neighboring slots. Given two neighboring slots si and si+1, the objective is, for each link l in si+1, to find at least one link in si that can reliably trigger l, while the inbound and outbound constraints of each link are satisfied. For each link l in si+1, we first select one node n in si, such that n has the highest SNR at l.sender. After assigning one trigger to each link in si+1, links with saturated inbound or outbound are excluded in the following steps. Then, we repeat the previous step on the remaining nodes to find the secondary triggering node for each link in si+1. This process stops when no more triggers can be added. Note that even after inserting fake links, it is still not guaranteed that each link in si+1 has a triggering link in si. Such a scenario happens rarely in our experiments. The scheduler will reschedule such links.
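The greedy trigger assignment can be sketched as follows. This is a minimal interpretation under stated assumptions: SNR values are given per (triggering node, next sender) pair, a missing entry means the node cannot trigger that sender, and the limits of 2 inbound and 4 outbound triggers are taken from Section 3.3.3. The node names and SNR values in the example are made up.

```python
MAX_INBOUND = 2    # triggers a link may receive
MAX_OUTBOUND = 4   # signatures a node may broadcast


def assign_triggers(prev_nodes, next_links, snr):
    """Greedy trigger assignment between two neighbouring slots.

    prev_nodes -- nodes (senders and receivers) active in slot i
    next_links -- dict: link in slot i+1 -> its sender node
    snr        -- dict: (node, sender) -> SNR in dB
    Returns dict: link -> list of triggering nodes, strongest first.
    """
    triggers = {link: [] for link in next_links}
    outbound = {node: 0 for node in prev_nodes}
    for _ in range(MAX_INBOUND):                 # one pass per trigger rank
        for link, sender in next_links.items():
            if len(triggers[link]) >= MAX_INBOUND:
                continue
            candidates = [n for n in prev_nodes
                          if (n, sender) in snr
                          and outbound[n] < MAX_OUTBOUND
                          and n not in triggers[link]]
            if candidates:
                best = max(candidates, key=lambda n: snr[(n, sender)])
                triggers[link].append(best)
                outbound[best] += 1
    return triggers


prev_nodes = ["AP1", "C1", "AP4", "C4"]
next_links = {"AP2->C2": "AP2", "AP3->C3": "AP3"}
snr = {("AP1", "AP2"): 30, ("C1", "AP2"): 12, ("C4", "AP3"): 20}
plan = assign_triggers(prev_nodes, next_links, snr)
```

A link whose list stays empty has no trigger in the previous slot and would be handed back to the scheduler. In the example, AP3 is reached only through the receiver C4, which mirrors the hidden-terminal case discussed in Section 3.3.2.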

The last step is to insert ROP slots into the relative schedule. The duration of the ROP slot is the time needed for an AP to execute the ROP protocol. During an ROP slot, all links that interfere with the links associated with the polling AP must be silent. So, two APs can share the ROP slot if and only if none of their links are conflicting. We insert at most one

ROP slot between two neighboring slots. If an ROP slot is inserted between slots si and si+1, the links in si will trigger links in si+1 with a special signature, which is called an ROP signature, instead of the signature S′ as shown in Figure 3.8. Once links in si+1 receive the ROP signature, they will wait for one ROP slot before starting transmission. The ROP slots are inserted into the relative schedule in a greedy fashion. To insert the ROP schedule for AP A into relative schedule S = [s1, s2, ···, sk], we first check if si has nodes that can trigger A. If si can trigger A, and there is no ROP slot that has been inserted between si and si+1, then we insert an ROP slot between si and si+1. Otherwise, we check if A can poll together with the APs in the existing ROP slot. If they can execute ROP together, we assign the ROP schedule for A to that slot. Otherwise, we increase i and continue to check the next slot in S.
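The greedy ROP placement can be written as a single loop over the slots. The sketch below is one reading of the procedure and makes two assumptions explicit: an AP may only join or open an ROP slot after a slot that can trigger it, and slot indices start at 0. The callbacks and the example data are hypothetical.

```python
def insert_rop(ap, num_slots, can_trigger, rop_slots, compatible):
    """Greedily place the polling (ROP) turn of one AP.

    num_slots   -- number of slots k in the relative schedule
    can_trigger -- function (slot_index, ap) -> bool
    rop_slots   -- dict: slot index i -> APs polling between slot i and i+1
    compatible  -- function (ap_a, ap_b) -> bool, True if none of their links conflict
    Returns the slot index after which ap polls, or None if no place was found.
    """
    for i in range(num_slots - 1):           # an ROP slot sits between i and i+1
        if not can_trigger(i, ap):
            continue
        sharing = rop_slots.get(i)
        if sharing is None:
            rop_slots[i] = [ap]              # open a new ROP slot after slot i
            return i
        if all(compatible(ap, other) for other in sharing):
            sharing.append(ap)               # share the existing ROP slot
            return i
    return None


triggerable = {"AP1": {0, 1}, "AP2": {0, 1}, "AP3": {0, 2}}
incompatible = {frozenset(("AP1", "AP2"))}
rop = {}
placed = [
    insert_rop(ap, 4,
               lambda i, a: i in triggerable[a],
               rop,
               lambda x, y: frozenset((x, y)) not in incompatible)
    for ap in ("AP1", "AP2", "AP3")
]
```

AP1 opens an ROP slot after slot 0, AP2 conflicts with AP1 and therefore moves on to slot 1, and AP3 shares the first ROP slot with AP1.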

At the end, each link l in the created relative schedule will be distributed to AP l.ap.

While executing the relative schedule, the AP sends a data packet or executes ROP according to the schedule upon reception of its own signature, and sends appropriate signatures to its clients to trigger the senders in the next slot. Each client sends data packets when receiving its own signature, triggers other links based on the signatures received from its AP, and returns its queue size to its AP when a polling packet is received.

3.3.4 DOMINO Under Microscope

To take a closer look at the overall system, we again use the example shown in Figure 3.7. This time, however, all the uplink and downlink flows are saturated. Figure 3.10 presents the transmission timeline from our trace-driven simulation. Because of jitter in the wired backbone, the packets in slot 0 are actually sent with a 24 µs time difference. This delay passes to slot 1 and slot 2. However, in slot 2, link C1→AP1 receives

two triggers from link C2→AP2 and C3→AP3 (this can be decoded as C1 is waiting for a

polling slot). Since the transmitter uses the last correctly received trigger as time reference,

it gets synchronized in transmission with link C4→AP4. The transmissions in the following

slots are then all synchronized. This result reveals the robustness of DOMINO to time jitter.

We marked some interesting points in the figure to emphasize the design of DOMINO.

Mark (1) indicates the trigger from link AP4→C4 to AP3→C3. As shown in Figure 3.7,

AP3 and AP4 are hidden to each other. Mark (1) presents an instance where the receiver

of the former transmission triggers the next transmitter. At mark (2), we assume this transmission fails. Thus, the following two triggers are missing. However, as a result, only one polling transmission is missed, which indicates that the effect of transmission failure in DOMINO is limited. Mark (3) presents the introduction of fake packets to increase the network coverage of triggers⁵. Link AP2→C2 in slot 94 would not have been triggered without this fake packet.

3.3.5 Practical Issues

Although we take care of many practical issues in the design of DOMINO, there are still some that require further discussions:

• Different packet sizes and data rates: In our design, we use a fixed slot time and

assume that all the data packets consume the same amount of time, which will likely

not hold at all times in practice. However, techniques, such as packet splitting and

aggregation, can help to produce virtual packets that take the same amount of time.

A simple calculation with the fixed packet duration, packet size and data rate will

suffice. Then, instead of reporting how many packets are queued to the central server,

wireless nodes calculate and forward the total number of virtual packets.

• Number of clients an AP can support: The wireless channel is divided into 24

subchannels, which limits the total number of clients per AP. In case the number of

clients is more than 24, we can divide the clients into multiple sets, with each set

having no more than 24 nodes. And then the AP can poll once for each set.

• Missed ACKs: When the ACK for packet p is missing, the sender adds a new transmission request. Instead of waiting for the request to be scheduled again by the server, the sender will retransmit when either of the following two conditions is met: 1) the sender is a client and it receives its trigger; 2) the sender is an AP and the schedule at the top of the schedule list has the same destination as p.receiver.

⁵ Note that fake packets are also scheduled by the central server.

Scheme           SC      HT      ET
DOMINO (Kbps)    4.25    5.42    9.18
DCF (Kbps)       2.76    1.62    2.72

Table 3.2: Aggregate throughput in 3 different scenarios with USRP prototype
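The "simple calculation" behind virtual packets and the number of polling rounds for large cells can be sketched as below. This is illustrative only: it ignores PHY and MAC header overhead, assumes queued bytes can be split and aggregated freely, and uses a hypothetical 400 µs payload airtime that does not come from the text.

```python
import math


def bytes_per_virtual_packet(slot_payload_us, data_rate_mbps):
    """Bytes that fit in the fixed payload airtime (us x Mbps gives bits)."""
    return int(slot_payload_us * data_rate_mbps / 8)


def virtual_packets(queued_packet_sizes, slot_payload_us, data_rate_mbps):
    """Number of fixed-duration virtual packets needed for the queued bytes."""
    capacity = bytes_per_virtual_packet(slot_payload_us, data_rate_mbps)
    return math.ceil(sum(queued_packet_sizes) / capacity)


def polling_rounds(num_clients, subchannels=24):
    """Number of ROP polls needed when clients are split into sets of 24."""
    return math.ceil(num_clients / subchannels)


report = virtual_packets([1500, 200, 100], slot_payload_us=400, data_rate_mbps=12)
rounds = polling_rounds(50)
```

With these inputs one virtual packet carries 600 bytes, so 1800 queued bytes are reported as 3 virtual packets, and an AP with 50 clients needs 3 polling rounds.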

3.4 Evaluation

3.4.1 Experimentation

We implemented a simple version of DOMINO on USRPs and compared its performance with DCF. We assume that the queues in the clients are saturated and the transmission schedules are already loaded in each AP. Four USRPs are used to emulate two AP-client pairs, and flows are created on the two links. Then, we studied the aggregate throughput in three different scenarios: (i) the two links are exposed to each other (ET); (ii) they are hidden links to each other (HT); (iii) they are in the same contention domain and are neither exposed nor hidden to each other (SC). Table 3.2 shows the aggregate throughput. Because DOMINO does not incur the overhead of backoffs, it achieves 54% higher throughput even in the SC setting. In the case of hidden and exposed terminals, DOMINO obtains more than 3× the throughput of DCF. As there exists a significant latency variance between the USRP and the host, implementing CENTAUR’s carrier sensing mechanism to align exposed transmissions using USRP devices is difficult. So, we leave the comparison of DOMINO and CENTAUR to trace-driven evaluations.

3.4.2 Trace-Driven Simulation

The above prototype with USRP shows the potential advantages of DOMINO. However, the limited number of USRP platforms and the latency with USRP prohibit a large-scale implementation. Thus, we perform measurements in a network of 40 WiFi nodes spread across 2 buildings and use the RSS trace between different nodes to conduct a large-scale evaluation in ns3.

Evaluation Setup

Let’s denote T (m, n) as a network topology with m APs and n clients per AP. To create T (m, n), we first sort the nodes from our trace by the number of nodes in their communication range in a decreasing order. Then, we select the first node as one AP, and randomly pick n nodes that could communicate with the AP as clients. We repeat this

process to select the remaining (m − 1) APs and their clients.

Unless specified, the default traffic (UDP or TCP) rate for uplink and downlink is 10

Mbps. The evaluation results are based on a run of 50 seconds. The physical layer data rate

is set to 12 Mbps and the data packet size is 512 Bytes. Wired connections are created be-

tween the APs and the central server. The latency on the wired connection is set following

a normal distribution with mean 285 µs and variance 22 µs according to [88].

In DOMINO, we use a scheduler based on RAND [77], a greedy algorithm. To calculate

the schedule for each slot, the first link l from the queue of links Q that has data to send is added to a set C(l). Then we add another link l′ from Q−C(l) to C(l) if l′ is not conflicting with any link in C(l). This process is repeated until no more links can be added. All the links in C(l) are then scheduled in this slot. To improve the fairness, we move the links in

C(l) to the end of Q. The following slots are scheduled in the same way.
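The per-slot procedure can be sketched with a queue and a conflict set. The sketch assumes that every link added to C(l) must itself have data to send, and the conflict pairs in the example are hypothetical, chosen so that the output matches the schedule of Figure 3.7(c).

```python
from collections import deque


def schedule_slot(queue, has_data, conflicts):
    """Pick one slot's worth of links and rotate them to the back of the queue.

    queue     -- deque of links in service order (modified in place)
    has_data  -- function link -> bool
    conflicts -- set of frozenset({l1, l2}) conflicting pairs
    """
    chosen = []
    for link in queue:
        if not has_data(link):
            continue
        if all(frozenset((link, c)) not in conflicts for c in chosen):
            chosen.append(link)
    for link in chosen:              # fairness: served links go to the end
        queue.remove(link)
        queue.append(link)
    return chosen


links = deque(["AP1->C1", "AP2->C2", "AP3->C3", "AP4->C4"])
conflicts = {
    frozenset(("AP1->C1", "AP2->C2")),
    frozenset(("AP1->C1", "AP3->C3")),
    frozenset(("AP3->C3", "AP4->C4")),
}
slots = [schedule_slot(links, lambda l: True, conflicts) for _ in range(3)]
```

The three calls alternate between {AP1→C1, AP4→C4} and {AP2→C2, AP3→C3}, because the links served in one slot are moved behind the ones that were blocked.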

Figure 3.11: Maximum transmission misalignment at the start of transmissions (x-axis: slot index; y-axis: maximum Tx misalignment (µs); curves for wired latency variance of 20 µs, 40 µs, 60 µs and 80 µs)

Time for transmissions to reach synchronization

Because of jitter over the wired network, the transmissions for slot 0 may not be well aligned. We use the network T (10, 2) to study how long it takes for the misalignment to converge. The result is presented in Figure 3.11. The wired latency variance is changed from 20 µs to 80 µs. The figure shows that although the maximum misalignment varies from 10 µs to 20 µs, it is reduced to 1 or 2 µs within 4 slots. This result indicates that our scheme does not need a long warm-up time and also it is robust to initial misalignment.

Throughput and fairness

Figure 3.12: TCP and UDP throughput, delay and fairness for T(10, 2). The downlink data rate is fixed to 10 Mbps and the uplink data rate varies from 0 to 10 Mbps. Panels: (a) UDP throughput, (b) UDP delay, (c) UDP throughput fairness, (d) TCP throughput, (e) TCP delay, (f) TCP throughput fairness. Each panel plots DOMINO, CENTAUR and DCF against the uplink data rate (Mbps).

To evaluate and compare the throughput and delay of DOMINO, CENTAUR and DCF, we still use T(10, 2) with the downlink data rate fixed to 10 Mbps. The uplink data rate varies from 0 to 10 Mbps. There are 10 hidden link pairs and 62 exposed link pairs out of 720 possible link pairs. For both CENTAUR and DCF, the MAC parameters are set according to the 802.11g standard. The throughput fairness among all links is calculated using Jain’s fairness index [50]. As shown in Figure 3.12(a), DOMINO outperforms DCF by 74% when there is only downlink UDP traffic. Although the throughput gain decreases to 24% as the uplink UDP traffic data rate increases, DOMINO has high fairness around 0.78, compared with 0.47 for DCF (Figure 3.12(c)). For TCP traffic, the throughput gain of DOMINO varies from 10% to 15% (Figure 3.12(d)), while the fairness gain is between 17% and 39% (Figure 3.12(f)). The reason why TCP traffic does not produce as high a throughput gain as UDP is that we treat the TCP ACK packet as a regular data packet and it takes one whole slot to transmit the TCP ACK, which wastes the channel resource. We believe that aggregating TCP ACKs will help improve the gain, but leave it as future work.
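For reference, Jain's fairness index of n per-link throughputs x_1, ..., x_n is (Σ x_i)² / (n · Σ x_i²). It equals 1 when all links get the same throughput and 1/n when a single link gets everything. A minimal helper, with an illustrative function name:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    squares = sum(x * x for x in throughputs)
    if not throughputs or squares == 0:
        raise ValueError("need at least one non-zero throughput")
    total = sum(throughputs)
    return total * total / (len(throughputs) * squares)


equal_share = jain_fairness([5, 5, 5, 5])
one_winner = jain_fairness([10, 0, 0, 0])
```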

Figure 3.12 surprisingly shows that the performance of CENTAUR is worse than that of DCF when the uplink data rate is low. We looked deeper into the simulation results. Although CENTAUR has 0 ACK timeouts compared with 57386 for DCF when the uplink data rate is 0 Mbps, CENTAUR is sensitive to network topology and behaves worse than DCF when the scheduled downlink traffic is not transmitted in the way it is supposed to be. The assumption that exposed links transmit concurrently just does not hold in some network topologies. To illustrate this argument better, we use the topologies shown in Figure 3.13 as examples. Assume that only downlink traffic exists. In both examples, none of the downlinks conflict with each other and all are scheduled in the same batch as in CENTAUR. However, AP1, AP2 and AP3 are not in the communication range of each other in Figure 3.13(b). Since CENTAUR uses carrier sensing and fixed back-off intervals to synchronize the transmissions, the transmissions cannot be synchronized. Thus, in each batch, AP1, AP2 and AP3 have a higher chance than AP4 to access the channel and finish transmitting earlier. But the scheduled packets for the next batch would not arrive until all of the packets at AP4 are sent. In DCF, however, AP1, AP2 and AP3 always have packets in their queues and keep contending for the channel. Table 3.3 presents the aggregate throughput for both topologies. For Figure 3.13(a), the throughput of both DOMINO and CENTAUR is around 3× the throughput of DCF. However, the throughput of CENTAUR is lower than that of DCF for Figure 3.13(b), while DOMINO provides similar throughput in both scenarios.

Topology                 DOMINO    CENTAUR    DCF
Figure 3.13(a) (Mbps)    32.72     28.60      9.97
Figure 3.13(b) (Mbps)    33.85     18.35      22.13

Table 3.3: Aggregate throughput with 4 pairs of exposed links

Delay

Figures 3.12(b) and 3.12(e) plot the average delay of different schemes. The delay is defined as the duration from the time a packet is queued to the time it is successfully delivered. For UDP traffic, the delay of DCF is 2× higher than that of DOMINO. Because the UDP traffic data rate is high, the MAC layer queue gets saturated quickly. So queuing delay significantly contributes to the packet delay. DOMINO achieves higher throughput, which means that packets get delivered faster. The packet delay for TCP traffic is shown in Figure 3.12(e). Because of TCP congestion control, the MAC queue grows more slowly than with UDP traffic. On the other hand, the congestion control window size grows faster for DOMINO than for DCF. So higher throughput indicates faster packet delivery as well as more queued packets. These two factors have opposite effects on the packet delay, resulting in comparable packet delay for DOMINO and DCF.

Figure 3.13: Exposed links example. Dashed links indicate nodes are interfering with each other and solid links denote AP-client pair. (a) 4 links that are exposed to each other. (b) 3 links that share a common exposed link.

Simulation with a random network

The above trace only consists of 40 nodes, which limits the scale of the network that we can simulate. In this section, we use the default path loss model in ns3 to compute the RSS between different nodes instead of manually setting the RSS from the trace. We randomly place nodes in an 800 × 800 m² area and create a topology T(20, 3), which consists of 80 nodes. We repeat the simulation 50 times with UDP traffic and plot the CDF of the throughput gain of DOMINO over DCF in Figure 3.14. The throughput gain varies from 22% to 96% with a median of 58%.

Figure 3.14: The CDF of throughput gain of DOMINO over DCF with 50 runs (x-axis: throughput gain; y-axis: CDF)

3.5 Discussion

Building conflict graph dynamically: In our current design, we assume that the conflict graph does not change over time, which does not hold in mobile scenarios. Updating the conflict graph with low overhead remains a challenge. [52] have provided a scheme that updates the conflict graph of a network in time Nt, where N is the number of nodes in the network, and t is the time of sending one packet. Since non-interfering nodes can send concurrently, the time complexity can be reduced to t(∆+1), where

∆ is the maximum degree of the two-hop connected graph. The two-hop connected graph is created by connecting any two vertices that are within two hops in the interference graph.

The interference graph can be estimated based on previously known channel status. [35] have

shown that in the 2.4 GHz spectrum, the channel coherence time of walking is 125.1 ms,

which is the maximum conflict graph updating period. Therefore the overhead of periodi-

cally generating the conflict graph is t(∆+1)/(125.1 ms). When ∆=40 and each beacon

takes 40 µs, the overhead is only 1.3%, which is negligible compared with the throughput

gain.
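The 1.3% figure follows from the expression t(∆+1)/(125.1 ms) with the quoted values. A small check, with an illustrative function name:

```python
def conflict_graph_overhead(max_degree, beacon_us, coherence_ms=125.1):
    """Fraction of airtime spent refreshing the conflict graph.

    One refresh takes beacon_us * (max_degree + 1) microseconds and is
    repeated once per channel coherence time.
    """
    update_time_us = beacon_us * (max_degree + 1)
    return update_time_us / (coherence_ms * 1e3)


overhead = conflict_graph_overhead(max_degree=40, beacon_us=40)
```

With ∆ = 40 and 40 µs beacons, one refresh takes 1640 µs out of every 125.1 ms, which is about 1.3% of the airtime.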

Co-existence with current networks: Enterprise networks may be subject to external

interference such as from external WiFi networks. To co-exist with existing networks,

DOMINO divides time into two parts: a centralized contention free period (CFP) and a

carrier sensing contention period (CoP) as shown in Figure 3.15. The contention free period

contains several slots and supports concurrent transmissions in each slot. To reserve the

channel during this duration, we set the Network Allocation Vector (NAV) duration to the

end of the CFP in the MAC header of each transmitted packet. External nodes have

to defer their transmission upon receiving the NAV. In the contention period, all nodes use

carrier sensing to access the channel. The server estimates the amount of external traffic

and internal traffic during the contention period, and adjusts the durations of the following

CFP and CoP to provide fair access to all traffic.

Light traffic load: DOMINO improves the throughput of the network under heavy traffic. However, with a light data arrival rate, the throughput gain will not be high and the control overhead increases the packet delay. In network topology T (6, 5) with a traffic rate of 6 KBps (this is lower than typical web browsing, considering that the home page of Yahoo! is around 1.9 MB), the delay of DOMINO is only 1.14× higher than the delay of DCF, which is not extremely high. In addition, we can use the CFP and CoP durations, as discussed above, to solve this problem. Under light traffic, we set the CFP duration to 0 to turn off scheduling.

Energy saving: It is straightforward to implement an energy-saving mechanism in DOMINO.

For example, the server can schedule an energy constrained device to sleep for a duration

within which it does not need to send or receive packets.

Number of signatures: DOMINO uses signatures with 127 bits and supports 127 nodes in one collision domain. To support more nodes in one collision domain, there are several choices. First, instead of using 127 bits as the signature length, we can use 255 or 511 bits, supporting 255 and 511 nodes in one collision domain, respectively. Second, a combination of the 127 signatures can be used to identify one node. Both choices result in a longer signature duration, which increases the overhead. So an algorithm to estimate the node density is required to choose the best signature length.
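As a rough illustration of the first option, the selection step could look like the sketch below. It assumes that an estimate of the number of nodes in the collision domain is already available (the estimator itself is not specified in the text), and the names are hypothetical.

```python
# Candidate signature lengths in bits; each supports that many nodes.
SIGNATURE_LENGTHS = (127, 255, 511)


def pick_signature_length(estimated_nodes):
    """Return the shortest signature length that covers the estimated
    number of nodes in one collision domain, or None if none does."""
    for length in SIGNATURE_LENGTHS:
        if estimated_nodes <= length:
            return length
    return None
```

Picking the shortest sufficient length keeps the signature duration, and hence the overhead, as small as the node density allows.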

Polling frequency: Currently, DOMINO polls the queue information of the clients in every batch, which may result in a wastage of channel resource. Intuitively, the APs should not send polling packets when the scheduler has enough packets to schedule. However, this could cause starvation at some clients. We run a set of simulations to evaluate the delay and throughput of UDP traffic in T (10, 2) when varying the batch size (the reciprocal of polling

frequency). The simulation results show that when the network traffic is heavy (5Mbps per

link), as the batch size increases, the delay slightly decreases and the throughput slightly

increases. However, when the traffic is light (500Kbps per link), the delay increases when

the batch size increases. We leave the design of a better polling scheme as future work.

Figure 3.15: DOMINO consists of contention free period (CFP) and contention period (CoP). The CFP is divided into different slots and each slot supports multiple concurrent transmissions.

3.6 Related Work

WiFi centralized control plane schemes: Existing work, such as [19,58,63], has focused on the channel assignment, client association and power control problems. MDG [23] studied the relationship between three different functions with a centralized solution (channel allocation, load balancing and power control) and proposed a joint framework of different schemes. DenseAP [67] introduces the idea of dense AP deployment. A central controller dynamically determines the AP that each client associates with and the channel to assign to each AP. A network architecture and a set of APIs are defined in Dyson [68] to provide an easy way to observe the network and implement control policies. These papers focused on the control plane of centralized networks, while our proposed scheme targets channel access.

Centralized and hybrid data plane schemes: PCF, defined in the 802.11 standard, has been shown to be promising in supporting real-time traffic [29] with a single AP. MIFI [18] extends the use of PCF to the multiple-AP scenario. In the contention-free period (CFP), non-interfering APs poll the traffic simultaneously. This work, however, creates more exposed links originating from their definition of non-interfering APs. CENTAUR [88] proposed a hybrid centralized scheme with the assumption that the user devices will remain unmodified. The downlink traffic is scheduled according to the exposed and hidden relationships between different links. However, the unscheduled uplink traffic disturbs the performance in an unpredictable way. DOMINO does not suffer from a similar problem because it controls all the traffic, including both uplink and downlink. OmniVoice [10] only schedules downlink traffic. It divides time into uplink and downlink slots so that the uplink traffic does not disturb the downlink schedule. However, the synchronization accuracy degrades as the size of the backbone network increases. XPRESS [57] uses backpressure algorithms to obtain optimal throughput in multi-hop wireless networks. It could be extended to enterprise networks. However, it also only schedules downlink traffic.

Clients queue status: To obtain queue information of the clients, some recent works have leveraged the fact that there is room between the channel capacity and the real data rate, so that the ongoing transmission can tolerate some interference. Side Channel [101] focused on ZigBee networks, and used interfering signals on different chip positions or different chip intervals to convey the request for uplink transmission. Flashback [28], on the other hand, worked on the OFDM system. Interfering signals, called flashes, are sent on a given frequency and the frequency interval between adjacent flashes encodes the clients' queue information. These two techniques promise high performance in a network with a single AP. However, with multiple APs and exposed transmissions, the benefits of side channels or flashes decrease. In [54], each client is assigned a unique bit sequence, named a transmission request. The sequences assigned to different clients are orthogonal to each other so that these requests can be detected by the AP. This method suffers from differences in received power across clients, and it only transmits 1 bit of information back to the AP.

3.7 Conclusion

In this chapter, we proposed Relative Scheduling, which allows wireless nodes to transmit relatively one after the other. The use of node signatures as transmission triggers in relative scheduling is also verified using the USRP platform. Then, we developed DOMINO, a centralized scheduling framework for enterprise WLANs based on the concept of Relative Scheduling. Our evaluation results show that DOMINO significantly outperforms DCF, with up to 96% higher throughput.

Chapter 4: BBN: Throughput Scaling in Dense Enterprise WLANs with Blind Beamforming and Nulling

4.1 Introduction

The recent explosive growth in the number of mobile devices and the data generated by these devices has led to a decrease in the channel resources available to each individual device. Network administrators have tried to tackle this problem by densely deploying access points so that users can almost always find a nearby AP with good signal strength. However, dense deployment of APs does not scale well with the throughput demands. In the existing network protocols [66,88], when one mobile client is transmitting uplink packets to an access point, the nearby clients have to remain silent to avoid interfering with the ongoing transmission.

Recently, multiple algorithms have been proposed that help in scaling the throughput with the number of wireless devices. Interference Alignment (IA) [24] is one such technique, but it requires clients to participate in a schedule with an exponential number of slots. However, mobile clients are highly mobile and may not stay at the same place for a long time. Multi-User MIMO (MU-MIMO) [36] enables scaling of throughput with the number of devices, but it requires APs to exchange samples over the backbone. Although the wired backbone in Enterprise Wireless LANs (EWLANs) is underutilized [17, 39], exchanging samples requires significantly higher bandwidth than exchanging packets, which cannot be supported by current wired networks [39,41]. Joint beamforming based algorithms such as [56,74] work only for the downlink traffic. To perform joint beamforming, these algorithms require all transmitters to share the contents of all packets to be transmitted. However, mobile devices are not connected through a wired backbone, and are unable to share the packets with each other.

This chapter proposes BBN, the first implementation of a Blind Beamforming and Nulling scheme that enables multiple nearby access points to concurrently receive uplink packets from multiple mobile clients, all within a single collision domain, without overwhelming the backbone. BBN does not increase energy consumption on the clients and executes over exactly two time slots. BBN leverages three properties that are unique to EWLANs: (i) Dense deployment of APs (See Fig. 4.3 and [66]); (ii) Capability of these APs to exchange packets with each other over the underutilized wired backbone; and, (iii) Immobility of APs resulting in relatively stationary channels (See Fig. 4.2). When one AP is receiving uplink data, existing algorithms [66], including IEEE 802.11 WiFi, prevent nearby APs from transmitting or receiving data. In contrast, BBN makes use of the energy-rich access points to assist their clients (mobile devices) in decoding their packets at their respective access points. In BBN, the clients only participate in the first slot and the access points participate on behalf of the clients in the second slot.

Consider the example enterprise WLAN shown in Fig. 4.1(a), where all the APs and the three clients are in a single collision domain. Assume that the three users want to upload one packet each to the backbone. An omniscient TDMA scheduling algorithm with global knowledge would require three time slots to complete this upload. In BBN, in the first slot, as shown in Fig. 4.1(a), all users transmit at the same time. All 4 APs receive a combination of the three transmitted packets. In the second slot, AP3 and AP4 retransmit the received signals by first precoding [48] them such that the following condition is satisfied, as shown in Fig. 4.1(b): at AP1, the samples corresponding to x2 and x3 in the second slot align with the samples corresponding to x2 and x3 in the first slot.

Figure 4.1: Illustration of BBN over a topology of 3 clients and 4 APs. All devices belong to the same collision domain and can hear each other. (a) First slot. x1, x2 and x3 are the three packets transmitted by C1, C2 and C3, respectively. $h_{ij}^{(1)}$ is the channel from client i to APj during time slot 1. (b) Second slot. A subset of APs transmit in the second slot while the rest of the APs receive. aij are the final channel coefficients after the transmission of the second slot. si is the scaling coefficient at APi.

Decoding happens in multiple steps as follows:

1. At the end of the second slot, AP1 scales the samples received by AP1 in the second slot and subtracts them from the samples received in the first slot. This scaling is done such that samples corresponding to x2 and x3 are nulled. Afterwards, it is left with only the samples corresponding to x1. AP1 decodes the samples to obtain the packet transmitted by C1. Next, it transmits the decoded packet over the backbone to AP2.

2. AP2 recreates the samples corresponding to x1 and subtracts them from the samples received in the first slot and the second slot.

3. After subtraction, AP2 is left with two equations (one from each slot), and two variables

(x2 and x3). AP2 solves the two equations to obtain x2 and x3.

4. Afterwards, AP1 and AP2 forward x1, x2 and x3 towards their destinations.

BBN enables the three single-antenna transmitters to upload three packets in two slots, improving the throughput by 50% compared to omniscient TDMA. In Section 4.2, we show that in networks with a high enough density of APs, BBN enables N mobile clients to transmit N uplink packets in exactly two slots, resulting in unbounded throughput. Also, note that BBN requires the APs to exchange only the decoded packets instead of the raw samples.

The focus of BBN is to increase the throughput of the uplink traffic for clients with a single antenna. This is in contrast with [56,74], which focus on downlink traffic. Recently, uplink traffic [17, 41] has been growing at a fast rate due to the emergence of a wide range of applications, such as cloud computing, video conferencing, online gaming, VoIP, and traffic generated from mobile devices (e.g., location information or sensor readings). BBN makes extensive use of the wired backbone. Besides the decoded packets, the channel state information, which is required to do nulling in the second slot, is also exchanged over the backbone. Since BBN migrates most of the complexity from the mobile devices to the APs, it works even when the channel from clients to APs is rapidly changing due to client mobility. BBN works as long as the APs are time-synchronized with each other, and it places very few requirements on the clients.

Figure 4.2: Received Signal Strength (RSS, in dB) over time (s) in an office environment: (a) channel between a pair of APs; (b) channel between a mobile client and an AP. The channel between APs is relatively stationary compared to the channel between an AP and a mobile client.

This chapter makes the following contributions:

1. We propose a blind beamforming and nulling scheme, BBN, that scales uplink throughput with the number of access points. BBN also works over multiple collision domains.

2. This chapter shows the first implementation of blind beamforming and nulling on USRP radios. Experiments performed on our testbed show that BBN achieves 1.48× throughput compared to omniscient TDMA.

3. Trace-driven simulation results show that in a large Enterprise WLAN, BBN can leverage the density of the access points. In EWLANs with a high density of APs, BBN provides a throughput of 5.6× compared to omniscient TDMA and 52.4× compared to IEEE 802.11.

4.2 Illustration

Before discussing BBN in detail, we define some notation. All of the clients and APs in BBN are assumed to have only one antenna. The network consists of clients C1, C2 and C3 and four APs, AP1 to AP4, that are connected through a wired backbone. Let $h_{ij}^{(1)}$ be the channel state information between Ci and APj in slot 1. In the second slot, a subset of APs is selected to transmit. For this example, this set consists of AP3 and AP4. Let $h_{kj}^{(2)}$ be the channel state information between APk and APj in slot 2. In this section, we assume that all the wireless devices are in a single collision domain (i.e., they can all hear each other). In Section 4.5, we extend BBN to networks with multiple collision domains.

Figure 4.3: CDF of number of APs observed across different locations. The data was collected at multiple places including a hospital, a large university library and an apartment complex.

Let xi be the packet sent by Ci in slot 1. In the following discussion, we ignore the presence of noise since it is not possible to null the noise. However, we do take noise into account in our analysis (See Section 4.4.4) and later in our simulations (Section 4.7). Let $y_{ik}^{(t)}$ be the component of xi received by APk in slot t. We have:

$$y_{ik}^{(1)} = h_{ik}^{(1)} x_i \tag{4.1}$$

Let vk be the precoding vector for APk in the second slot and M be the total number of APs (in this example, M = 4). Let $y_{ij}^{(2)}$ be the component of xi received by APj in slot 2. We have:

$$y_{ij}^{(2)} = \sum_{k=3}^{M} h_{kj}^{(2)} v_k \, y_{ik}^{(1)} = \sum_{k=3}^{M} h_{kj}^{(2)} v_k \, h_{ik}^{(1)} x_i \tag{4.2}$$

We want to ensure that the components of x2 and x3 at AP1 are a linear combination of their components in the first slot. Let si be the scaling coefficient at APi. Thus,

$$y_{21}^{(2)} = \sum_{k=3}^{M} h_{k1}^{(2)} v_k \, h_{2k}^{(1)} x_2 = s_1 y_{21}^{(1)} = s_1 h_{21}^{(1)} x_2 \tag{4.3}$$

$$y_{31}^{(2)} = \sum_{k=3}^{M} h_{k1}^{(2)} v_k \, h_{3k}^{(1)} x_3 = s_1 y_{31}^{(1)} = s_1 h_{31}^{(1)} x_3 \tag{4.4}$$

Simplifying these equations, we get

$$\sum_{k=3}^{M} h_{k1}^{(2)} v_k \, h_{2k}^{(1)} - s_1 h_{21}^{(1)} = 0 \tag{4.5}$$

$$\sum_{k=3}^{M} h_{k1}^{(2)} v_k \, h_{3k}^{(1)} - s_1 h_{31}^{(1)} = 0 \tag{4.6}$$

Since the right sides of Eqs. 4.5 and 4.6 are all 0, at least 3 variables, instead of 2, are required to obtain non-zero solutions. One of these variables is the scaling coefficient (s1). Thus, a total of 2 transmitting APs are required to supply the remaining variables. Further, two receiving APs are also required such that the first AP decodes x1 while the second AP decodes x2 and x3. Thus, in total M = 2 + 2 = 4 APs are required to support 3 clients as in Fig. 4.1.
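To make the alignment conditions in Eqs. 4.5 and 4.6 and the cascaded decoding concrete, the following noise-free sketch walks through the 3-client, 4-AP example with one complex symbol per packet. It is an illustration under stated assumptions, not the BBN implementation: the channel gains are random placeholders, every function and variable name is hypothetical, and the two homogeneous equations in (v3, v4, s1) are solved with a plain cross product, which is one simple way to obtain a non-zero solution.

```python
import random


def rand_c(rng):
    """Draw a random complex gain (illustrative, not a measured channel)."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


def cross(a, b):
    """Cross product of two complex 3-vectors; the result c satisfies
    a[0]*c[0] + a[1]*c[1] + a[2]*c[2] = 0, and likewise for b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bbn_three_clients(seed=1):
    rng = random.Random(seed)
    clients, aps, tx_aps = (1, 2, 3), (1, 2, 3, 4), (3, 4)
    # h1[i, k]: client i -> AP k in slot 1; h2[k, j]: AP k -> AP j in slot 2.
    h1 = {(i, k): rand_c(rng) for i in clients for k in aps}
    h2 = {(k, j): rand_c(rng) for k in tx_aps for j in (1, 2)}
    x = {i: rand_c(rng) for i in clients}  # one symbol per packet

    # Eqs. 4.5 and 4.6 as two homogeneous equations in (v3, v4, s1).
    row_x2 = (h2[3, 1] * h1[2, 3], h2[4, 1] * h1[2, 4], -h1[2, 1])
    row_x3 = (h2[3, 1] * h1[3, 3], h2[4, 1] * h1[3, 4], -h1[3, 1])
    v3, v4, s1 = cross(row_x2, row_x3)
    v = {3: v3, 4: v4}

    # Slot 1: every AP hears the sum of all client signals (noise ignored).
    y1 = {k: sum(h1[i, k] * x[i] for i in clients) for k in aps}
    # Slot 2: AP3 and AP4 retransmit their precoded slot-1 samples blindly.
    y2 = {j: sum(h2[k, j] * v[k] * y1[k] for k in tx_aps) for j in (1, 2)}
    # a[i, j]: effective slot-2 coefficient of x_i at AP j (from Eq. 4.2).
    a = {(i, j): sum(h2[k, j] * v[k] * h1[i, k] for k in tx_aps)
         for i in clients for j in (1, 2)}

    # Step 1: AP1 scales its slot-2 samples, subtracts, and decodes x1.
    x1_hat = (y1[1] - y2[1] / s1) / (h1[1, 1] - a[1, 1] / s1)
    # Step 2: AP2 removes the x1 contribution from both slots.
    r1 = y1[2] - h1[1, 2] * x1_hat
    r2 = y2[2] - a[1, 2] * x1_hat
    # Step 3: AP2 solves the remaining 2x2 system by Cramer's rule.
    det = h1[2, 2] * a[3, 2] - h1[3, 2] * a[2, 2]
    x2_hat = (r1 * a[3, 2] - h1[3, 2] * r2) / det
    x3_hat = (h1[2, 2] * r2 - a[2, 2] * r1) / det
    return x, {1: x1_hat, 2: x2_hat, 3: x3_hat}
```

Note that AP3 and AP4 never use the values of x1, x2 or x3: they only scale and forward what they received, which is what makes the beamforming blind. With noise, the divisions in steps 1 and 3 would amplify residual error, which motivates the decoding-order discussion in Section 4.4.4.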

In BBN, for the network shown in Fig. 4.1, at the end of slot 1, AP3 and AP4 solve Eqs. 4.5 and 4.6 to obtain precoding vectors, which are then used during slot 2 (See Eq. 4.2). This computation may take time (due to communication among APs over the backbone). In general wireless networks, this creates inaccuracies since the channel between APs and the mobile clients may change from the time the channel state information (CSI) was measured to the time when the APs retransmit the data in the second slot. Thus, the precoding vectors that were computed based on old CSI may not be suitable for the channel's current state. This may lead to inaccurate beamforming and nulling. However, in BBN, the mobile clients do not participate in the second slot. Only the APs transmit and receive data in the second slot. Due to the immobile nature of the APs, the channel (or CSI) between APs changes very slowly (See Fig. 4.2(a)). Thus, the CSI computed among APs is valid for a longer duration than the CSI between mobile clients and APs. By requiring only the APs to transmit in the second slot, BBN ensures higher accuracy of joint beamforming and joint nulling.

Number of APs required: In general, if there are N clients in the network, then BBN needs to align N − 1 packets at the first AP, N − 2 packets at the second AP and so on. Thus, a total of at least $(N-1)+(N-2)+\cdots+2 = \frac{N^2-N-2}{2}$ variables are required to satisfy all the constraints. However, to obtain a non-zero solution, we need to include one extra variable, i.e., a total of $\frac{N^2-N}{2}$ variables. N − 2 of these variables are supplied by the scaling coefficients at the receiving APs. Thus, a total of $\frac{N^2-N}{2} - (N-2) = \frac{N^2-3N+4}{2}$ transmitting APs are required. Finally, N − 1 receiving APs are also required in slot 2, where the first N − 2 receiving APs each decode one unique packet while the last AP decodes 2 packets. Therefore, with $\frac{N^2-3N+4}{2} + N - 1 = \frac{N^2-N+2}{2}$ APs, BBN can leverage this high density of APs to decode N uplink packets in exactly two slots. Further, in contrast to [16], BBN requires N fewer APs.
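The counting argument above can be tabulated with a short helper. This is a sketch of the closed-form expressions only; the function name is illustrative.

```python
def aps_required(n_clients):
    """AP counts needed to decode n_clients uplink packets in two slots.

    Returns (transmitting, receiving, total) following the counting
    argument in the text; intended for n_clients >= 3.
    """
    n = n_clients
    transmitting = (n * n - 3 * n + 4) // 2  # n*(n-3) is always even
    receiving = n - 1
    return transmitting, receiving, transmitting + receiving


for n in (3, 4, 5):
    print(n, aps_required(n))
# prints:
# 3 (2, 2, 4)
# 4 (4, 3, 7)
# 5 (7, 4, 11)
```

The total grows quadratically in N, which is why BBN targets dense deployments and needs a fallback when fewer APs are available (see Section 4.3).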

4.3 Challenges

Note that when the APs (i.e., AP3 and AP4) in slot 2 transmit, they have to align the samples of x2 and x3 at AP1. To achieve this, they precode the signals that they received in the first slot and transmit. However, in contrast to the existing solutions [74], in BBN, the transmitting APs are not aware of what they are transmitting (since they are unable to decode the samples received in the first slot). We call this Blind Beamforming and Nulling. Although the idea behind BBN is simple, there are multiple challenges that need to be handled to make it practical.

1. Oblivious to the contents of the transmitted signal: The APs transmitting in slot 2 are not aware of the contents of the signals transmitted in slot 2. Despite this, they need to cancel out (or align) the different contents of the signal at different receiving APs.

2. Synchronization: In order for the APs transmitting in slot 2 to align their signals at the receiving APs, these transmitting APs are required to be synchronized at the sample level. This requirement is similar to the requirements of other existing algorithms that focus on downlink traffic [56,74,75]. Observe that BBN does not impose a synchronization requirement on the mobile clients.

3. Multi-collision domain: The previous discussion assumes that all clients and all APs can hear each other directly. However, this may not be true for large scale EWLANs. Thus, we need a mechanism to extend BBN to such networks.

4. Inconsistency in the AP density: To decode N packets, BBN requires $\frac{N^2-N+2}{2}$ access points nearby. However, the actual number of APs present may be higher or lower than this number. If the number of available APs is higher, then BBN can make use of all of them. On the other hand, if the number of available APs is smaller, then a mechanism is required to select a subset of the clients.

5. Robustness: Unlike downlink [74], where each client individually decodes its own packet, in BBN, decoding happens in a cascading fashion. Decoding of a packet depends on the successful decoding of the previous packets. Clearly, in such a design, a failure in decoding one packet makes all future decodings unsuccessful. We need a new mechanism to increase the robustness of the decoding.

The next two sections explain how we handle these challenges.

4.4 Physical Layer Design

In this section, we explain the physical layer working of BBN using three different phases. First, we explain how multiple clients transmit simultaneously to the APs and how the channel state information between clients and APs is estimated. Then, we show how the APs conduct blind-beamforming and nulling without knowing the contents of the transmitted signals. Finally, the decoding process is explained. In BBN, the clients participate in only the first phase while the APs participate in all three phases.

4.4.1 Phase I: Client transmission

As explained in Section 4.2, the transmissions in BBN are divided into two slots. In the first slot, the clients transmit concurrently to the APs. Besides the received combined samples from the clients to APs, the channel state information (CSI) between all the clients and APs is also computed in this phase. To obtain the CSIs, each client sends an access code (or unique PN sequences [61] assigned to each client) that is free of interference.

The transmission timeline of Phase I is shown in Fig. 4.4. First, the APs broadcast an approve message. This message contains the IDs of the clients that are allowed to transmit in this slot (for more details on how the APs select the subset of clients, refer to Section 4.5, which describes the MAC design of BBN). The relative order of the IDs determines the time when a client should transmit its access code. Since the clients are not synchronized, the transmissions of access codes may partially overlap with each other. To avoid this overlap, a small time gap, called inter-access-code-space (IACS), is inserted between the transmissions. Finally, after the transmission of access codes, the clients transmit their packets simultaneously. All the APs compute the CSI from different clients using the interference-free access codes and also store the received samples corresponding to the data packets. In our experiments and simulations, we set the duration of IACS to 2 µs.
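The ordering rule above, where a client's position in the approve message fixes when it sends its access code, can be sketched as follows. Only the 2 µs IACS comes from the text; the access-code duration, the SIFS value, and the guard gap before the data packets are assumptions chosen for illustration, and the function names are hypothetical.

```python
def access_code_offsets(approved_ids, code_us, iacs_us=2, sifs_us=10):
    """Start time, in microseconds after the approve message ends, of each
    client's access code, in the order the IDs appear in the message."""
    return {cid: sifs_us + slot * (code_us + iacs_us)
            for slot, cid in enumerate(approved_ids)}


def data_start(approved_ids, code_us, iacs_us=2, sifs_us=10):
    """Common start time of the concurrent data packets: after the last
    access code plus one more guard gap (an assumption of this sketch)."""
    return sifs_us + len(approved_ids) * (code_us + iacs_us)


offsets = access_code_offsets(["C1", "C2", "C3"], code_us=20)
print(offsets)  # {'C1': 10, 'C2': 32, 'C3': 54}
print(data_start(["C1", "C2", "C3"], code_us=20))  # 76
```

Because each client derives its own offset from the approve message alone, no client-to-client synchronization is needed; the IACS only has to absorb the clients' timing uncertainty.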

To conduct blind-beamforming, besides the CSIs between the clients and APs, the CSIs between the transmitting APs and receiving APs are also required. As shown in Fig. 4.4, all of the APs broadcast their access codes one after the other. When one AP broadcasts, all other APs can estimate the CSI from that AP. The estimated CSIs, along with the CSIs between clients and APs, are forwarded to a group-head AP through the wired backbone network. The group-head AP uses these CSIs to compute the best set of transmitting APs, the set of receiving APs, the decoding order, and the precoding vector to be used by each transmitting AP. This information is then sent back by the group-head AP to every AP in the group. In Fig. 4.5, AP3 and AP4 are selected as the transmitting APs.

This computation at the group-head AP and the distribution of the result back to the APs may take some time due to delays on the wired backbone. To ensure that all APs have received the computed results back from the group-head AP, BBN requires all APs to wait for a Backbone-Inter-Frame-Space (BIFS) duration.

4.4.2 Phase II: Blind-beamforming

After waiting for BIFS time, all APs multiply the samples received in the first slot with their precoding vector and retransmit them (See Fig. 4.5). The value of BIFS can be selected on the basis of the speed of the Ethernet and the expected delays involved. To avoid wastage of the wireless channel during BIFS, APs in BBN participate in another set of communication (e.g., downlink traffic) while waiting to hear back from the group-head AP. Observe that since the APs are relatively stationary, the precoding vectors computed by the group-head AP are valid for a long duration, as described in Section 4.2. Further, due to the relatively stationary channel among APs, we do not need to frequently measure the channel among APs, which further reduces the overhead incurred during Phase I. The short packet Pre sent by AP3 is a sequence known to all of the APs. The purpose of sending this sequence is twofold: (1) it can be viewed as a preamble for the receiving APs to detect the correct start point of the retransmission; (2) it can be used to estimate the sampling offset between the transmitting APs and receiving APs.

Figure 4.4: Phase I time-line: ACi and AAj represent the access codes for Ci and APj, respectively.

4.4.3 Phase III: Decoding Packets

In BBN, the packets are decoded in a sequential order. The first AP decodes one packet and sends it to the next AP which, upon receiving the packet, recreates the received samples, and then subtracts those samples from the received samples. The remaining samples are decoded to obtain the second packet. This process is continued until all packets have been decoded. Performing a successful subtraction requires estimating various offsets such as frequency offset, sampling offset, and phase offset. Once the offsets have been estimated, the AP needs to recreate the received samples. This sequential decoding and packet subtraction have been well-studied in the literature [17, 38]. We refer the reader to the existing literature.

Figure 4.5: Phase II time-line: vi denotes the precoding vector of APi.

Another practical issue to note is that there is a sampling offset between the transmitting APs and the receiving APs in the second slot. This offset makes it difficult to align the components of x2 and x3 received by AP1 in the second slot with the corresponding components received in the first slot. To that end, a packet Pre that is known to every receiving AP is transmitted by AP3 in Phase II. This packet is used to estimate the sampling offset between the transmitting and the receiving APs using the same techniques as in packet subtraction [17,38].

4.4.4 Computing the Packet Decoding Order

In the previous discussion (Sec. 4.2), we assumed that x1 is decoded first, followed by x2 and x3. We also assumed that joint precoding leaves no residual noise. However, in practice, joint precoding and packet subtraction are not perfect and leave some residual noise. Thus, in this section, we compute the optimal order in which packets should be decoded such that the decoding accuracy is maximized in the presence of the residual noise. To determine the optimal decoding order, we need to compute the expected received signal strength (RSS) of each packet (say xi) at each AP (say APj). The exact value of RSS depends on the precoding vectors, which in turn depend on the rest of the matching. This makes the problem combinatorial in nature.

We compute the expected RSS of client i at APj using a heuristic. In the second slot, let APN to APM be the set of transmitting APs and AP1 to APN−1 be the set of receiving APs. Observe that in the second slot, APj receives components of xi that have been retransmitted by all APs in the range APN to APM. Thus, components of xi arrive at APj through M − N + 1 different paths. Each of these M − N + 1 paths starts at client i, passes through some transmitting AP (say APk) and ends at APj. Further, each of these paths consists of two links: first from Ci to APk and second from APk to APj. We say that RSSij is expected to be high only if there is at least one path on which xi has high signal strength on both links. If P0 is the transmission power level, then we can estimate the RSS of xi at APj as follows:

$$\mathrm{RSS}_{ij} \approx P_0 \times \max_{k=N,\dots,M} \, \min\left( \left\|h_{ik}^{(1)}\right\|^2, \left\|h_{kj}^{(2)}\right\|^2 \right) \tag{4.7}$$
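The max-min heuristic of Eq. 4.7 translates directly into code. The sketch below assumes the channel estimates are stored in dictionaries keyed by (source, destination); that layout and the function name are illustrative choices, not part of BBN.

```python
def expected_rss(i, j, h1, h2, tx_aps, p0=1.0):
    """Heuristic RSS of packet x_i at receiving AP j (Eq. 4.7).

    h1[(i, k)]: slot-1 channel from client i to AP k.
    h2[(k, j)]: slot-2 channel from AP k to AP j.
    A two-link path is only as strong as its weaker link, and the
    best path over all transmitting APs is taken as the estimate.
    """
    return p0 * max(min(abs(h1[i, k]) ** 2, abs(h2[k, j]) ** 2)
                    for k in tx_aps)


# Toy example with two transmitting APs (3 and 4) and receiving AP 1.
h1 = {(1, 3): 0.9, (1, 4): 0.2}
h2 = {(3, 1): 0.1, (4, 1): 0.8}
rss = expected_rss(1, 1, h1, h2, tx_aps=(3, 4))
```

In the toy example the path through AP3 is limited by its weak second link (0.01) and the path through AP4 by its weak first link (0.04), so the estimate is about 0.04 times P0.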

Consider client Ci that transmits packet xi at data-rate Ri. Let APj be the receiving AP that decodes xi. If τi is the minimum SNR required to decode xi, where τi depends on the physical layer data rate, then the residual noise that can be tolerated at APj during the decoding is given by [17]: $\frac{\mathrm{RSS}_{ij}}{\tau_i}$. Using this, BBN computes the maximum residual noise that each packet can tolerate. Let us say APj decodes the i-th packet in the decoding sequence.

Observe that in BBN the packets are decoded sequentially. So, if a packet is not decoded correctly, then all other packets that depend on it cannot be decoded either. So, in order to improve the decoding probability of all the packets, the decoding order is chosen by arranging the packets in non-increasing order of the maximum residual noise that they can tolerate.
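The ordering rule can be sketched as a sort on the tolerable residual noise RSSij/τi. The sketch assumes the AP that decodes each packet has already been chosen, so that one RSS value per packet is available; the names and the example numbers are illustrative.

```python
def decoding_order(rss, min_snr):
    """Order packets by the residual noise they can tolerate, largest first.

    rss[i]: expected RSS of packet i at the AP assumed to decode it.
    min_snr[i]: minimum SNR (linear scale) needed at packet i's data rate.
    """
    tolerance = {i: rss[i] / min_snr[i] for i in rss}
    return sorted(tolerance, key=tolerance.get, reverse=True)


order = decoding_order(rss={1: 8.0, 2: 2.0, 3: 6.0},
                       min_snr={1: 4.0, 2: 2.0, 3: 1.5})
print(order)  # [3, 1, 2]
```

Here the tolerances are 2.0, 1.0 and 4.0 for packets 1, 2 and 3, so packet 3 is decoded first and packet 2 last.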

4.5 MAC Design

In this section, we first explain how BBN works in large scale networks. Next, we explain how BBN leverages the variation in the density of access points to improve the throughput of the uplink traffic. Finally, we explain how BBN coexists with ongoing downlink traffic in the network.

4.5.1 Multi-Collision Domain

The previous sections describe how BBN works in a single collision domain. To work in a practical multi-collision domain, BBN needs to solve multiple challenges:

1. In a multi-collision domain network, an AP may not be able to hear all other APs. This makes it difficult to synchronize them, since existing algorithms with high synchronization accuracy [75] work only within a single collision domain.

2. The traffic distribution may be different across different parts of the network. For example, some parts of the network may experience higher downlink traffic compared to others.

3. The MAC algorithm should ensure fairness across different clients.

4. The previous discussion of BBN requires that all cooperating APs and all clients are able to hear each other. Satisfying this requirement is challenging since frequent mobility of clients requires frequent re-computations.

BBN as described in Section 4.2 requires that: (i) All cooperating APs should be able to hear each other; and, (ii) All APs should be able to hear all clients. So, one naive way of extending BBN to multi-collision networks would be to arrange both the APs and clients in groups such that within each group all APs and all clients can hear each other. However, this naive approach would require frequent re-computation of groups due to client mobility.

To ensure that BBN works with mobile networks without requiring frequent re-computations, we divide the EWLAN into cliques of APs while only satisfying the first requirement. Satisfying that requirement implies decomposing the graph into as few cliques of APs as possible. Since decomposing graphs into the fewest cliques is an NP-Hard problem, BBN uses a greedy polynomial-time algorithm to compute such cliques. Our polynomial-time algorithm repeatedly finds a maximal clique among all APs. Then, it removes the vertices (and the edges incident on them) that are part of the maximal clique. The algorithm then runs on the remaining graph to find the next maximal clique. This process is repeated until every AP is a part of some clique. All the APs that are in the same maximal clique form a single group.
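The greedy decomposition described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in BBN: the function names, the adjacency-set input format and the highest-degree-first rule for growing each maximal clique are all assumptions made for the example, since the text does not specify how each maximal clique is found.

```python
from typing import Dict, Hashable, List, Set


def greedy_maximal_clique(adj: Dict[Hashable, Set[Hashable]],
                          candidates: Set[Hashable]) -> Set[Hashable]:
    """Grow one maximal clique inside `candidates`.

    Assumption: vertices are picked highest-remaining-degree first; the
    source only says that "a maximal clique" is found.
    """
    clique: Set[Hashable] = set()
    # `pool` holds the vertices adjacent to every vertex already in the clique.
    pool = set(candidates)
    while pool:
        v = max(pool, key=lambda u: (len(adj[u] & pool), str(u)))
        clique.add(v)
        pool &= adj[v]  # adj[v] has no self loop, so v leaves the pool too
    return clique


def decompose_into_groups(adj: Dict[Hashable, Set[Hashable]]) -> List[Set[Hashable]]:
    """Repeatedly peel maximal cliques off the 'can hear each other' graph."""
    remaining = set(adj)
    groups: List[Set[Hashable]] = []
    while remaining:
        clique = greedy_maximal_clique(adj, remaining)
        groups.append(clique)
        remaining -= clique  # drop the clique's vertices and incident edges
    return groups


if __name__ == "__main__":
    # Hypothetical AP graph: an edge means the two APs can hear each other.
    hears = {
        "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
        "d": {"e"}, "e": {"d"}, "f": set(),
    }
    print(decompose_into_groups(hears))
```

Every AP ends up in exactly one group, and every group is a clique, which is the property needed to reuse a single-collision-domain synchronization scheme inside each group.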

This decomposition algorithm can be run by a central server similar to [56,88]. Ensuring that all APs in the same group can hear each other allows BBN to leverage the existing synchronization algorithms (such as SourceSync [75]) to synchronize all the APs that are part of the same group.

Observe that since the APs are immobile, once the membership of different groups has been computed, it can be used for long periods of time. It is possible that an AP may not be able to hear a client that belongs to the same group. Thus, grouping based on APs only satisfies the first requirement specified above while the second requirement may be violated. We handle this in Subsection 4.5.2.

Computing neighbor relationship among groups: To prevent interference from neighboring groups and to keep groups independent, BBN ensures that at any time if the APs belonging to group G are communicating, then the APs belonging to neighboring groups should not communicate. Two groups (say Gi and Gj) are said to be neighbors of each other if (i) There exists a wireless device (an AP or a client) in Gi that is in the interference range of a wireless device in Gj; or, (ii) There exists a wireless device (an AP or a client) in Gj that is in the interference range of a wireless device in Gi. To decouple the dependence of the neighbor relation from the location of mobile clients, BBN takes a conservative approach such that Gi and Gj are called neighbors even if there could potentially exist a client that can be in the transmission range of some AP in Gi while being in the interference range of some AP in Gj. By decoupling the neighbor relationship from the location of mobile clients, BBN significantly reduces the overhead that may otherwise arise due to frequent re-computations.

Scheduling different groups: To ensure that two neighboring groups are not transmitting simultaneously, BBN uses a central server [56,74,88] that manages the interference among neighboring groups. Since the schedule length in BBN is always two slots across all the groups, it is convenient for the server to schedule the active groups. In BBN, at any time t, the server computes the set of groups that will communicate for the next two slots (t and t + 1). This set is computed using maximum independent set techniques such that there is no interference among the neighboring groups. However, due to unexpected delays on the wired backbone, the latency from the central server to the APs may result in APs unnecessarily waiting for the control messages from the server while the wireless channel is idle. To avoid this waiting, the server in BBN proactively computes the schedule and transmits it to the APs over the backbone.
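As a rough illustration of the group scheduling step, the sketch below selects a set of mutually non-neighboring groups for the next two slots. It uses a simple greedy maximal independent set, which is a stand-in for the "maximum independent set techniques" mentioned above; the function name and the `priority` input (for instance, how long a group has been waiting) are hypothetical and not taken from BBN.

```python
from typing import Dict, Hashable, Set


def pick_active_groups(neighbors: Dict[Hashable, Set[Hashable]],
                       priority: Dict[Hashable, float]) -> Set[Hashable]:
    """Greedy maximal independent set over the group-neighbor graph.

    Groups are visited in decreasing priority (an assumed input) and a group
    is activated only if none of its neighbors is already active.
    """
    active: Set[Hashable] = set()
    order = sorted(neighbors, key=lambda g: (-priority.get(g, 0.0), str(g)))
    for g in order:
        if neighbors[g].isdisjoint(active):
            active.add(g)
    return active


if __name__ == "__main__":
    # Hypothetical chain of three groups: G1 - G2 - G3.
    nbrs = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2"}}
    print(pick_active_groups(nbrs, {}))           # G1 and G3 can run together
    print(pick_active_groups(nbrs, {"G2": 5.0}))  # G2 alone when prioritized
```

Because the returned set contains no two neighboring groups, all active groups can run their two-slot schedule concurrently without inter-group interference.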

Client-AP association: In BBN, clients do not permanently associate with any specific AP or group. The clients simply wait for the poll packet from any neighboring AP and transmit uplink data as soon as they receive the corresponding approve packet as shown in Fig. 4.6. By keeping the clients stateless, BBN reduces the control messages exchanged between APs and clients.

ACK transmission: In BBN, the APs decode the packets during Phase 3. After decoding, the APs send ACKs over the wireless channel to the clients as shown in Fig. 4.6.

Downlink traffic: Uplink transmissions in BBN can coexist with downlink traffic. Each group in BBN can either do downlink transmissions or uplink transmissions, independently of the other groups. For downlink communication, existing algorithms [56, 74, 88] can be used. The central server used in BBN can also be used for managing downlink interference as in the existing algorithms [56,88].

4.5.2 Computing the set of transmitting clients

In general, in a system with N clients and $(N^2 - N + 2)/2$ APs, BBN guarantees that each client can transmit 1 packet every two slots. Within a single group, it is possible that the number of APs may not be high enough to support all the clients. In that case, the group-head AP selects a subset of clients that would transmit in the first time slot. To ensure fairness among clients, BBN uses a weighted credit based system [56] such that the credit of a client is high if it has not been scheduled for a long period of time. Thus, the clients with the highest credit are given priority to transmit. This is further described in Section 4.5.2.

Fig. 4.6 explains the complete working of BBN. Initially, the APs in a group (if allowed by the central server) poll the network for uplink traffic. This is followed by a contention period in which different clients transmit short packets conveying their credit balance to contend for the uplink transmission. Next, the “group-head AP” computes the set of clients that are allowed to transmit their data packets. This information is conveyed in the Approve message. Finally, the approved clients transmit their data packets, which are decoded by the APs in three phases as described in Section 4.4.

On the other hand, it is also possible that the number of clients is low while there are more APs available (e.g., in highly dense networks such as in Fig. 4.3). In that case, BBN can leverage the extra APs to further improve the robustness of decoding as discussed in Section 4.5.3.

Approve algorithm

In each group, a single AP is elected as the group-head AP that executes the Approve algorithm to compute the set of clients that are allowed to transmit. Approve (Algorithm 4) greedily computes the schedule. In each iteration, it adds the client with the highest credit value to the schedule (Line 8), thereby improving fairness. For such a client, it picks the best AP (say APj) that has not yet been paired with some other client (Lines 11-15). Next, Approve tries to add this client-AP pair to the schedule S and checks if S is still satisfiable (Lines 16-18). This check is done by Algorithm Satisfiable. If this pair makes S unsatisfiable (Lines 19-21), then the pair is removed from S. Also, Ci is marked as ineligible since it cannot be paired with any AP. This process is repeated until no more client-AP pairs can be added to S (Lines 9-10).

Algorithm Satisfiable determines if a given schedule is satisfiable or not. When doing this computation, Satisfiable takes into account the set of clients that each AP can hear. Without loss of generality, let S be the schedule such that S = {(Ci, APi): APi is the receiving AP for packet xi and xi is the i-th packet to be decoded}. Satisfiable should return true if for every client-AP pair, say (Ci, APi), it can find a subset of i − 1 unique APs in the same group that can align xi at the receiving APs (AP1 to APi−1). In other words, for every client-AP pair, say (Ci, APi), Satisfiable needs to find i − 1 other APs that are in the transmission range of Ci. This computation can be done by reducing this problem to a Max Flow problem as shown in B.1.

Algorithm 4: Approve: Computes the set of clients that will be approved in this slot

Input: For every eligible packet Pi, its transmitter Ci. Also, information on which AP can hear which client.

Output: (i) Set of clients that will be approved in this slot. (ii) The matching from the approved clients to the APs indicating which AP decodes which packet. (iii) The decoding order.

    // Set eligibility of all clients to true
    Ei ← true ∀i : 1 ≤ i ≤ N
    A ← All APs in the current group
    // S is an ordered schedule that tells us which AP decodes which packet.
    // The client-AP pairs are arranged in the order in which they are decoded.
    S ← {}
    while true do
        CSet ← {Cx : Ex = true and Cx ∉ S}
        Ci ← Ci ∈ CSet and Ci has the highest credit balance
        if Ci = null then
            return S
        end
        Set ← {(Ci, APj) : APj ∉ S}
        (Ci, APj) ← (Ci, APj) ∈ Set and RSSij is maximum
        if APj = null then
            Ei ← false
            continue
        end
        S ← S ∪ {(Ci, APj)}
        Compute the decoding order in S based on the residual noise tolerance.
        isSatisfiable ← Satisfiable(S, A)
        if isSatisfiable = false then
            Ei ← false
            S ← S \ {(Ci, APj)}
        end
    end
    return S
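For readers who prefer executable code, the greedy loop of Approve can be sketched in Python as follows. This is a hedged sketch rather than the implementation evaluated in this chapter: the names `approve`, `credits`, `rss` and `satisfiable` are hypothetical, the Satisfiable check (a Max Flow reduction in the dissertation) is passed in as a caller-supplied predicate, only APs that can hear a client are considered for it, and the re-ordering of S by residual noise tolerance is omitted because it needs measurements that are not part of this example.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]  # (client, AP)


def approve(credits: Dict[Hashable, float],
            rss: Dict[Hashable, Dict[Hashable, float]],
            satisfiable: Callable[[List[Pair]], bool]) -> List[Pair]:
    """Greedy schedule construction in the spirit of Approve.

    credits[c]   credit balance of client c
    rss[c][ap]   received signal strength of c at ap (only APs that hear c)
    satisfiable  assumed stand-in for Algorithm Satisfiable
    """
    eligible = set(credits)
    schedule: List[Pair] = []
    while True:
        scheduled = {c for c, _ in schedule}
        used_aps = {ap for _, ap in schedule}
        candidates = [c for c in eligible if c not in scheduled]
        if not candidates:
            return schedule
        # Client with the highest credit balance goes first (fairness).
        ci = max(candidates, key=lambda c: credits[c])
        free = {ap: s for ap, s in rss.get(ci, {}).items() if ap not in used_aps}
        if not free:
            eligible.discard(ci)  # no unpaired AP is left for this client
            continue
        apj = max(free, key=free.get)  # strongest unpaired AP
        schedule.append((ci, apj))
        # (Decoding-order computation omitted in this sketch.)
        if not satisfiable(schedule):
            schedule.pop()
            eligible.discard(ci)


if __name__ == "__main__":
    demo_credits = {"A": 3, "B": 2, "C": 1}
    demo_rss = {"A": {"ap1": -50, "ap2": -60}, "B": {"ap1": -40}, "C": {"ap2": -70}}
    # Toy predicate: pretend the group can support at most two packets.
    print(approve(demo_credits, demo_rss, lambda s: len(s) <= 2))
```

In the toy run, client B is skipped because its only audible AP has already been paired with the higher-credit client A, which mirrors the ineligibility marking in the pseudocode.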

[Figure 4.6 timeline, in order: Poll; Contention Period using ROP; Approve A, B and C; Phase I; Phase II; Send ACKs; Keep Silent (allow neighboring groups to transmit).]

Figure 4.6: Timeline of data transmission in a large network. The data sent by clients during contention phase are transmitted using the Rapid OFDM Polling (ROP) as discussed in DOMINO to decrease overhead. Phase III is executed in the background over the wired backbone allowing wireless channel to be used for other purposes.

4.5.3 Robustness

In BBN, the first AP decodes one packet while N − 1 packets are nulled using blind beamforming. The second AP decodes the second packet (while the other N − 2 packets are nulled using blind beamforming) and so on. Thus, if an AP cannot decode a packet due to inaccuracies in blind beamforming or packet subtraction, then none of the following packets that depend on it can be decoded. Therefore, to ensure that the first few packets in the decoding order can be decoded with high probability, BBN leverages the high density of the APs. Specifically, BBN increases the decoding robustness of the packets if the number of APs present in a group is more than the minimum required (See Sec. 4.2).

Let C1, C2, ···, CN be the order in which the clients are decoded (See Section 4.4.4). Let the number of APs in the group be M and E be the number of extra APs that are present such that $E = M - (N^2 - N + 2)/2$. Recall that exactly N − i packets are nulled (or aligned) at the AP that decodes the packet from client Ci. If we require one of the extra APs to independently decode the packet from Ci, then we will need another N − i extra transmitting APs to ensure that packets from Ci+1, Ci+2, ···, CN are nulled at this extra AP. Thus, to decode the packet from Ci at two different APs, we need another N − i + 1 APs (including one extra AP for receiving).

BBN increases the decoding robustness as follows: The APs in BBN find the first client Ci in the decoding sequence that satisfies two requirements: (i) The packet from Ci is decoded at only one AP; and, (ii) E ≥ N − i + 1. BBN then ensures that the packet transmitted by Ci can be independently decoded by two different APs, and decreases E by N − i + 1 since this is the number of APs required to achieve independent decoding of Ci. This process is repeated as long as possible to achieve independent decodings of some packets. Thus, in highly dense networks, BBN leverages the extra APs present in the network to further increase the decoding probability of each packet. Even if extra APs are not available, BBN can restrict the number of clients that transmit simultaneously. This frees up some APs that can be used for increasing the robustness of decoding. We leave the problem of proactively reducing the number of transmitters to increase the decoding robustness as future work.
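The allocation of extra APs described in this subsection can be sketched as a short loop. The sketch below is illustrative only: the function name and its inputs are hypothetical, clients are identified by their 1-based position i in the decoding order, and the number of extra APs E is taken as an input rather than derived from the group size.

```python
from typing import List, Tuple


def plan_redundant_decoding(n_clients: int, extra_aps: int) -> Tuple[List[int], int]:
    """Decide which decoding positions get a second, independent decoder.

    Duplicating the packet at position i costs N - i + 1 APs (N - i extra
    transmitting APs plus one extra receiving AP). Returns the chosen
    positions and the number of extra APs left unused.
    """
    if n_clients < 0 or extra_aps < 0:
        raise ValueError("counts must be non-negative")
    duplicated: List[int] = []
    remaining = extra_aps
    while True:
        for i in range(1, n_clients + 1):
            cost = n_clients - i + 1
            # First client in decoding order that is decoded at only one AP
            # and whose duplication is affordable.
            if i not in duplicated and remaining >= cost:
                duplicated.append(i)
                remaining -= cost
                break
        else:
            return duplicated, remaining  # nothing more can be duplicated


if __name__ == "__main__":
    print(plan_redundant_decoding(3, 4))
    print(plan_redundant_decoding(3, 2))
```

With N = 3 and four extra APs, the first packet (cost 3) and then the third packet (cost 1) receive a second decoder; with only two extra APs, the first packet is too expensive and the second packet (cost 2) is protected instead.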

4.6 Experiments

4.6.1 Setup

We evaluate BBN in a testbed with 7 USRP N210 nodes. The setup is as follows:

1. Hardware and software setup: Each USRP is equipped with a WBX daughterboard and operates in the 400 MHz band. All nodes are within a single collision domain. At the receiver side, we use GNURadio for signal processing. The decoding is done offline in Matlab. All of the AP nodes are synchronized with the same external clock source. In practice, SourceSync [75] can be used to synchronize the transmitting APs to nanosecond-level accuracy.

2. OFDM and modulation setup: We use a 512-point FFT system, with 200 subcarriers used for data transmission. The cyclic prefix length is set to 128. Unless otherwise mentioned, Binary Phase Shift Keying (BPSK) is used as the modulation scheme.

Apart from implementing BBN, we also implemented Omniscient TDMA that utilizes a central server. This server is aware of (i) packet queue at different clients; and, (ii) the channel between all clients and all APs. Omni-TDMA schedules the three different clients in a round-robin fashion with each client transmitting to the AP to which it has the best channel.

4.6.2 Micro-Benchmarks

Many works have shown the effectiveness of beamforming [39, 74]. Since our blind-beamforming and nulling involves transmitting unknown samples, its effectiveness and accuracy are unclear. In this section, we evaluate the performance of blind-beamforming and nulling using the signal to interference and noise ratio (SINR). In the following experiments, 3 clients and 4 APs were deployed in our testbed as shown in Fig. 4.1.

Blind-beamforming and Nulling Effects

First of all, we study the blind-beamforming and nulling effect as described in Section 4.4. Since there are a total of 12 links between all APs and clients, it is difficult to control the SNR of every link. Instead, we place the clients and APs randomly in our testbed and record the actual SNRs. We repeat the experiment 20 times for each of the 50 randomly chosen topologies. Over various topologies, the SNR between clients and APs varied from 6 dB to 35 dB. We compute the final interference to noise ratio (INR) of packet x1 when it is decoded by AP1. The INR distribution is shown in Fig. 4.7(a). The median of the INR is 0.7 dB, which is just slightly above the noise floor, and the 90th percentile INR is 3.7 dB. This indicates that residual interference from blind-beamforming and nulling is relatively small and demonstrates the practicality of BBN.

The INR distribution in Fig. 4.7(a) shows that it could be as high as 10 dB, which is a large value compared with typical SNR values, e.g. 20 dB. We look deeper into the INR results and present them in another way in Fig. 4.7(b). The y-axis is the final INR of packet x1 at AP1. The x-axis is the range of signal to interference ratio (SIR) in dB that x1 experiences in the first slot across all of the APs. The smaller the value on the x-axis, the higher the amount of interference to be cancelled in the second slot. This figure shows that as the SIR increases, the final INR decreases. When the first slot SIR is larger than -12 dB, the median of the final INR is 0.6 dB and the 90th percentile is 2.7 dB (Fig. 4.7(b)). Based on this result, we can enable BBN when the SIR value is larger than a threshold and fall back to the default IEEE 802.11 scheme when the SIR value is small. We leave the study of computing the exact threshold value as future work.

[Figure 4.7 panels: (a) Final INR distribution of packet x1 at the first decoding AP (CDF of the final INR in dB, for BBN and for BBN w/o sampling offset correction); the median of the INR is 0.7 dB and the 90th percentile INR is 3.7 dB. (b) Final INR of the packet x1 at the first decoding AP versus the SIR of packet x1 in the first slot (dB). (c) Throughput of BBN compared to Omniscient TDMA (CDF of throughput in Mbps for clients 1, 2 and 3 in BBN, total throughput of BBN, and total throughput of TDMA; gain of 1.48X).]

Figure 4.7: Experiment results collected over USRP testbed.

Sampling Offset

As discussed in Section 4.4.3, there is a sampling offset between the samples received by AP1 from phase I and phase II. To study the effect of the sampling offset, we turn off the sampling offset correction in BBN and compute the residual interference to noise ratio for x1. The result in Fig. 4.7(a) shows that without sampling offset correction, the median INR increases by 1.1 dB and the 90th percentile increases by 2.2 dB. This demonstrates that the sampling offset correction done in BBN reduces the residual interference.

4.6.3 Throughput

In this section, we study the throughput performance of BBN. The throughput of each client in BBN is recorded and compared with that of omniscient TDMA. A total of 20000 packets are transmitted by each client across different topologies. Fig. 4.7(c) shows that on average, BBN provides a throughput gain of 1.48× compared with omniscient TDMA. The figure also shows that the throughput of client 1 is higher than that of clients 2 and 3. This is because x2 and x3 are decoded only if x1 is decoded. Further, even if x1 is decoded, x2 and x3 may not be decoded due to the residual interference from subtraction.

[Figure 4.8 panels, each plotted against the number of APs (0 to 1800): (a) Total Throughput (in Mbps) for BBN, Omni-TDMA and IEEE 802.11; (b) Jain’s Fairness Index for BBN, Omni-TDMA and IEEE 802.11; (c) Percentage of packets decoded for BBN (With Robustness) and BBN (Without Robustness).]

Figure 4.8: Trace-Driven Simulation Results for Multi-Collision Domain

4.7 Trace-Driven Simulation

This section explains the setup and the results from the trace-driven simulations.

4.7.1 Simulation Setup

Apart from implementing BBN, we also implemented two other algorithms: (i) Omniscient TDMA algorithm: Described before in Section 4.6. However, this time, similar to BBN, Omniscient TDMA also uses a credit-based system where a client has a high credit value if it has not transmitted for a long time. In each slot, it schedules a maximum independent set of “client to AP” links (where weight of link = credit of the client × throughput of the client when using that link). The physical layer rate of a link is chosen by picking the highest data rate that can be decoded by the AP; and, (ii) IEEE 802.11 (without RTS/CTS). To evaluate the gain provided by BBN irrespective of the downlink algorithm used, only the uplink traffic from clients to APs was generated.

Various traces were incorporated into the simulation: (i) Noise due to Blind Beamforming and Nulling: The simulator incorporated noise arising due to imperfect nulling. For this, we used the traces collected from our experiments (See Fig. 4.7(a)). (ii) Noise due to subtraction: When an AP subtracts a packet, it has to recreate its samples and correct for various offsets such as sampling offset and frequency offset. An imperfect correction leads to imperfect subtraction resulting in residual noise. The simulator incorporated this residual noise using the traces collected by us in experiments. (iii) Path Loss between clients and APs: Incorporated from the traces [91]. (iv) Path Loss between APs: Incorporated from the traces [91].

In this section, we study the behavior of BBN in a large EWLAN (Enterprise Wireless LAN) that spans multiple collision domains (e.g., the campus of a university). Our simulator first randomly deploys 1000 clients in a field of size 500m × 500m. APs are also deployed randomly and the count of APs is varied. In this setup, different devices may belong to different groups as described in Section 4.5. The overhead of different protocols was taken into account during the simulation. Also, APs in BBN used extra APs to further increase the decoding robustness as described in Section 4.5.3. Finally, clients in BBN and IEEE 802.11 used the Auto Rate Fallback (ARF) algorithm to determine the physical layer data rate.

4.7.2 Results

Next, we describe the results from our trace-driven simulations.

1. Total Throughput across all clients: Throughput increases for all algorithms as they leverage the increase in the physical layer data rate (See Fig. 4.8(a)). For 802.11, the increase is not substantial since the high number of collisions (due to hidden terminals) reduces the number of successful transmissions. With an increase in the number of APs, the throughput in BBN increases for two reasons: (i) Higher AP density implies more APs are present in each group, resulting in higher throughput since more clients can be supported at the same time; and, (ii) Higher data rate at clients due to higher AP density. As the density of the APs increases, throughput in BBN increases substantially compared to TDMA. When the number of APs is 2000, each client is in the range of an average of 76 APs. At that density, BBN throughput is 5.6× compared to TDMA, and 52.4× compared to IEEE 802.11. This is lower than the expected gain since in BBN, clients use ARF to adjust their physical layer data rate while clients in Omniscient TDMA transmit at the best possible data rate.

2. Fairness: Fig. 4.8(b) shows the variation in Jain’s fairness index with variation in the number of APs. IEEE 802.11 has very low fairness since a client may get starved if it is in the range of multiple APs. The fairness index of BBN is higher than that of the other algorithms since BBN allows all clients to transmit. BBN has higher fairness than TDMA since BBN performs precoding over transmissions from all clients. Thus, even the clients that are far away from all APs may experience high throughput due to beamforming from multiple helper APs.

3. Decoding probability: As the density of the network becomes higher, the length of the decoding chain increases. Thus, a decoding failure on one packet implies a decoding failure on all the other packets that depend on it. Fig. 4.8(c) shows that the percentage of packets decoded successfully decreases with an increase in density. However, the throughput in BBN still increases (See Fig. 4.8(a)) since higher density enables multiple clients in APs to transmit successfully. Further, with a high density of APs, BBN can use the robustness techniques discussed in Section 4.5.3 to increase the decoding probability. Fig. 4.8(c) also shows the decoding probability when BBN does not use the robustness techniques described before. With the increase in the number of APs, there is a higher chance that BBN can leverage those APs to improve robustness. Thus, with increasing density, the improvement provided by robustness further increases.
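For reference, Jain’s fairness index used in the fairness comparison above is defined for per-client throughputs x1, ..., xn as (Σ xi)² / (n · Σ xi²). It equals 1 when all clients receive the same throughput and 1/n when a single client receives everything. A minimal sketch of the computation follows; the function name is hypothetical and the sample values are made up for illustration, not taken from the simulations.

```python
from typing import Iterable


def jain_index(throughputs: Iterable[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    xs = list(throughputs)
    sum_sq = sum(x * x for x in xs)
    if not xs or sum_sq == 0:
        raise ValueError("need at least one non-zero throughput")
    total = sum(xs)
    return (total * total) / (len(xs) * sum_sq)


if __name__ == "__main__":
    print(jain_index([2.0, 2.0, 2.0, 2.0]))  # perfectly fair allocation
    print(jain_index([8.0, 0.0, 0.0, 0.0]))  # one client takes everything
```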

4.8 Discussion

In this section, we discuss some further modifications that make BBN more practical.

Reducing overhead of channel estimation: To compute the precoding vectors, the APs in BBN require the knowledge of the channel between all clients and APs as well as the channel between all APs. The problem of computing the channel from clients to APs has been well studied in the context of MIMO networks [39,103]. To compute the channel values, we are planning to use PN sequences to estimate the channel from multiple transmitters simultaneously [59].

Overhead on the backbone: In BBN, when decoding N uplink packets, the APs need to exchange $(N-1)(N-2)/2$ data packets. This is in contrast with [74], which requires the exchange of $(N-1) \times N$ data packets when performing joint beamforming for the downlink traffic. In addition, APs in BBN also need to exchange relatively smaller control packets related to channel state information, scheduling, etc. Currently, we are exploring techniques to adjust the number of participating clients based on how much overhead can be tolerated on the wired backbone.

APs with multiple antennas: If the APs are equipped with multiple antennas, BBN can leverage them to reduce the number of required APs. Specifically, if each AP is equipped with K antennas, then to receive N uplink packets simultaneously, BBN would require only $(N'^2 - N' + 2)/(2K)$ APs where $N' = N - (K - 1)$, a reduction by a factor of more than K.
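The counting formulas quoted in this chapter are easy to evaluate for concrete values of N and K. The helper functions below are a small sketch with hypothetical names: they take the single-antenna AP requirement as (N² − N + 2)/2 (as stated in Section 4.5.2), the BBN backbone cost as (N − 1)(N − 2)/2 data packets, the joint-beamforming cost of [74] as (N − 1)·N data packets, and the multi-antenna requirement as (N'² − N' + 2)/(2K) with N' = N − (K − 1). Rounding the multi-antenna count up to a whole AP is an assumption of this sketch; the text does not say how fractional values are handled.

```python
def min_aps_single_antenna(n: int) -> int:
    """(N^2 - N + 2) / 2 single-antenna APs for N simultaneous clients."""
    return (n * n - n + 2) // 2  # N^2 - N is always even


def backbone_packets_bbn(n: int) -> int:
    """(N - 1)(N - 2) / 2 data packets exchanged on the backbone in BBN."""
    return (n - 1) * (n - 2) // 2


def backbone_packets_joint_beamforming(n: int) -> int:
    """(N - 1) * N data packets for downlink joint beamforming as in [74]."""
    return (n - 1) * n


def min_aps_multi_antenna(n: int, k: int) -> int:
    """(N'^2 - N' + 2) / (2K) APs with N' = N - (K - 1), rounded up (assumed)."""
    if k < 1 or n < k:
        raise ValueError("this sketch expects 1 <= K <= N")
    n_eff = n - (k - 1)
    numerator = n_eff * n_eff - n_eff + 2
    return -(-numerator // (2 * k))  # integer ceiling division


if __name__ == "__main__":
    for n in (3, 6, 10):
        print(n, min_aps_single_antenna(n), min_aps_multi_antenna(n, 2),
              backbone_packets_bbn(n), backbone_packets_joint_beamforming(n))
```

For N = 3 the single-antenna formula gives 4 APs, which matches the 3-client, 4-AP testbed used in Section 4.6.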

4.9 Related Work

Although BBN builds on several prior works, it differs from them in various ways.

Backbone usage: The idea of using the wired backbone to increase wireless throughput is not new. In MegaMIMO [74], multiple APs cooperatively precode the transmissions such that each client receives only the packets intended for it while the other transmissions are canceled out. However, MegaMIMO requires that transmitters exchange packets among themselves and thus, it works only for the downlink transmissions. On the other hand, BBN improves the throughput for the uplink traffic. Also, in contrast to MegaMIMO and OpenRF [56], transmitters in BBN jointly perform nulling without knowing the actual contents of the packets.

A recently proposed protocol, Symphony [17], also focuses on uplink traffic. However, in contrast to BBN, Symphony improves the network throughput only when the APs are in different collision domains. In Epicenter [41], the authors propose that APs should exchange coarse representations of symbols to decode corrupted bits. Similarly, the authors in [99] also propose that APs exchange bits or raw samples on the backbone to facilitate packet decoding. In all these algorithms, the APs cooperate to decode the same packet whereas in BBN, APs encourage transmitters to collide and then cooperate to decode multiple packets simultaneously without exchanging the raw samples.

Finally, [16] also proposed using the backbone to improve the uplink throughput. However, in contrast to [16], BBN provides the first implementation of blind beamforming and nulling. Further, BBN works in multi-collision domains, uses robustness techniques to increase decoding probability, and requires N fewer APs compared to [16].

Interference Alignment: Previously, researchers (see [48] and references therein) have used interference alignment to improve the capacity of wireless networks. However, unlike BBN, they either require APs to exchange samples over the backbone [12], work only for the downlink traffic [92], assume the presence of a significant number of clients [69], require multiple antennas at transmitters or receivers [39], require the antennas to be physically moved [9] to a certain point, require the channel to change from one slot to another [24], precode over an exponential number of time slots [24], or provide limited throughput gain [9, 39]. These assumptions are not practical in mobile networks since if the client is stationary, the channel may not change [95] from one packet to another. In contrast to the previous works, BBN works even if the channel stays stationary.

Wireless Relays: Researchers [55,78] have also looked at the problem of using special relay nodes to assist in high speed communication between specific pairs of source and destination nodes. In contrast, the focus of BBN is to leverage the high density of APs and the wired backbone to carefully select the set of destination APs, determine which AP decodes which packet, and use the wired backbone to migrate all the complexity away from the clients. Further, with previous works, it is possible that the destination AP is unable to decode a packet due to low SNR. However, in BBN, APs leverage the high density of APs to increase robustness (See Sec. 4.5.3).

4.10 Conclusions

In this chapter, we discussed BBN, a blind beamforming and nulling scheme that leverages the high density of access points to enable multiple mobile devices to transmit simultaneously. Feasibility of BBN was verified on a USRP testbed. Measurements show that BBN achieves a throughput gain of 1.48× over omniscient TDMA. Using trace-driven simulations, we showed that in dense wireless LANs, BBN provides a throughput of up to 5.6× compared to omniscient TDMA.

Chapter 5: BASIC: Backbone-Assisted Successive Interference Cancellation

5.1 Introduction

To meet the rapidly increasing demand for wireless capacity, we need to go beyond traditional strategies that prohibit interfering transmissions from being simultaneously active. When multiple interfering transmissions are simultaneously active, proactive management of interference becomes essential for successful decoding of these packets. Transmission strategies involving multiple interfering users have been studied in Information Theory [7,25,48,94]. Some of these ideas have been implemented and evaluated using real systems [39,74,87,88]. However, such techniques need significant coordination among the transmitting nodes.

Techniques proposed in Information Theory, such as Network MIMO [7], need the transmitting nodes to not only coordinate their transmissions, but also exchange data with each other before transmitting. Another technique called interference alignment [25,48,60] does not need data to be exchanged between the transmitters, but needs tight time and frequency synchronization. Such requirements are easier to meet for downlink transmissions from different APs. Specifically, APs belonging to the same enterprise network have an Ethernet backbone that allows them to exchange data packets before they transmit simultaneously [74]. Further, they can also use the wireless medium [15,74] or use powerlines [104] to satisfy the synchronization requirements. However, wireless clients do not have these luxuries.

To improve uplink communication efficiency, coordinated multipoint (CoMP) [47] has been proposed for LTE networks. In CoMP, base stations exchange received samples with each other to decode the uplink packets in a MIMO fashion. However, base stations in LTE networks are connected through dedicated high-speed fiber, which provides much higher capacity than Ethernet backhaul in typical enterprise networks. Researchers have shown that exchanging raw samples can lead to unreasonable traffic on the Ethernet [39,41,107].

BBN, discussed in Chapter 4, removes the synchronization requirement for clients, which is a big step in the right direction. However, it still needs the APs to maintain sample-level synchronization. Such a requirement is still a hindrance to rapid deployment of this technology as it is non-trivial to meet such synchronization requirements. It also requires a large number of APs ($O(N^2)$ to support N uplink transmissions), which puts an additional requirement on the network density. So, a pressing question is: Can we enable uplink multi-user transmissions in practical systems, i.e., without requiring tight synchronization among APs or clients, without overwhelming the backbone network, and without requiring a high AP density?

A promising technique from Information Theory called Successive Interference Cancellation (SIC) supports decoding of multiple transmissions at a single receiver. It does that by first decoding the strongest signal and treating the rest as noise. It cancels (or subtracts) this decoded signal from the ensemble and continues to successively decode the remaining packets. But the achievable gain has been shown to be limited [85]. Our own evaluations (Figure 5.9(a)) show that SIC has no gains over omniscient TDMA in more than 50% of the cases and only 20% gain in the remaining cases. Our analysis in Section 5.2 also indicates that the theoretical throughput gain of SIC over TDMA is limited. The main reason for limited gains is that a single receiver is unable to fully capitalize on the diversity of transmitters. However, in a realistic environment, the added dimension of diversity offered by multiple receivers (or base-stations) can be leveraged to apply the interference cancellation technique in a distributed manner. Based on this observation, we present BASIC, a novel lightweight multi-user uplink transmission technique that does not require tight synchronization and does not impose any restrictions on the AP density. BASIC exploits the inherent receiver diversity and takes advantage of the Ethernet backbone connection between APs, which allows them to exchange decoded packets with each other. It decodes multiple simultaneously transmitted uplink packets according to a chosen sequence but, in contrast to SIC, each of those packets can be decoded at a different AP. Each decoded packet is forwarded to the succeeding APs where its interference can be removed so that the desired packets can be decoded. Thus, a group of APs collaborate to decode a group of simultaneously transmitted uplink packets while leveraging the backbone. A greedy heuristic is proposed to determine the transmission and decoding plan.

The design of BASIC is particularly challenging for the following reasons. 1) The problem of determining the transmitting clients and decoding plan is combinatorial in nature. For N clients and M APs, we can choose any subset of the N clients to transmit and the packet from each client can be decoded at any of the M APs. So there are $\sum_{i=1}^{N} \binom{N}{i} M^i$ possible combinations. 2) Packet subtraction across multiple nodes is particularly difficult when time synchronization is not perfect.
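To give a feel for how quickly this search space grows, the count above can be evaluated directly. The following Python sketch is only an illustration (the function name `num_plans` is ours); it also checks the count against the closed form (1 + M)^N - 1 that follows from the binomial theorem. Like the expression in the text, it counts subsets and AP assignments and does not additionally count decoding orders.

```python
from math import comb

def num_plans(n_clients: int, n_aps: int) -> int:
    """Count non-empty client subsets where each chosen client is assigned one decoding AP."""
    return sum(comb(n_clients, i) * n_aps ** i for i in range(1, n_clients + 1))

# The binomial theorem gives the closed form (1 + M)^N - 1 for the same count.
for n, m in [(2, 2), (4, 4), (10, 5)]:
    assert num_plans(n, m) == (1 + m) ** n - 1
    print(f"N={n}, M={m}: {num_plans(n, m)} combinations")
```

Even a modest network of 10 clients and 5 APs already has tens of millions of combinations, which is why an exhaustive search is impractical.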


Figure 5.1: A 2×2 network with 2 clients and 2 APs which are connected through the backbone network. The weight (sij) of the dotted line is the received signal strength (RSS) at APj for transmission from Ci in Watts.

To show how BASIC works, we use Figure 5.1 as an example. Assume each client has a packet to send to its associated AP. BASIC allows both clients to transmit simultaneously. To achieve correct decoding of both packets, the data rate for C1 is carefully selected such that the packet can be decoded with a signal-to-noise ratio (SNR) of s11/s21 (for simplicity, the channel noise is ignored here). With this requirement, AP1 can receive the packet sent by C1 correctly despite the interference from C2. The decoded packet is then delivered to AP2 over the backbone. AP2 then subtracts this packet from the received samples and decodes the packet from C2 without any interference. To quantify the gains of BASIC, we choose s11 and s22 to be 20 dB higher than the noise floor, while picking s12 and s21 to be 10 dB higher than the noise floor. For this example, TDMA schedules C1 and C2 alternately with 20 dB SNR. SIC has no gain over TDMA: it allows C1 to transmit with 10 dB SINR to AP1 and, after decoding the packet from C1, subtracts its interference from the received samples and decodes C2’s packet at AP1 with 10 dB SNR. BASIC also allows C1 to transmit with 10 dB SINR to AP1. The decoded packet is forwarded to AP2. After interference cancellation, the packet from C2 can be decoded with an SNR of 20 dB at AP2. So BASIC achieves (20 + 10)/20 = 1.5 times the throughput of both TDMA and SIC.

This chapter makes the following contributions:

• We develop a new uplink packet transmission and decoding strategy that does not require any devices to be synchronized.

• We present new techniques to enable packet subtraction across multiple unsynchronized APs using fine-grained frequency estimation and phase error correction.

• We evaluate our solution using a 20-node USRP testbed.

• Our trace-driven simulations show up to 4.8× gain in throughput with similar flow fairness.

5.2 Motivation: Gains from Exploiting Diversity

This section explores the potential gain of BASIC and gives intuition as to why such a gain is possible. To distinguish the version of BASIC in this section from the one in our final implementation, we denote it as BASIC-OPT. It differs from BASIC in two key aspects: i) it considers all possible decoding orders to identify the best one; and ii) it assumes that packets can be encoded and modulated to exactly achieve any channel capacity. Our analytical results indicate that the throughput of BASIC-OPT relative to TDMA keeps increasing with the number of AP-client pairs and reaches a median of 2.6× for 4×4 networks.

5.2.1 An Example

SIC has been shown to improve the throughput [44] in many scenarios. However, if the data rate of each link is capacity achieving, the gain of SIC is marginal [85]. BASIC takes advantage of the diversity in SNR of different clients to different APs. We show how planning the decoding process at multiple nodes can lead to significant gains over traditional SIC at a single AP. In this section, we analyze the network capacity of different schemes with a simple network comprised of 2 APs and 2 clients (2×2 network), as shown in Figure 5.1.

Network Capacity for TDMA: In TDMA, clients C1 and C2 transmit alternately. Assuming each client transmits for equal durations, the capacity is:

\begin{align}
C_{TDMA} &= \frac{B}{2} \log_2\left(1 + \max\left\{\frac{s_{11}}{n_1}, \frac{s_{12}}{n_2}\right\}\right) \tag{5.1} \\
&\quad + \frac{B}{2} \log_2\left(1 + \max\left\{\frac{s_{21}}{n_1}, \frac{s_{22}}{n_2}\right\}\right) \tag{5.2}
\end{align}

where n1 and n2 are the noise power levels at AP1 and AP2, and B is the channel bandwidth.

Network Capacity of SIC: Each AP performs SIC independently and the AP with the highest capacity defines the network capacity. Suppose AP1 first decodes C1’s packet followed by C2’s packet. At AP1, the capacity for C1 is:

\begin{equation}
B \log_2\left(1 + \frac{s_{11}}{s_{21} + n_1}\right). \tag{5.3}
\end{equation}

C1’s packet is then subtracted from the combined samples at AP1. So, the capacity from C2 to AP1 is:

\begin{equation}
B \log_2\left(1 + \frac{s_{21}}{n_1}\right). \tag{5.4}
\end{equation}

So, the sum capacity at AP1 is:

\begin{align}
C_{AP_1} &= B \log_2\left(1 + \frac{s_{11}}{s_{21} + n_1}\right) + B \log_2\left(1 + \frac{s_{21}}{n_1}\right) \tag{5.5} \\
&= B \log_2\left(1 + \frac{s_{11} + s_{21}}{n_1}\right). \tag{5.6}
\end{align}

It can be shown that the sum capacity remains unchanged if AP1 decodes the packets in the other order (C2’s packet followed by C1’s packet). Similarly, the sum capacity at AP2 is:

\begin{equation}
C_{AP_2} = B \log_2\left(1 + \frac{s_{12} + s_{22}}{n_2}\right). \tag{5.7}
\end{equation}

So, the network capacity for SIC is

\begin{equation}
C_{SIC} = \max\{C_{AP_1}, C_{AP_2}\}. \tag{5.8}
\end{equation}

Network Capacity for BASIC-OPT: Both clients transmit simultaneously in BASIC-OPT. There are multiple decoding choices, depending on where each packet is decoded and in what order the packets are decoded. For the 2×2 network, there are four possible decoding orders. When AP1 first decodes C1’s packet, transmits the decoded packet to AP2 over the backbone, and then AP2 subtracts C1’s packet and decodes C2’s packet, the capacity is:

\begin{equation}
C_{11} = B \log_2\left(1 + \frac{s_{11}}{s_{21} + n_1}\right) + B \log_2\left(1 + \frac{s_{22}}{n_2}\right) \tag{5.9}
\end{equation}

When AP1 first decodes C2’s packet, transmits the decoded packet to AP2, and then AP2 decodes the other packet, the capacity is:

\begin{equation}
C_{12} = B \log_2\left(1 + \frac{s_{21}}{s_{11} + n_1}\right) + B \log_2\left(1 + \frac{s_{12}}{n_2}\right) \tag{5.10}
\end{equation}

Similarly, when AP2 decodes first, the corresponding two capacity terms for the two possible options are as follows:

\begin{equation}
C_{21} = B \log_2\left(1 + \frac{s_{22}}{s_{12} + n_2}\right) + B \log_2\left(1 + \frac{s_{11}}{n_1}\right) \tag{5.11}
\end{equation}

\begin{equation}
C_{22} = B \log_2\left(1 + \frac{s_{12}}{s_{22} + n_2}\right) + B \log_2\left(1 + \frac{s_{21}}{n_1}\right) \tag{5.12}
\end{equation}

BASIC-OPT includes SIC as a special case of decoding. So, the network capacity is:

\begin{equation}
C_{BASIC-OPT} = \max\{C_{11}, C_{12}, C_{21}, C_{22}, C_{SIC}\}. \tag{5.13}
\end{equation}

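With the above analysis, we can make the following claim: $C_{BASIC-OPT} \ge C_{SIC} > C_{TDMA}$. The first inequality is trivial. The second inequality can be derived as follows:

\begin{align}
C_{SIC} &= B \log_2\left(1 + \max\left\{\frac{s_{11}}{n_1} + \frac{s_{21}}{n_1}, \frac{s_{12}}{n_2} + \frac{s_{22}}{n_2}\right\}\right) \notag \\
&> \frac{B}{2} \log_2\left(1 + \max\left\{\frac{s_{11}}{n_1}, \frac{s_{12}}{n_2}\right\}\right) + \frac{B}{2} \log_2\left(1 + \max\left\{\frac{s_{21}}{n_1}, \frac{s_{22}}{n_2}\right\}\right) \notag \\
&= C_{TDMA}. \tag{5.14}
\end{align}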

The improvement of $C_{BASIC-OPT}$ and $C_{SIC}$ over $C_{TDMA}$ depends on the RSS of the links. For example, if we pick s11 = 100 pW (-70 dBm), s22 = 1 nW (-60 dBm), s12 = s21 = 10 pW (-80 dBm), and n1 = n2 = 1 pW (-90 dBm), Equations (5.1)-(5.13) give $C_{TDMA}$ = 8.3B, $C_{SIC}$ = 10.0B, $C_{21}$ = 13.2B, and $C_{BASIC-OPT} = C_{11}$ = 13.3B. BASIC-OPT performs 60% better than TDMA. As s21 decreases, the first term in $C_{11}$ increases and the improvement approaches 100%. Our trace-driven results in the next section show that the capacity gain of BASIC-OPT over TDMA increases almost linearly with the network size.
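The closed-form expressions (5.1)-(5.13) are easy to evaluate numerically. The following Python sketch does so for the 2×2 example above; the function name and argument layout are illustrative assumptions and not part of the BASIC implementation. With the example values it yields roughly 8.31B for TDMA, 9.98B for SIC, and 13.30B for BASIC-OPT (attained by C11, with C21 close behind at 13.18B).

```python
from math import log2

def capacities_2x2(s, n, B=1.0):
    """Evaluate (5.1)-(5.13) for a 2x2 network.

    s[i][j] is the RSS from client i+1 at AP j+1 and n[j] is the noise power
    at AP j+1, all in the same linear unit (e.g. pW). Results are in units of B.
    """
    (s11, s12), (s21, s22) = s
    n1, n2 = n

    def cap(sinr):
        return B * log2(1 + sinr)

    # TDMA: each client gets half the time and uses its best AP.
    tdma = 0.5 * cap(max(s11 / n1, s12 / n2)) + 0.5 * cap(max(s21 / n1, s22 / n2))
    # SIC: both packets decoded at one AP; the better AP defines the capacity.
    sic = max(cap((s11 + s21) / n1), cap((s12 + s22) / n2))
    # BASIC-OPT: first packet decoded under interference, second one after cancellation.
    orders = {
        "C11": cap(s11 / (s21 + n1)) + cap(s22 / n2),
        "C12": cap(s21 / (s11 + n1)) + cap(s12 / n2),
        "C21": cap(s22 / (s12 + n2)) + cap(s11 / n1),
        "C22": cap(s12 / (s22 + n2)) + cap(s21 / n1),
    }
    basic_opt = max(max(orders.values()), sic)
    return {"TDMA": tdma, "SIC": sic, "BASIC-OPT": basic_opt, **orders}

# Example values from the text, expressed in pW.
result = capacities_2x2(s=[[100.0, 10.0], [10.0, 1000.0]], n=[1.0, 1.0])
for name, value in result.items():
    print(f"{name}: {value:.2f} B")
```

Shrinking the cross-link RSS values (s12 and s21) in this sketch shows the BASIC-OPT capacity approaching twice the TDMA capacity, in line with the limit discussed above.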

5.2.2 Trace-driven Analysis

To understand the advantage of utilizing receiver diversity in practice, we collect the

RSS values between APs and clients in a large enterprise network. We use the RSS values to generate a number of network scenarios and the capacity formulations from Section 5.2.1 are used to calculate the capacity of different schemes.

Experiment Setup: The RSS trace collection is conducted on the first floor of a building on campus. We placed a laptop equipped with an Intel Centrino Advanced-N 6205 Wi-Fi adapter at 60 different locations (offices, labs and classrooms) in the building. Then we logged the RSS values from the APs provided by the university by identifying the “ESSID” field in the MAC header.

Figure 5.2: The throughput gain of BASIC-OPT and SIC over TDMA with ideal data rates. (a) Throughput gain over TDMA in 2×2 networks. For BASIC-OPT, the median gain of 2 clients associated with the same AP is 41%, and the median gain for different association is 62%. For SIC, the median gain is about 29% in both cases. (b) Throughput gain over TDMA in 3×3 and 4×4 networks. For BASIC-OPT, the median gain of 3×3 networks is 110% and the median gain of 4×4 networks is 156%. For SIC, the median gain is 42% and 54%, respectively.

Results: A total of 103 APs were detected during the trace collection process. The SNR of the AP-client pairs varies between 2 dB and 63 dB with a mean value of 15 dB. We generate a number of 2×2 networks from the collected trace in the following way. Two

locations are selected first. Then we pick two APs that are observed at both locations.

A total of 9077 such networks are created. Figure 5.2(a) presents the capacity gain over

TDMA in these 2×2 networks. We have partitioned the scenarios into two categories

termed “Same AP” and “Different AP” depending on whether the two clients select the

same AP or different APs for association based on the RSS values. Note that when the

clients are associated with different APs, there is more RSS diversity, which favors BASIC.

When both clients are associated with the same AP, the median gain for BASIC-OPT is

41%, whereas the median gain is 62% when the clients are associated with different APs.

These results show the advantage of leveraging diversity in a real deployment. On the other

hand SIC achieves around 29% median gain over TDMA and shows almost no difference

in throughput gain when the clients are associated with different APs. For 3×3 and 4×4 networks created in the same way as for the 2×2 networks, the throughput gains are shown in Figure 5.2(b). The median throughput gain for BASIC-OPT increases almost linearly, reaching 110% and 156% respectively, indicating that the performance improvement of

BASIC-OPT scales with the network size. The throughput gain of SIC increases slowly with 42% for 3×3 and 54% for 4×4. The reason behind this is that although new clients

contribute to the total SNR at an AP linearly, the capacity gain for SIC is in a logarithmic

relationship with this linearly increasing SNR.

5.3 Challenges in Practice

The analysis in the previous section showed that significant gains can be achieved using BASIC. However, there are multiple challenges in implementing BASIC in reality for achieving the best performance.

• Decoding Order and Data Rate Selection: As shown in Section 5.2.1, there are four different decoding orders for BASIC in a 2×2 network and each order may have a different capacity. The number of choices for the decoding order increases exponentially with the network size, making it challenging to compute the best order. Also, we assumed the existence of an ideal transmission scheme that achieves capacity. However, in the existing 802.11a/g standards, there are only 8 different data rates and none of them is capacity achieving in reality. We need a computationally inexpensive algorithm to determine the decoding order and data rate for each client while achieving close to optimal throughput performance.

• Interference Cancellation: In order to remove the interference, we need to reconstruct the samples from the decoded packets. This requires compensation for sampling and frequency offsets [38]. In existing interference cancellation techniques, the packet decoding and interference cancellation happen at the same node [38,44]. So the decoder is able to keep track of the frequency offset estimation error using pilot tones. However, the interfering packets in BASIC may have been decoded at a different AP. Since the APs are not frequency synchronized, we cannot get rid of the residual frequency offset using the decoding information from another AP.

• Fairness Issues: Throughput-optimized schemes tend to make some clients suffer from starvation. A tradeoff between fairness and throughput needs to be considered for the implementation of BASIC in practice.

5.4 The Design of BASIC

In this section, we discuss the design details of BASIC. To decode all packets correctly, we need to determine the decoding order of the packets and select proper data rates for each packet such that the interference can be tolerated. The analysis in Section 5.2.1 shows that the key to solving this problem is the accurate estimation of RSS values between all APs and clients.

Here we define some terms used in BASIC. Let the client set be $Client = \{C_m : m \in [1, M]\}$, the AP set be $AP = \{AP_n : n \in [1, N]\}$, and the RSS values between all AP-client pairs be $RSS = \{s_{mn} : \text{RSS from } C_m \text{ to } AP_n\}$. Denote all data rates and the minimum SNR required for each data rate as $Datarates = \{(SNR_l, d_l) : l \in [1, L]\}$. Define the following ordered sequence as a candidate for BASIC: $\{(C_{i_1}, AP_{j_1}, d_{l_1}), \cdots, (C_{i_T}, AP_{j_T}, d_{l_T})\}$, where T is the cardinality of the selected subset of clients participating in the concurrent transmission. Each $C_{i_k}$ in the candidate is distinct, while $AP_{j_k}$ and $d_{l_k}$ do not need to be unique. For a candidate, the packet from $C_{i_k}$ is to be decoded at $AP_{j_k}$ with data rate $d_{l_k}$ in the presence of noise $n_{j_k}$ and interference from the clients decoded after $C_{i_k}$, i.e.,

\begin{equation}
\frac{s_{i_k j_k}}{\sum_{t=k+1}^{T} s_{i_t j_k} + n_{j_k}} \ge SNR_{l_k}, \quad \forall k \in [1, T]. \tag{5.15}
\end{equation}

The objective of BASIC is to find the candidate that maximizes the sum throughput $\sum_{t=1}^{T} d_{l_t}$.

Before moving to the design details of BASIC, we address one question: will the throughput improvement diminish with practical discrete data rates? To answer this question, we re-evaluate the network throughput of the trace collected in Section 5.2.2 with the available data rates from 802.11a/g for BASIC, SIC and TDMA. The SNR requirement for each data rate is selected according to our experiment results in Section 5.6.1. The throughput gains of BASIC and SIC over TDMA are shown in Figure 5.3. Although BASIC does not show much gain over TDMA when the clients are associated with the same AP in 2×2 networks, it achieves 1.5× the throughput of TDMA when the clients are associated with different APs. The throughput gain of BASIC keeps increasing with the network size, obtaining 2× and 2.5× the throughput of TDMA in 3×3 and 4×4 networks, respectively.

Figure 5.3: The throughput gain of BASIC and SIC over TDMA for different network sizes with discrete data rates. (a) 2×2 networks. (b) 3×3 and 4×4 networks.

Figure 5.4: The timeline of the transmissions of APs and clients in the network shown in Figure 5.1. (Phases: Channel Estimation, Data Rate Selection, Data Transmission, Decoding Process. AP1 sends the Poll and Trigger messages; C1 and C2 each send a preamble and a payload.)

In comparison with the results from Section 5.2.2 using ideal data rates, the throughput gain with discrete data rates for BASIC is lower only by a small amount. The throughput gain of SIC with discrete data rates, on the other hand, experiences a sharp drop compared with the gain with ideal data rates, reconfirming the results shown in [85]. These figures indicate that unlike SIC, which suffers from the restrictive SNR requirement of discrete data rates, BASIC is able to take advantage of the receiver diversity and achieves promising throughput improvement over TDMA.
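To make the candidate definition concrete, constraint (5.15) and the sum-throughput objective can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the dictionary-based data layout, the function names, and the SNR thresholds in the example are hypothetical and are not the measured values from Section 5.6.1.

```python
def is_feasible(candidate, rss, noise, min_snr):
    """Check constraint (5.15) for an ordered candidate.

    candidate: list of (client, ap, rate) tuples in decoding order.
    rss[client][ap]: received signal strength (linear units).
    noise[ap]: noise power at the AP (same linear units).
    min_snr[rate]: minimum linear SNR required for that data rate.
    """
    for k, (client, ap, rate) in enumerate(candidate):
        # Only clients decoded later still interfere at this AP.
        interference = sum(rss[later][ap] for later, _, _ in candidate[k + 1:])
        if rss[client][ap] / (interference + noise[ap]) < min_snr[rate]:
            return False
    return True

def sum_throughput(candidate):
    """Objective of BASIC: the sum of the selected data rates."""
    return sum(rate for _, _, rate in candidate)

# 2x2 example in pW; the thresholds below are made-up placeholders.
rss = {"C1": {"AP1": 100.0, "AP2": 10.0}, "C2": {"AP1": 10.0, "AP2": 1000.0}}
noise = {"AP1": 1.0, "AP2": 1.0}
min_snr = {6: 4.0, 54: 200.0}

good = [("C1", "AP1", 6), ("C2", "AP2", 54)]   # C1 tolerates C2's interference
bad = [("C1", "AP1", 54), ("C2", "AP2", 54)]   # C1's rate is too aggressive
print(is_feasible(good, rss, noise, min_snr), sum_throughput(good))
print(is_feasible(bad, rss, noise, min_snr))
```

The first candidate is feasible because C1 only needs a low SINR at AP1 while C2 is decoded interference-free at AP2; the second fails because C1's SINR of 100/11 cannot support the higher rate.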

5.4.1 BASIC Overview

In this part, we briefly discuss how the APs estimate the RSS values from the clients accurately and how the APs decode the packets from the clients. To coordinate the APs and clients, one AP is elected as the Head AP. It sends commands to the clients through the wireless channel and interacts with other APs using the backbone network. The multi-phase protocol works as follows. The Head AP first requires each of the clients to send a preamble in the specified order to estimate channel RSS values. These RSS values are then used to calculate the best candidate for BASIC. The APs then inform the clients to transmit with the given data rates according to the best candidate. After the clients finish sending, the APs decode all of the packets in the order specified in the candidate. In the following sections, we explain each phase in detail. We continue to use the 2×2 network in Figure 5.1 as an example and assume that AP1 is the Head AP. The nodes in BASIC transmit according to the timeline shown in Figure 5.4.

5.4.2 Channel Estimation Phase

The uplink transmission slot begins with AP1 broadcasting a Poll message, which contains the IDs of an ordered list of clients ({C1, C2} in this example). This ordered list informs the selected clients to transmit a preamble sequentially in that order. In this example, C1 sends a preamble followed by C2. When there are too many clients in the network, BASIC cannot schedule all of them to transmit simultaneously because the interference level will be too high for reliable transmissions. In the next section, we will discuss the process for selecting the clients to transmit in a given slot. Here we assume C1 and C2 are selected. When C1 and C2 receive the Poll message, they send back a standard 802.11a PHY preamble in the assigned time slot. Since the clients and APs in BASIC are not necessarily synchronized and the propagation delays between the AP-client pairs vary, a small guard interval of 2 µs is inserted between these preambles to protect the transmissions. These preambles allow the APs that overhear them to perform the following two operations: i) estimate the RSS values on all subcarriers from each client, which are used to calculate the decoding order and data rate for the clients; and ii) estimate the channel properties, including frequency offset and sampling offset, from each client for interference cancellation in the Decoding Phase.

5.4.3 Data Rate Selection Phase

In this phase, AP1 collects the RSS values from AP2 and uses a data rate selection algorithm to find the best candidate for BASIC. Our analysis in Section 5.2.1 selects the optimal decoding order for BASIC. However, solving the decoding order problem is a non-trivial combinatorial problem. In this section, we propose a greedy polynomial-time algorithm for data rate selection that is described in Algorithm 5.

In order to provide each client a fair transmission opportunity, we propose a credit-based client selection algorithm. For every client we maintain a weight equal to the number of time slots that have been allocated to the client for transmission, and we choose the set of clients to transmit according to their credits.

The basic idea of Algorithm 5 is to select a client that maximizes the achievable throughput at each step. Since there is a close relationship between SINR and data rate, the algorithm picks the client with the maximum SINR across all the APs. The algorithm consists of two loops. In the outer loop, a new client with the highest credit priority is added to the transmitting set (lines 6-7). Then it calculates the sum RSS at each AP (lines 10-11). In the inner loop, it appends the client with the maximum SINR to the client decoding order list (lines 13-20). If the data rate for a client is 0, it indicates that the current set of clients should not transmit concurrently (lines 16-19). The total RSS at each AP is updated since BASIC is able to remove the interference of the decoded packets (lines 23-25). The candidate is updated if the current set of clients can achieve a higher total throughput (lines 26-28).

Algorithm 5: Maximum SINR greedy algorithm

Input: (a) a client index set, Client = [1, M], and their Credits = {credit_1, ···, credit_M}, such that credit_i ≤ credit_j, ∀i < j; (b) an AP index set, AP = [1, N]; (c) the RSS between all AP-client pairs; (d) all possible data rates available in increasing order and the minimum SNR required: Datarates = {(SNR_l, d_l) : l ∈ [1, L]}; (e) the noise level array at each AP, N = {n_i : i ∈ [1, N]}; and (f) the 90th percentile value of residual interference from Section 5.7, Residual = {residual_i : i ∈ [minsnr, maxsnr]}.

Output: A candidate for BASIC: {(C_i, AP_j, d_k) : C_i ∈ Client, AP_j ∈ AP and d_k ∈ Datarates}.

    candidate ← []; clientSet ← []; maxTp ← 0;
    for cNew ∈ Client do
        tmpCset ← tmpCset ∪ {cNew};
        tmpCand ← [];
        tmpTp ← 0;
        for ap ∈ AP do
            S[ap] ← Σ_{c ∈ tmpCset} RSS[c][ap] + n[ap];
        end
        while tmpCset ≠ ∅ do
            sinr[c][ap] ← { RSS[c][ap] / (S[ap] − RSS[c][ap]) : c ∈ tmpCset, ap ∈ AP };
            maxSinr ← max_{c ∈ tmpCset, ap ∈ AP} {sinr[c][ap]};
            (newClient, newAp) ← (c, ap) : sinr[c][ap] = maxSinr;
            rate ← Datarates[maxSinr];
            if rate == 0 then
                tmpTp ← 0;
                break;
            end
            tmpCand.push_back((newClient, newAp, rate));
            tmpCset ← tmpCset \ newClient;
            tmpTp ← tmpTp + rate;
            for ap ∈ AP do
                S[ap] ← S[ap] − RSS[newClient][ap] + residual[RSS[newClient][ap] / n[ap]];
            end
        end
        if tmpTp > maxTp then
            candidate ← tmpCand;
            maxTp ← tmpTp;
        end
    end
    return candidate

Figure 5.5: Throughput Comparison of MaxSINR and Exhaustive Search. Network contains 50 clients. (Axes: throughput in Mbps versus number of APs, from 10 to 25.)

Henceforth, this greedy algorithm is referred to as MaxSINR. We have compared the performance of MaxSINR with an exhaustive-search-based algorithm, which evaluates all possible subsets of clients for decoding and chooses the best one. Evidently, the exhaustive search requires exponential computation time. Figure 5.5 compares the throughput of the exhaustive search and MaxSINR in a network with 50 clients and an increasing number of APs. The throughput of MaxSINR is only 6% worse than that of the exhaustive search.
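The greedy procedure described above can be sketched in Python as follows. This is an illustrative reading of Algorithm 5 and not the authors' implementation. It assumes that the transmitting set accumulates across outer iterations (as the prose description suggests), that `residual` is a caller-supplied function modelling the leftover interference after cancellation, and that the rate table in the example holds hypothetical linear SNR thresholds.

```python
def max_sinr_greedy(clients, aps, rss, noise, rate_table, residual=lambda snr: 0.0):
    """Greedy MaxSINR sketch.

    clients: client ids ordered by credit priority.
    rss[c][ap], noise[ap]: linear power values.
    rate_table: list of (min_linear_snr, rate), in increasing order.
    residual(snr): residual interference power after cancellation (assumed model).
    Returns (candidate, throughput), where candidate lists (client, ap, rate)
    in decoding order.
    """
    def lookup(sinr):
        best = 0  # rate 0 means "cannot be decoded"
        for min_snr, rate in rate_table:
            if sinr >= min_snr:
                best = rate
        return best

    best_candidate, best_tp = [], 0
    transmitting = []
    for c_new in clients:
        transmitting.append(c_new)  # assumption: the set grows each iteration
        remaining = list(transmitting)
        # Total received power (signals plus noise) at each AP.
        total = {ap: sum(rss[c][ap] for c in remaining) + noise[ap] for ap in aps}
        cand, tp = [], 0
        while remaining:
            # Pick the (client, AP) pair with the highest SINR.
            sinr, c, ap = max(
                ((rss[c][ap] / (total[ap] - rss[c][ap]), c, ap)
                 for c in remaining for ap in aps),
                key=lambda item: item[0],
            )
            rate = lookup(sinr)
            if rate == 0:
                tp = 0  # this client set cannot transmit concurrently
                break
            cand.append((c, ap, rate))
            remaining.remove(c)
            tp += rate
            for a in aps:
                # The decoded packet is cancelled everywhere, up to a residual.
                total[a] += residual(rss[c][a] / noise[a]) - rss[c][a]
        if tp > best_tp:
            best_candidate, best_tp = cand, tp
    return best_candidate, best_tp

# 2x2 example in pW with hypothetical rate thresholds (linear SNR, Mbps).
rss = {"C1": {"AP1": 100.0, "AP2": 10.0}, "C2": {"AP1": 10.0, "AP2": 1000.0}}
noise = {"AP1": 1.0, "AP2": 1.0}
rate_table = [(4.0, 6), (8.0, 12), (16.0, 24), (64.0, 36), (256.0, 54)]
candidate, throughput = max_sinr_greedy(["C1", "C2"], ["AP1", "AP2"], rss, noise, rate_table)
print(candidate, throughput)
```

With these inputs the sketch first decodes C2 at AP2 (the strongest link under interference) and then C1 at AP1 after cancellation. The work per outer iteration is polynomial in the number of clients and APs, in contrast to the exponential exhaustive search.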

5.4.4 Data Transmission Phase

At the beginning of this phase, the Head AP (AP1) sends a Trigger message carrying the data rate information to the clients. Then, the clients transmit simultaneously according to the chosen data rates. For the sake of simplicity, we define the transmission time to be of a fixed duration. The clients can perform packet aggregation and splitting to fill the whole transmission duration. In this phase, all APs store the received samples from the clients.

5.4.5 Decoding Phase

In this phase, assume that packet x1 from C1 is decoded first at AP1 correctly. Then

x1 is forwarded to AP2 where it will be subtracted from the received samples. This inter-

ference cancellation process has been studied widely in many previous works [38,39,44].

We refer the reader to the existing literature for various techniques used for interference

cancellation. However, there is a fundamental challenge in the interference cancellation

in BASIC that is different compared to the existing techniques. As discussed in [38,39],

we need to compensate for both frequency and sampling offsets to reconstruct the samples

from the decoded packets. Both of these can be obtained from the preamble in the Channel Estimation phase. The impact of residual sampling offset on interference cancellation is similar throughout the packet. However, the impact of residual frequency offset keeps increasing with the number of samples in the packet. Although the Wi-Fi preamble in the Channel Estimation phase provides an estimate for the frequency offset, it is not accurate enough to reconstruct the packet correctly. Assume the residual frequency offset is ∆f. Let the actual received samples for the interference packet be $S_{actual} = \{a_1, \cdots, a_n\}$, where n is the total number of samples. Due to the frequency offset, the samples reconstructed using the preamble estimation are $S_{rebuilt} = \{e^{\frac{2\pi\Delta f j}{B}} a_1, \cdots, e^{\frac{2\pi\Delta f n j}{B}} a_n\}$, where B is the bandwidth. Using the Taylor series of the exponential function and ignoring the higher-order terms, the residual interference strength for the kth sample can be written as follows:

\begin{equation}
r_k = \left|\left(1 - e^{\frac{2\pi\Delta f k j}{B}}\right) a_k\right|^2 \approx \left|\frac{2\pi\Delta f k j}{B} a_k\right|^2, \tag{5.16}
\end{equation}

which is quadratic in the index k.

According to [90], the frequency estimation error using the Wi-Fi preamble can be as low as 0.1 ppm at 25 dB SNR, which corresponds to a 500 Hz residual frequency offset in the 5 GHz ISM band. For a transmission duration of 1 ms, this inaccuracy leads to a phase shift of π for the last sample of the packet, which makes the reconstructed sample the negative of the actual sample. Obviously, the interference cancellation will fail with this estimation. Existing interference cancellation techniques decode and subtract the packet at the same node, which allows them to keep updating the frequency offset estimation. This cannot be applied to BASIC because the decoding and cancellation can happen at different APs. In BASIC, we solve the residual frequency offset problem in two steps, fine frequency estimation and phase error correction, based on the following observation: the noise level in the correlation value of two similar sample sequences keeps decreasing with the length of the sample set.
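The numbers in this argument can be reproduced with a few lines of Python. The sketch below is only a back-of-the-envelope check; the function names are ours and the 20 MHz sample rate is taken from the bandwidth used later in this section. It computes the phase accumulated by a 0.1 ppm offset at 5 GHz over 1 ms and compares the exact residual factor |1 - e^{jφ}|^2 from Equation (5.16) with its quadratic approximation.

```python
import cmath
import math

def accumulated_phase(delta_f_hz, duration_s):
    """Phase (radians) accumulated by a frequency offset over a duration."""
    return 2 * math.pi * delta_f_hz * duration_s

def residual_factor(delta_f_hz, k, bandwidth_hz):
    """Exact residual power relative to |a_k|^2 for the k-th sample, per (5.16)."""
    phi = 2 * math.pi * delta_f_hz * k / bandwidth_hz
    return abs(1 - cmath.exp(1j * phi)) ** 2

carrier_hz = 5e9
delta_f = 0.1e-6 * carrier_hz                 # 0.1 ppm of 5 GHz, i.e. about 500 Hz
phase_end = accumulated_phase(delta_f, 1e-3)  # after a 1 ms packet
print(f"offset = {delta_f:.0f} Hz, phase after 1 ms = {phase_end / math.pi:.2f} * pi")

B = 20e6
for k in (1000, 5000, 20000):
    exact = residual_factor(delta_f, k, B)
    approx = (2 * math.pi * delta_f * k / B) ** 2   # quadratic approximation
    print(f"k={k}: exact={exact:.4f}, quadratic approx={approx:.4f}")
```

At the end of the packet the residual factor reaches 4, meaning the "cancelled" signal carries four times the power of the original interferer, so subtraction with the coarse estimate is worse than no subtraction at all. The quadratic approximation is only accurate for the early samples, where the accumulated phase is small.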

Fine Frequency Estimation: The frequency estimation error varies between 0.1 ppm and 1 ppm with the Wi-Fi preamble [90] under different SNR settings. To achieve better estimation accuracy even under low SNRs, we correlate the first 2K samples from $S_{rebuilt}$ with the corresponding 2K samples from $S_{actual}$. These samples are divided into two halves and the correlation values of both halves are calculated in the following way:

\begin{equation}
COR_1 = \sum_{i=1}^{K} \frac{e^{\frac{2\pi\Delta f i j}{B}} a_i}{a_i} = \frac{e^{\frac{2\pi\Delta f j}{B}} \left(1 - e^{\frac{2\pi\Delta f K j}{B}}\right)}{1 - e^{\frac{2\pi\Delta f j}{B}}}, \tag{5.17}
\end{equation}

\begin{equation}
COR_2 = \sum_{i=K+1}^{2K} \frac{e^{\frac{2\pi\Delta f i j}{B}} a_i}{a_i} = \frac{e^{\frac{2\pi\Delta f (K+1) j}{B}} \left(1 - e^{\frac{2\pi\Delta f K j}{B}}\right)}{1 - e^{\frac{2\pi\Delta f j}{B}}}, \tag{5.18}
\end{equation}

\begin{equation}
angle\left(\frac{COR_2}{COR_1}\right) = angle\left(e^{\frac{2\pi\Delta f K j}{B}}\right) = \frac{2\pi\Delta f K}{B} \bmod 2\pi, \tag{5.19}
\end{equation}

where the function angle(x) returns the phase of x. Assume K = 2000. When |∆f| < 10000 Hz (2 ppm at the 5 GHz band) with B = 20 MHz bandwidth, $angle(COR_2 / COR_1) = \frac{2\pi\Delta f K}{B}$. This gives us a more accurate frequency estimate:

\begin{equation}
\Delta\hat{f} = \frac{B \times angle\left(\frac{COR_2}{COR_1}\right)}{2\pi K}. \tag{5.20}
\end{equation}

Phase Error Correction: Although the above technique further decreases the frequency estimation error, a small amount of frequency offset still remains. Since the frequency offset is represented as phase rotation in the constructed samples, we divide the whole packet into smaller blocks, each with M samples, and estimate the accumulated phase error for each block. Assume the residual frequency offset after the Fine Frequency Estimation is now ∆f′. Denote the Nth block of the reconstructed samples after correction using $\Delta\hat{f}$ as $S'_{rebuilt} = \{e^{\frac{2\pi\Delta f' ((N-1)M+1) j}{B}} a_{(N-1)M+1}, \cdots, e^{\frac{2\pi\Delta f' N M j}{B}} a_{NM}\}$. The correlation between $S_{actual}$ and $S'_{rebuilt}$ is as follows:

\begin{equation}
COR = \sum_{i=(N-1)M+1}^{NM} \frac{e^{\frac{2\pi\Delta f' i j}{B}} a_i}{a_i} = e^{\frac{2\pi\Delta f' ((N-1)M+1) j}{B}} \cdot \frac{1 - e^{\frac{2\pi\Delta f' M j}{B}}}{1 - e^{\frac{2\pi\Delta f' j}{B}}}. \tag{5.21}
\end{equation}

Assume $\left|\frac{2\pi\Delta f' M}{B}\right| \ll \pi$ after the Fine Frequency Estimation. Then the higher-order terms in the Taylor series of $e^{\frac{2\pi\Delta f' M j}{B}}$ can be ignored. We have $angle(COR) \approx \frac{2\pi\Delta f' ((N-1)M+1)}{B}$, which is the accumulated phase error for the Nth block.
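The two steps above can be illustrated with a small, noise-free Python sketch that uses only the standard library. The function names, the synthetic sample values, and the chosen offsets (300 Hz and 10 Hz) are assumptions made for illustration; real received samples would include noise, which the correlation sums average out. Note that `cmath.phase` returns values in (-π, π], so this particular sketch is unambiguous only while |2π∆fK/B| stays below π.

```python
import cmath
import math

def rotate(samples, delta_f, bandwidth):
    """Apply the phase rotation of a frequency offset (sample index starts at 1)."""
    return [cmath.exp(2j * math.pi * delta_f * i / bandwidth) * a
            for i, a in enumerate(samples, start=1)]

def fine_frequency_estimate(actual, rebuilt, K, bandwidth):
    """Estimate the offset from the first 2K samples, following (5.17)-(5.20)."""
    ratios = [r / a for r, a in zip(rebuilt[:2 * K], actual[:2 * K])]
    cor1 = sum(ratios[:K])   # first half
    cor2 = sum(ratios[K:])   # second half
    return bandwidth * cmath.phase(cor2 / cor1) / (2 * math.pi * K)

def block_phase_error(actual, rebuilt, block, M):
    """Accumulated phase error of the 1-indexed block with M samples, per (5.21)."""
    lo, hi = (block - 1) * M, block * M
    cor = sum(r / a for r, a in zip(rebuilt[lo:hi], actual[lo:hi]))
    return cmath.phase(cor)

B, K = 20e6, 2000
# Arbitrary non-zero synthetic samples standing in for the received packet.
actual = [complex(1 + (i % 7), (i % 5) - 2.5) for i in range(1, 2 * K + 1)]

est = fine_frequency_estimate(actual, rotate(actual, 300.0, B), K, B)
print(f"estimated offset: {est:.3f} Hz")

M, block, residual_f = 100, 5, 10.0
phase_err = block_phase_error(actual, rotate(actual, residual_f, B), block, M)
predicted = 2 * math.pi * residual_f * ((block - 1) * M + 1) / B
print(f"block phase error: {phase_err:.6f} rad, first-order prediction: {predicted:.6f} rad")
```

In the noise-free case the frequency estimate recovers the injected offset, and the measured block phase differs from the first-order prediction only by a within-block term smaller than 2π∆f′M/B, which is the term neglected in the approximation above.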

Our experiment results in Figure 5.7 show that these two steps allow us to keep the residual interference level below 2 dB for the whole packet.

5.4.6 Communication Overhead

The transmission times for the Poll and the Trigger messages are both 40 µs in 802.11a/g. The transition time between transmitting and receiving is 9 µs; the transition time between the clients’ preambles is 2 µs; and each preamble takes 16 µs. So the time overhead of the coordination is (98 + (16 + 2) × N) µs, where N is the number of clients. When N = 4, the overhead is 170 µs. If the packet duration is set to just 2 ms (1500 Bytes at the lowest data rate, 6 Mbps), the overhead is as high as 8.5%. In order to amortize this overhead, we set the packet duration to 10 ms in BASIC so that the overhead is 1.7% for 4 clients and 2.8% for 10 clients. In the above overhead calculation, the duration of the Data Rate Selection phase is not considered. The reason is that the first three phases can be decoupled and happen at different times. Although the Data Rate Selection phase takes almost 200 µs to finish due to delay over the backbone network [88], other devices in the surroundings could transmit during this phase. Since the channel coherence time at walking speed is around 30 ms [96], the RSSs remain almost the same from the time the Channel Estimation phase finishes to the time when the Data Transmission phase finishes.
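The overhead figures above follow from simple arithmetic, sketched below in Python. The constant names are ours, and the split of the fixed 98 µs into two 40 µs messages plus two 9 µs turnarounds is our reading of the numbers in the text rather than a breakdown stated by it.

```python
POLL_US = 40        # Poll message duration
TRIGGER_US = 40     # Trigger message duration
TURNAROUND_US = 9   # transmit/receive transition time
PREAMBLE_US = 16    # one client preamble
GUARD_US = 2        # gap between consecutive client preambles

def coordination_overhead_us(n_clients):
    """Coordination overhead in microseconds: 98 + (16 + 2) * N."""
    fixed = POLL_US + TRIGGER_US + 2 * TURNAROUND_US  # assumed split of the 98 us
    return fixed + (PREAMBLE_US + GUARD_US) * n_clients

def overhead_fraction(n_clients, packet_us):
    """Overhead relative to the data transmission duration."""
    return coordination_overhead_us(n_clients) / packet_us

for n, packet_us in [(4, 2000), (4, 10000), (10, 10000)]:
    pct = 100 * overhead_fraction(n, packet_us)
    print(f"N={n}, packet={packet_us / 1000:.0f} ms: "
          f"{coordination_overhead_us(n)} us overhead ({pct:.1f}%)")
```

The per-client term grows linearly, so even with 10 clients the 10 ms packet duration keeps the coordination cost below 3%.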

5.5 BASIC in Multiple Collision Domains

Enterprise networks are typically deployed over a large area such as an airport, a library, or a shopping mall. In addition, the Wi-Fi signal attenuates quickly in the air, which may result in multiple collision domains. So, we need a technique to divide a large network into smaller groups of APs and clients and run the BASIC protocol within those groups.

We use the dynamic group formation scheme described in [17] to divide a large network into smaller groups of nodes. In this scheme a node can be in one of two states: recovery and idle. In the recovery state, a node is part of a group and is actively transmitting or receiving packets. In the idle state, a node is not part of any group, or is neither transmitting nor expecting any reception. An AP that wants to form a group and does not have any neighbor in the recovery state broadcasts a join message. This AP is referred to as the Group Head. Any node that can hear this message joins the group if: i) it is in the idle state; ii) it does not have any neighbor that is in the recovery state; and iii) in case the node is an AP, the average round-trip delay from this AP to the Group Head is not above a threshold. We use the same threshold value as in [17]. After a node joins the group, it rebroadcasts the join message. The BASIC protocol is repeated for multiple time slots within a group so that every client gets an equal opportunity to transmit.

For each group, if we use the scheme from Section 5.4 and use the same Head AP in every time slot, a client will suffer from starvation if it cannot hear the corresponding

Head AP. To avoid this problem, we use a Virtual Head AP, instead of the Head AP, to transmit the commands through the wireless channel. The Virtual Head AP is selected

dynamically for each slot such that its transmission can be heard by the client that has sent

the least number of bytes. For the first slot we use the Group Head as the Virtual Head

AP. If a client cannot hear the Virtual Head AP, it will not transmit the preamble. The

fairness mechanism in BASIC ensures that this user will get more service in later slots to

compensate for this missed opportunity.

5.6 Experiments

In this section, we first evaluate BASIC using a software-defined radio testbed with 4

USRP N210s equipped with XCVR2450 daughterboards. We then present results from

the ORBIT [6] testbed based on experiments with 20 USRPs. Our experiments are based

on the GNURadio IEEE 802.11 implementation introduced by Bloessl et al. [22]. The

code contains both transmitter and receiver implementations that work with commercial

802.11a/g/p devices. We extended the code by adding proper channel estimation on each

subcarrier and soft-input-soft-output (SISO) decoding module. Although USRP is able to

support 20 MHz bandwidth, processing that many samples in real time overwhelms the

host computers. To enable real time implementation of BASIC, we set the bandwidth to 1

MHz in our experiments.

[Figure 5.6 plot: Packet Reception Ratio (%) versus SNR (dB) for BPSK 1/2, BPSK 3/4, QPSK 1/2, QPSK 3/4, 16QAM 1/2, 16QAM 3/4, 64QAM 2/3 and 64QAM 3/4.]

Figure 5.6: The SNR and Packet Reception Ratio of different modulation and coding schemes used in our evaluations.

5.6.1 Microbenchmarks

SNR-PRR Relationship

We first test the modified implementation of IEEE 802.11 with two USRPs, one serving as the transmitter and the other as the receiver. These two USRPs are placed 3 meters apart in an office environment. Each time, we fix the modulation and coding scheme and gradually change the transmission gain at the sender to capture the packet reception ratio (PRR) at different SNR levels. For each transmission gain setting, 200 packets are sent and the packet size is set to 1000 Bytes. The receiver keeps track of the SNR of each packet and logs the total number of correctly and incorrectly decoded packets. The SNR and PRR relationship is shown in Figure 5.6. Because of the use of SISO decoding, these curves show better performance than the experimental results in [22,72]. We use the results from this figure to guide our data rate selection in Algorithm 5. Since our decoding process is cascaded, an incorrectly decoded packet makes the following packets undecodable. We therefore pick the SNR of the 99% PRR point as the minimum SNR requirement for each data rate.
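Extracting the minimum SNR requirement from a measured SNR-PRR curve can be sketched as below. The function name and the sample curve are hypothetical; the numbers are made up for illustration and are not read from Figure 5.6. The sketch also assumes a conservative reading of the rule: the threshold is the lowest measured SNR from which every higher measurement also meets the target.

```python
from typing import List, Optional, Tuple


def min_snr_for_prr(curve: List[Tuple[float, float]],
                    target_prr: float = 99.0) -> Optional[float]:
    """Return the lowest SNR (dB) at and above which all points meet target_prr.

    curve: (snr_db, prr_percent) measurements for one modulation and coding scheme.
    Returns None if even the highest SNR point misses the target.
    """
    best = None
    # Walk from the highest SNR downwards and stop at the first failure.
    for snr_db, prr in sorted(curve, reverse=True):
        if prr >= target_prr:
            best = snr_db
        else:
            break
    return best


# Illustrative (invented) measurements for a single data rate.
example_curve = [(4.0, 10.0), (5.0, 62.5), (6.0, 97.0), (7.0, 99.5), (8.0, 100.0)]
example_threshold = min_snr_for_prr(example_curve)
```

Repeating this for each modulation and coding scheme yields a table of SNR thresholds that a rate selection routine can consult.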

[Figure 5.7 plot: RINR (dB) versus SNR (dB).]

Figure 5.7: The residual interference to noise ratio (RINR) under different SNR conditions.

[Figure 5.8 plots: residual interference (dB) versus sample index for BASIC, BASIC_FREQ, BASIC_PHASE and BASIC_NONE. (a) Low SNR (8 dB). (b) High SNR (21 dB).]

Figure 5.8: The RINR for different parts of the packet under different schemes.

Residual Interference Level

In this section, we study the performance of interference cancellation with the frequency offset compensation scheme described in Section 5.4.5. Since it is difficult to quantify the residual interference level when it is mixed with signals from another packet, we test the performance with only one transmitter and one receiver. We then subtract the transmitted packet from the received signal using interference cancellation techniques.

Figure 5.7 shows the residual interference to noise ratio (RINR) relationship with the

original signal SNR. When the SNR is below 21 dB, the average residual interference

level is less than 1 dB higher than the noise floor. An average of 20 dB cancellation is

achieved when the original SNR is higher than 22 dB. In our data rate selection mechanism

in BASIC, we do not assume perfect interference cancellation. Instead, we adjust the SINR

using the signal level of the subtracted packet according to the result shown in Figure 5.7.

Since the residual frequency offset of each test is different, we evaluate the frequency offset compensation scheme in BASIC by analyzing results from two tests under different SNR settings, as shown in Figure 5.8. The RINR level for different samples in the packet is plotted. BASIC-FREQ only implements the Fine Frequency Estimation, while BASIC-PHASE only uses the Phase Error Correction. BASIC-NONE refers to the cancellation scheme with neither Fine Frequency Estimation nor Phase Error Correction. The figures show that the cancellation performance of BASIC remains flat for the whole packet under all settings. Since BASIC-FREQ only attempts to achieve a better frequency offset estimate, its performance degrades with the number of samples due to the residual frequency offset. For BASIC-PHASE, we set M = 3000, which results in a clear cancellation pattern every 3000 samples. However, without a fine frequency estimation, the assumption that $\mathrm{angle}(COR) \approx \frac{2\pi \Delta f' N M}{B}$ does not hold, so the lowest RINR for BASIC-PHASE is still higher than that of BASIC. Without any of the frequency compensation techniques, the RINR of BASIC-NONE shows a clear pattern due to frequency estimation error, and the highest RINR is even higher than the original signal level when the phase of the rebuilt sample is the reverse of the received sample.
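The last observation follows from a simple idealized model, sketched below. It is not the cancellation code used in the experiments. It assumes a unit-amplitude sample, no noise, and a rebuilt sample that differs from the received one only by a phase error caused by a residual frequency offset. The function names and the example offset of 50 Hz are hypothetical; the 1 MHz sample rate mirrors the bandwidth used in the experiments.

```python
import cmath
import math


def phase_error(residual_offset_hz: float, sample_index: int,
                sample_rate_hz: float) -> float:
    """Phase (radians) accumulated by an uncorrected frequency offset."""
    return 2 * math.pi * residual_offset_hz * sample_index / sample_rate_hz


def residual_to_signal_db(phase_error_rad: float) -> float:
    """Residual power after subtraction, relative to the original signal power.

    Subtracting exp(j*theta) from 1 leaves |1 - exp(j*theta)|^2 = 2 - 2*cos(theta).
    """
    residual = abs(1 - cmath.exp(1j * phase_error_rad)) ** 2
    return 10 * math.log10(residual) if residual > 0 else float("-inf")


# With a 50 Hz residual offset at 1 MHz, sample 10000 is rotated by pi radians.
theta = phase_error(50.0, 10_000, 1e6)
worst_case_db = residual_to_signal_db(theta)  # about +6 dB: subtraction doubles the amplitude
```

When the phase error reaches pi, the subtraction adds the two samples instead of cancelling them, so the residual exceeds the original signal by roughly 6 dB in this model, which is consistent with the behaviour described for BASIC-NONE.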

[Figure 5.9 plots: CDF (%) of the throughput gain over TDMA (%). (a) The throughput gain distribution of BASIC and SIC over TDMA in a 2×2 testbed (curves: Same AP, BASIC; Different AP, BASIC; Same AP, SIC; Different AP, SIC). (b) The throughput gain distribution in a 3×3 testbed (curves: BASIC, EPICENTER, SOFT, MV, MRD, SIC).]

Figure 5.9: The throughput gain over TDMA distribution.

5.6.2 Testbed Results

We first evaluate the throughput performance of BASIC in a 2×2 network with 4 USRPs. We place the devices arbitrarily in an office environment and then execute both BASIC and TDMA. A total of 1000 packets with 1000 Bytes each are sent for each setting.

In TDMA, we pick the best data rate for each client based on its maximum SNR to the

APs. We divide the network scenarios into two categories as before, i.e. “Same AP” and

“Different AP” based on the RSSs between the APs and clients. The throughput gain of

BASIC and SIC over TDMA is shown in Figure 5.9(a). When the clients are associated with different APs, the median throughput of BASIC is 1.5× the throughput of TDMA, whereas an average of 1.24× throughput is achieved when the clients are associated with the same AP. This indicates that BASIC leverages diversity in RSS. SIC, on the other hand, shows no throughput gain over TDMA in more than 60% of the tests, reconfirming the results reported in [85].

Then we studied the performance of BASIC in the ORBIT testbed. We also implement the following schemes for comparison:

• TDMA: At each data transmission slot, TDMA picks a client in a round-robin manner. The data rate of the chosen client is selected according to the maximum RSS value across all the APs so that the chosen client can get the best data rate possible.

• SIC: We perform SIC but in a single AP, hence it does not take assistance from the backbone. We calculate the best achievable aggregated data rate at each AP, then choose the AP where the aggregated data rate is the highest.

• Majority Voting (MV) [100]: For an incorrect packet, each bit is determined using majority voting over the bits decoded by all APs.

• MRD [65]: Each packet is divided into several blocks. The assumption is that each block is received correctly by at least one AP. Then all combinations of these decoded blocks are tested to see whether the checksum passes.

• SOFT [100]: Instead of collecting the decoded bits from all APs, the confidence values of each bit are combined, with their variances as the weights.

• Epicenter [41]: Instead of exchanging received samples between APs, coarse estimates of the samples are collected to reduce the communication overhead. Specifically, a higher density constellation is used to quantize the received samples.
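Of the comparison schemes above, Majority Voting is the simplest to illustrate. The sketch below is not taken from [100]: the function name is hypothetical, and resolving ties to 0 (possible only with an even number of APs) is an assumption.

```python
from typing import List


def majority_vote(decoded: List[List[int]]) -> List[int]:
    """Combine per-AP hard decisions into one bit sequence.

    decoded[a][i] is bit i as decoded by AP a; all APs report the same length.
    A bit is set to 1 only if strictly more than half of the APs decoded it as 1.
    """
    n_aps = len(decoded)
    return [1 if 2 * sum(bits) > n_aps else 0 for bits in zip(*decoded)]


# Three APs, each with a different single-bit error relative to [1, 0, 1, 1].
combined = majority_vote([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
])
```

The example also shows the limitation noted in the evaluation: voting only helps when the APs' errors fall on different bits, which is hard to guarantee when one AP's SNR dominates.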

For each experiment, we randomly picked 3 transmitters and 3 receivers from the 20 USRPs in the ORBIT testbed. Figure 5.9(b) shows the throughput gain of different schemes over TDMA. There is almost no gain for MV and MRD since the assumption that at least one AP receives a block correctly is difficult to meet. SIC, as predicted in [85], achieves little gain over TDMA. The throughput gains of SOFT and Epicenter are marginal (1% and 7%, respectively). There are three reasons for that. First, since there are only 3 APs, the AP with the strongest SNR dominates the combination process, so the bit error rate after combination does not differ much. Second, the relative throughput gain decreases at high data rates due to control overhead [41]. We are not limiting our data rate to small values in the experiments. Third, we used TDMA as the baseline and picked the best data rate possible, while the baseline scheme is 802.11 in SOFT [100] and Epicenter [41]. The average throughput gain of BASIC over TDMA is 48%, which is not as high as shown in Figure 5.2(b). The first reason is that we used discrete data rates in the experiment while Figure 5.2(b) assumes capacity achieving data rates. The second reason is that our cancellation scheme cannot achieve more than 20 dB interference cancellation, which limits our throughput gain at high SNRs. A more complex digital cancellation technique, as reported in [20], can achieve 48 dB interference cancellation. We believe that BASIC can achieve higher throughput gain with this technique and plan to implement it in the future.

5.7 Trace-Driven Simulation

The number of nodes in the testbed limits the size of the network on which BASIC can be evaluated. In this section, we turn to trace-driven simulation in ns-3 and study the performance of BASIC in both single and multiple collision domains.

5.7.1 Simulation Setup

Besides implementing BASIC in ns-3, we also implemented the following schemes for comparison:

• TDMA: As in Section 5.6.

• SIC: As in Section 5.6, except that we choose the best of TDMA and SIC to maximize the throughput of SIC.

• Symphony [17]: Symphony takes advantage of the fact that not all APs are able to

hear the transmission from a specific client in multiple collision domains. Symphony

reduces the number of collisions by suppressing a subset of clients in each time slot.

In the end, some APs are able to receive packets without interference. These packets

are forwarded to other APs where their interference can be cancelled. Note that

the clients in Symphony need to keep transmitting over multiple slots, resulting in

wastage of energy.

Since ns-3 does not have sample level simulation capability, we did not implement MV, MRD, SOFT and Epicenter. Also, as shown in Section 5.6, these schemes perform similarly to TDMA. To obtain realistic performance from the simulator, we feed the RSS values from our testbed into the simulator. We also include the residual interference trace from Section 5.6.1.

5.7.2 Single Collision Domain

In a single collision domain, APs and clients are placed in such a way that every AP can hear every client in the network. The performance of TDMA, SIC and BASIC is evaluated in this section.

Throughput Performance

Figure 5.10(a) compares the throughput of different protocols when the number of APs remains the same. It can be seen from the figure that the throughput of all the three schemes remains similar as the number of clients increases. The throughput of BASIC is almost

[Figure 5.10 plots, with curves for BASIC, SIC and TDMA: (a) Throughput (Mbps) vs. number of clients (10 APs); (b) Throughput (Mbps) vs. number of APs (50 clients); (c) Fairness vs. number of APs (50 clients); (d) CDF (%) of the throughput improvement of BASIC over TDMA (Mbps) per client (5 APs, 50 clients).]

Figure 5.10: The performance in single collision domain.

2.5× the throughput of TDMA and approximately 2.3× the throughput of SIC. This gain

can be attributed to the fact that BASIC allows multiple clients to transmit at the same time

in contrast to TDMA. The throughput of SIC is up to 17% higher than TDMA.

Figure 5.10(b) compares the throughput when the number of clients remains the same.

The throughput of BASIC increases almost linearly, however the throughput of SIC and

TDMA increases slightly. As the number of APs increases, the diversity in the network also

increases and it helps BASIC to schedule more clients and reach up to 4.8× the throughput

of TDMA.

Jain’s Fairness Index

Figure 5.10(c) compares the fairness of different protocols. The fairness of SIC is very low because we only try to maximize its throughput. In Figure 5.10(c), the fairness of BASIC is 0.09 lower than that of TDMA. We take a closer look at the throughput of each client. The throughput improvement of each client is plotted in Figure 5.10(d). As can be seen, none of the clients in BASIC has lower throughput than in TDMA, which indicates that the throughput gain of BASIC does not come from the starvation of any client.
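For reference, Jain's fairness index over per-client throughputs x_1, ..., x_n is (Σ x_i)² / (n Σ x_i²); it equals 1 when all clients receive the same throughput and 1/n when a single client receives everything. A minimal sketch follows (the function name is ours):

```python
from typing import Sequence


def jain_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one client with non-zero throughput")
    total = sum(throughputs)
    return total * total / (n * sum_sq)


equal_share = jain_index([5.0, 5.0, 5.0, 5.0])   # perfectly fair allocation
one_winner = jain_index([10.0, 0.0, 0.0, 0.0])   # one client takes everything
```

Because the index only measures how evenly throughput is spread, a scheme can score slightly lower than TDMA while still giving every client at least its TDMA throughput, which is the situation Figure 5.10(d) is used to check.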

5.7.3 Multiple Collision Domains

In multiple collision domains, an AP may not be able to hear all the clients in the network. The network size is 1000 m × 1000 m and the communication range is set to 200 m. The performance of BASIC, Symphony and TDMA is evaluated. SIC is not evaluated in this setup because, as seen in the single collision domain, it performs poorly.

Throughput Performance

Figure 5.11(a) compares the throughput of different schemes when the number of APs remains the same. It can be seen from the figure that the throughput of all three protocols remains similar as the number of clients increases. The throughput of BASIC is around

1.5× higher than the throughput of Symphony and 3× better than the throughput of TDMA.

Figure 5.11(c) compares the throughput of different schemes when the number of clients remains the same. The throughput of BASIC increases linearly, whereas the throughput of Symphony changes very slowly and the throughput of TDMA remains almost the same.

[Figure 5.11 plots, with curves for BASIC, Symphony and TDMA: (a) Throughput (Mbps) vs. number of clients (10 APs); (b) Fairness vs. number of clients (10 APs); (c) Throughput (Mbps) vs. number of APs (50 clients); (d) Fairness vs. number of APs (50 clients).]

Figure 5.11: The performance in multiple collision domains.

As the number of APs increases BASIC exploits the diversity better than other protocols.

This characteristic is also evident in the single collision domain. Moreover, the throughput improvement of BASIC is even better than in the single collision domain, e.g., for 30 APs and 50 clients the throughput of BASIC in multiple collision domains is 1.5× better than that of the single collision domain. This is the result of better SNR diversity in multiple collision domains.

Jain’s Fairness Index

Figure 5.11(b) compares the fairness when the number of APs is fixed. The fairness of BASIC is 0.07 lower than that of Symphony at the beginning, but as the number of clients increases, the fairness of BASIC also increases. Symphony has better fairness because all the clients get an equal chance to transmit in a group, which may not be true for BASIC.

Figure 5.11(d) compares the fairness when the number of clients is fixed. The fairness of BASIC is 0.09 lower than that of Symphony. However, the fairness of BASIC increases with the number of APs. The improved SNR diversity helps to improve the fairness of BASIC.

5.8 Related Work

Successive Interference Cancellation: Interference cancellation schemes have been used in cellular networks for a long time [11]. The mobile cellular devices share a common receiver, the base station, which is able to control the transmission power and data rate of the mobile devices. The use of SIC in Zigbee networks has been studied in [44], where SIC is shown to be an effective method to combat both the hidden and exposed terminal problems. In [85], the authors studied the performance gain of SIC over TDMA assuming an ideal data rate. They concluded that the improvement from SIC is marginal in most cases because the restrictive data rate and SNR requirements limit the applicable scenarios for SIC. AutoMAC [42] takes advantage of rateless coding techniques to realize SIC in uplink transmissions. Since rateless coding allows the clients to achieve throughput close to capacity, it is able to utilize the full power of SIC. Our work differs from the above in that the interfering packet can be decoded by another AP, largely relaxing the SNR requirement to use SIC. There are also multiple schemes in which interference cancellation (IC) is used to subtract known packets instead of decoding and subtracting. In Analog Network Coding [53], one relay node amplifies and forwards the received collision packets from two nodes. These nodes, upon receiving the collision, are able to remove the interference of their own packet using IC. ZigZag [38] first collects two collisions from the same two packets. Then it takes advantage of the different alignment of these two collisions and decodes part of the packet without interference. This decoding strategy allows the use of IC to decode both collided packets. In Full-Duplex [27,32,49] communication, IC is used to cancel the known signal from the transmitting chain. Unlike these works, BASIC obtains the known interference from other APs and faces the residual frequency offset problem.

Backbone Assisted Schemes: In enterprise networks, the underutilized backbone network provides a side channel for the APs to coordinate with each other. In Network MIMO [7] and CoMP [47], APs exchange received samples to decode the uplink packets jointly. However, these schemes could easily overwhelm the backbone network [39]. MegaMIMO [74] uses the backbone to share the downlink packets from each AP to facilitate distributed MIMO. CENTAUR [88] uses a conflict map to schedule the downlink transmissions to overcome hidden and exposed terminal problems. These schemes are all specific to downlink traffic and do not work for uplink traffic. In SOFT [100] and Epicenter [41], the decoded information about the received packet from all APs is collected together to increase the probability of successful decoding. These schemes work for one transmission at a time instead of concurrent transmissions. TRACK [46] schedules concurrent downlink transmissions with data rate selection, which could also be extended to uplink traffic. However, none of them takes advantage of interference cancellation to boost the throughput performance. Another work that explores packet collisions in the uplink is Symphony [17]. The idea is based on the observation that the collided packets at one AP may be received without interference at another AP in a multiple collision domain network. The correctly decoded packets allow the use of interference cancellation to recover the other packet under collision. Compared to BASIC, the clients in Symphony need to keep transmitting over multiple slots, resulting in wastage of energy. Also, it only improves the performance for multiple collision domains and does not use data rate control to protect packet decoding under interference.

5.9 Discussion and Conclusion

In this chapter, we present BASIC, a novel concept of using the backbone for distributedly decoding packets at multiple receivers using interference cancellation, which does not require synchronization and has low energy overhead. Our theoretical analysis shows that BASIC can achieve a median of 2.6× the throughput of TDMA in a 4×4 network. Our experiments with USRPs and trace-driven simulations also characterize the advantages of BASIC.

In this section, we discuss some practical issues with BASIC and how we can further improve its performance.

Wideband Transmissions: Traditionally, researchers have believed that SNR is not a good indicator for data rate adaptation algorithms [96]. The main reason is that different subcarriers in the Wi-Fi OFDM system experience different fading and have different SNR levels due to multi-path effects. Since we only use 1 MHz as the bandwidth in our experiments, we did not observe large variations in the SNRs on different subcarriers. To make BASIC work over a larger bandwidth with frequency selective fading, we can introduce the effective SNR [45] concept and use the effective SINR for the data rate selection instead. Instead of using the average SNR of the samples to estimate the BER, effective SNR calculates the BER of each subcarrier based on its own SNR and then uses these BERs to estimate the effective flat SNR of the channel.
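The effective SNR idea can be sketched for a single modulation. The example below assumes BPSK, whose bit error rate on an AWGN channel is 0.5·erfc(√SNR); the formulation in [45] covers other modulations and may differ in detail. The function names and the bisection search range are our own choices.

```python
import math
from typing import List


def bpsk_ber(snr_linear: float) -> float:
    """BPSK bit error rate on an AWGN channel: Q(sqrt(2*SNR)) = 0.5*erfc(sqrt(SNR))."""
    return 0.5 * math.erfc(math.sqrt(snr_linear))


def effective_snr_db(subcarrier_snrs_db: List[float]) -> float:
    """Flat-channel SNR (dB) whose BER equals the mean per-subcarrier BER."""
    bers = [bpsk_ber(10 ** (s / 10)) for s in subcarrier_snrs_db]
    mean_ber = sum(bers) / len(bers)
    # BER decreases monotonically with SNR, so invert it by bisection.
    # Note: for very high SNRs the BER underflows to 0 and the result is unreliable.
    lo, hi = -30.0, 60.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if bpsk_ber(10 ** (mid / 10)) > mean_ber:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


flat = effective_snr_db([10.0, 10.0, 10.0, 10.0])    # flat channel: unchanged
faded = effective_snr_db([20.0, 20.0, 0.0, 20.0])    # one deeply faded subcarrier
```

In the second example the dB average of the four subcarriers is 15 dB, yet the effective SNR is far lower because the faded subcarrier dominates the error rate. This is the reason average SNR is a poor predictor under frequency selective fading.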

Power Control: It is shown that transmission power control improves the performance of SIC [85]. Power control provides another dimension for us to adjust the SNR diversity in the network, which also favors BASIC. We leave this as a future research direction.

Co-existence with Legacy Wi-Fi Devices: Current Wi-Fi systems use carrier sensing to resolve channel contention between devices. They also use the network allocation vector (NAV) as a virtual carrier-sensing mechanism. NAV is a data field in the MAC header that indicates the transmission duration of the current packet, during which the channel is supposed to be busy. To co-exist with legacy Wi-Fi networks, in BASIC, the Head AP contends for channel access using carrier sensing and sets the NAV field in the Poll and Trigger packets to proper values to forbid legacy devices from transmitting. Also, if a client detects the channel to be busy while receiving the commanding packet from the AP, it will not transmit anything to avoid interference at legacy nodes.

PHY ACKs: In the current design of BASIC, we did not discuss the ACKs for the uplink transmissions. Since the packet decoding happens on the backbone in a cascaded fashion, it takes longer than the SIFS duration allowed for instant ACKs in 802.11a/g. As future work, we plan to implement the block-ACK scheme used in 802.11n/ac, which allows the APs to send delayed ACKs for several packets together. Meanwhile, the clients keep a window for unacknowledged packets instead of switching to the timeout state.

Chapter 6: Conclusions and Future Work

With a rapid increase in the number of smart devices, the traffic burden on the current Wi-Fi system keeps growing. Since wireless spectrum is precious and limited, increasing wireless throughput just by augmenting the bandwidth is not the most desirable approach. Traditional Wi-Fi systems use carrier sense multiple access (CSMA) as the foundation for medium access and forbid a wireless device to transmit when the channel is busy. Instead, we explored another direction for medium access by encouraging concurrent transmissions in this thesis and proposed several techniques to boost the throughput of wireless communications.

We first presented RCTC, a rapid concurrent transmission coordination scheme for full duplex radios. It utilizes the full potential of full duplex transmissions while inducing little coordination overhead. Then we discussed three schemes that take advantage of the wired backbone in enterprise networks: DOMINO, BBN, and BASIC. DOMINO introduces the least traffic on the backbone since it only transmits control packets, while both BBN and BASIC also require the exchange of decoded packets. BBN increases the uplink throughput linearly with the number of wireless clients and is suitable for networks with densely deployed APs. BASIC, on the other hand, does not require a quadratic number of APs.

The schemes discussed in this thesis are just the first step towards exploiting concurrent transmissions. To move forward, the following directions need to be explored:

• Data Rate Adaptation in BBN: In this thesis, Auto Rate Fallback (ARF) is used to estimate the data rate for each client. This scheme works well in a static network. In a mobile network, channel conditions change frequently over time and the mobility of a client may also change the group it belongs to. In this case, ARF is not able to converge to the best data rate before the channel condition changes. A brute force solution is to calculate the achievable data rate for each client with all the channel coefficients. This solution introduces heavy overhead given that probing all channel coefficients is time consuming. To obtain optimal throughput performance, it also needs to check all the possible combinations, which takes exponential time. A data rate selection algorithm that can achieve sub-optimal performance in polynomial time is required for mobile scenarios. A potential solution is to use the current greedy algorithm to select the AP-client pairs first, and then probe only the channel conditions between these APs and clients, which can be used to estimate the achievable data rate for each client. Since channel independence affects aggregate throughput and is not known ahead of time, this solution may yield a much lower throughput compared to the optimal solution. To improve this solution, a channel independence history could be maintained and updated when the channel condition changes dramatically.

• Power Control in BASIC: All transmitters in BASIC are assumed to transmit with the maximum transmit power. By exploiting different power levels, BASIC can further increase transmission concurrency. For example, assume in Figure 1.5, s11 = 18 dB, s12 = 13 dB, s21 = 10 dB, and s22 = 14 dB. The SINR for C1 is 8 dB and the SNR for C2 after cancellation is 14 dB. According to Figure 5.6, the data rate for C1 is 12 Mbps and the data rate for C2 is 24 Mbps. If we decrease the transmit power of C2 by 2 dB, the SINR for C1 is now 10 dB and the SNR for C2 is 12 dB. Both clients can get a 24 Mbps data rate in this case and the throughput increases by 33% compared to transmitting with maximum power. However, a change in the transmit power of one transmitter affects the interference level at all receivers. The complexity of an algorithm that tries all transmit power combinations is exponential. It remains to be explored how well different polynomial power control schemes perform with BASIC.

• Concurrent Broadcasting with Full Duplex: In vehicular networks, safety messages are periodically broadcast to the surrounding nodes. Traditional solutions use carrier sensing and only allow one node to broadcast at a time. Instead, we ask all nodes to broadcast simultaneously. With the full duplex technique, all nodes could transmit and receive at the same time. Each node repeatedly broadcasts the same message several times to enable decoding at all receivers. At the receiver side, there are two possible decoding strategies. First, these repeated signals could be combined together such that the intended signals are constructively added. However, the channel conditions of each transmission need to be estimated accurately for combination, in the presence of high interference from other concurrent transmissions. An estimation scheme that has low overhead and high accuracy remains to be investigated. Second, these repeated signals could be combined such that the interference signals are destructively added. This solution not only needs channel estimation, but also requires all transmissions to be aligned, which makes the system more complex. On the other hand, the second solution could potentially decode all packets with fewer retransmissions. It requires further investigation to find out which solution works better in reality.

• Practical Deployment: All four techniques proposed in this thesis are tested with experiments using USRPs. However, further studies are required to deploy them in reality with commodity hardware. Modification of the hardware of the clients is difficult to achieve given that the clients may not want to buy new hardware. On the other hand, hardware modification on the APs is easier to realize since the number of APs is much smaller than the number of mobile devices and the APs are managed by only one entity (in enterprise scenarios) instead of countless clients. Devices with full duplex capability, which are not yet on the market, are required by RCTC. One way to implement RCTC is to enable full duplex for APs while leaving clients untouched. Although current APs are not able to do self-interference cancellation, one can combine two Wi-Fi cards together while deploying the two antennas at different locations. The interference reduction introduced by distance, though not perfect for self-interference cancellation, still allows secondary and exposed transmissions to occur concurrently. DOMINO requires devices to use correlation to detect the relative triggers. One solution is to add an extra correlation chain to current Wi-Fi cards. This requires modification in the hardware of both clients and APs, which is difficult to achieve. Another solution is to change the design of relative scheduling and re-use the DSSS solution from 802.11b, which may result in increased overhead. Further exploration is required to use DSSS in DOMINO without hardware modifications. Both BBN and BASIC require APs to use exchanged packets to do interference cancellation. This leads to hardware modifications, which can be achieved by deploying new APs with BBN and BASIC enabled. On the other hand, there is no need for hardware changes at the clients. Instead, both schemes only need clients to be able to send simultaneously upon receiving a poll packet from the leading AP, which can be achieved by updating the firmware of the devices.

• MAC Design of 60 GHz Communications: The new Wi-Fi standard 802.11ad uses 60 GHz as the underlying central frequency and supports a transmission data rate of up to around 7 Gbps. To confront the high path loss of the 60 GHz frequency, highly directional beams are introduced, which makes it difficult for devices to detect the channel usage, resulting in more hidden terminals. A Super-Controller, which is able to collect the interference information and coordinate transmissions, similar to the central controller used in DOMINO, has been shown to be promising [70]. However, implementing such a Super-Controller is non-trivial. The 802.11g/n standards support both 2.4 GHz and 5 GHz frequencies with a larger coverage range than 802.11ad, and can be used as a side channel for interference collection and medium access scheduling. Schemes proposed in this thesis, given that they achieve much higher throughput than 802.11g, allow more information to be exchanged and are able to support the design of more powerful central controllers for 60 GHz communications. A hybrid MAC layer design for 802.11ad remains another promising direction to be explored.
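The arithmetic behind the power control example in the list above can be checked with a short sketch. It rests on two assumptions inferred from the quoted numbers rather than stated in the text: s21 is read as the strength of C2's signal at the AP decoding C1, and noise is neglected, so that the SINR in dB is approximated by the signal level minus the interference level. The data rates are the ones quoted in the example, not values recomputed from Figure 5.6.

```python
def sinr_db_interference_limited(signal_db: float, interference_db: float) -> float:
    """Approximate SINR (dB) when interference dominates noise: S - I."""
    return signal_db - interference_db


def relative_gain(rates_before_mbps, rates_after_mbps) -> float:
    """Relative increase in aggregate throughput."""
    before, after = sum(rates_before_mbps), sum(rates_after_mbps)
    return (after - before) / before


s11, s21, s22 = 18.0, 10.0, 14.0   # dB, as quoted in the example
backoff_db = 2.0                   # transmit power reduction applied to C2

sinr_c1_before = sinr_db_interference_limited(s11, s21)               # 8 dB
sinr_c1_after = sinr_db_interference_limited(s11, s21 - backoff_db)   # 10 dB
snr_c2_after = s22 - backoff_db                                       # 12 dB

# Data rates quoted in the text: (C1, C2) = (12, 24) Mbps, then (24, 24) Mbps.
gain = relative_gain([12.0, 24.0], [24.0, 24.0])                      # about 0.33
```

The aggregate rate rises from 36 Mbps to 48 Mbps, which is the 33% increase stated in the example.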

Appendix A: RCTC

A.1 Signature Detection

The nature of the signatures allows for their detection even in the presence of significant noise and interference (down to -10 dB in our experiments), without requiring precise frequency and timing correction. This enables rapid detection of the signals and facilitates alignment of the full duplex transmissions. Let the signature signal be x[k], k = 0, 1, 2, ..., N − 1. The corresponding received signal can be represented as

$$y[k] = e^{-2\pi j k \Delta f / f_s}\, h[k]\, x[k] + n[k],$$

where $\Delta f$ is the frequency offset between the transmitter and receiver, $f_s$ is the sampling frequency, $h[k]$ is the channel coefficient, and $n[k]$ is noise.

As the duration over which the signature is transmitted is relatively small, we assume that the channel remains unchanged over that period, i.e., h[k] ≈ H. When the corresponding signature is not received, the cross correlation of x and y, Corr(x, y), is expected to be low as they are independent. But when the signature appears in the received signal, we have:

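$$
\begin{aligned}
\mathrm{Corr}(x, y) &= \sum_{k=0}^{N-1} x[k]^{*}\, y[k] \\
&= \sum_{k=0}^{N-1} x[k]^{*} \left( e^{-2\pi j k \Delta f / f_s} H x[k] + n[k] \right) \\
&= H \sum_{k=0}^{N-1} e^{-2\pi j k \Delta f / f_s} |x[k]|^{2} + \sum_{k=0}^{N-1} x[k]^{*}\, n[k] \qquad \text{(A.1)}
\end{aligned}
$$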

In Equation (A.1), because $x[k]$ and $n[k]$ are independent,

$$\lim_{N \to \infty} \sum_{k=0}^{N-1} x[k]^{*}\, n[k] = 0 \tag{A.2}$$

Observe that

$$\lim_{N \Delta f / f_s \to 0} e^{-2\pi j k \Delta f / f_s} = 1, \quad k = 0, 1, \ldots, N-1 \tag{A.3}$$

Even with a fairly large $N$, $N \Delta f / f_s$ remains very small because $\Delta f / f_s$ is close to 0 (no larger than $2.5 \times 10^{-6}$ for USRP 2). If the signature length $N$ is selected properly, both Equation (A.2) and Equation (A.3) are close to their limits. Then we have

$$\mathrm{Corr}(x, y) \approx H \sum_{k=0}^{N-1} |x[k]|^{2} \tag{A.4}$$
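As an illustration of Equations (A.1) to (A.4), the following is a minimal sketch in Python using only the standard library. It is not the implementation used in the experiments. The signature here is a random ±1 placeholder rather than a Gold code, and the channel gain `H`, the noise level, and the helper names `received` and `correlate` are assumptions chosen for the example.

```python
import cmath
import random


def correlate(x, y):
    """Corr(x, y) = sum over k of conj(x[k]) * y[k], as in Equation (A.1)."""
    return sum(xk.conjugate() * yk for xk, yk in zip(x, y))


def received(x, H, df_over_fs, noise_std, rng):
    """y[k] = exp(-2*pi*j*k*df/fs) * H * x[k] + n[k], with a constant channel H."""
    y = []
    for k, xk in enumerate(x):
        rotation = cmath.exp(-2j * cmath.pi * k * df_over_fs)
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        y.append(rotation * H * xk + noise)
    return y


rng = random.Random(0)
N = 127
# Placeholder +/-1 signatures (assumption: stand-ins for real Gold codes).
x = [complex(rng.choice((-1, 1)), 0) for _ in range(N)]
other = [complex(rng.choice((-1, 1)), 0) for _ in range(N)]

H = 0.8 * cmath.exp(0.3j)                  # assumed flat channel coefficient
y = received(x, H, 2.5e-6, 0.3, rng)       # df/fs at the USRP 2 bound from the text

energy = sum(abs(xk) ** 2 for xk in x)     # sum of |x[k]|^2 in Equation (A.4)
matched = abs(correlate(x, y))             # signature present in y
unmatched = abs(correlate(other, y))       # different signature, same y
print(matched, abs(H) * energy, unmatched)
```

With these assumed values, the matched correlation should land near $|H| \sum_k |x[k]|^2$ as Equation (A.4) predicts, while the correlation against a different signature stays well below it.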

To minimize false positives, the cross-correlation between different signatures should be low compared with the self-correlation. Gold codes [37] are a good candidate because the cross correlation of different pairs of codes is low and bounded. To generate Gold codes, two m-sequences [40] of the same length, $2^L - 1$, are selected such that the absolute value of the cross correlation between them is bounded by $2^{(L+2)/2} + 1$, where $L$ is the length of the m-sequence generator. These two m-sequences are called preferred m-sequences. Using the XOR of one m-sequence with shifted versions of the other m-sequence, we can get $2^L - 1$ new Gold codes. The cross correlation for Gold codes is bounded by $2^{(L+2)/2} + 1$ for even $L$ and $2^{(L+1)/2} + 1$ for odd $L$, while the self-correlation value is $2^L - 1$.
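The construction described above can be sketched in a few lines of Python. This is an illustrative sketch, not the generator used in the thesis. It assumes the generator polynomials $x^7 + x^3 + 1$ and $x^7 + x^3 + x^2 + x + 1$ as the preferred pair for $L = 7$, and the function names and code indexing (code 0 is the first m-sequence) are hypothetical.

```python
def m_sequence(taps, L, seed=1):
    """One period (2^L - 1 bits) of the recurrence a[n+L] = XOR of a[n+t], t in taps."""
    state = [(seed >> i) & 1 for i in range(L)]   # a[0], ..., a[L-1]
    out = []
    for _ in range(2 ** L - 1):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def gold_codes(u, v):
    """The two m-sequences plus u XOR each cyclic shift of v."""
    codes = [u, v]
    for s in range(len(u)):
        shifted = v[s:] + v[:s]
        codes.append([a ^ b for a, b in zip(u, shifted)])
    return codes


def periodic_correlation(a, b):
    """Correlation of the +/-1 versions of a and b at every cyclic shift."""
    n = len(a)
    pa = [1 - 2 * bit for bit in a]
    pb = [1 - 2 * bit for bit in b]
    return [sum(pa[k] * pb[(k + s) % n] for k in range(n)) for s in range(n)]


L = 7
u = m_sequence((3, 0), L)            # x^7 + x^3 + 1 (assumed preferred pair)
v = m_sequence((3, 2, 1, 0), L)      # x^7 + x^3 + x^2 + x + 1
codes = gold_codes(u, v)

self_corr = periodic_correlation(codes[0], codes[0])
cross_corr = periodic_correlation(codes[0], codes[3])
bound = 2 ** ((L + 1) // 2) + 1      # odd-L bound stated in the text
print(len(codes), len(codes[0]), self_corr[0], max(abs(c) for c in cross_corr), bound)
```

The two m-sequences together with the $2^L - 1$ XOR combinations give $2^L + 1$ codes in total, and the self-correlation peak equals the code length $2^L - 1$.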

Experimental Setup: To evaluate the performance of our signature design, we use two USRP platforms with the GNU Radio [4] software. For the Gold codes, a generator length of 7 is used, which results in a total of 129 different codes, each of length 127. Each code takes 6.35 µs to transmit in a 20 MHz wireless channel, and the code set can support up to 129 nodes in one contention domain. The transmitter continuously sends code 0 every 0.01 s. The average SNR at the receiver is 7 dB. We use codes 0 and 3 to perform the correlation at the receiver side and normalize the correlation values to 1.

Figure A.1: Self correlation and cross correlation for Gold codes. (a) Self correlation of code 0. (b) Cross correlation between code 0 and code 3. Both panels plot correlation results (normalized to 1) against samples (×10^4).

The results are shown in Figure A.1. We can see distinct self-correlation peaks while the values of cross-correlation and correlation with noise are relatively low.

Appendix B: BBN

B.1 Algorithm Satisfiable

As discussed in Section 4.5.2, Algorithm Satisfiable determines whether a given schedule is satisfiable. Without loss of generality, let $S$ be the schedule such that $S = \{(C_i, AP_i)\}$, where $AP_i$ is the receiving AP for packet $x_i$, $x_i$ is the $i$-th packet to be decoded, and $1 \le i \le N$. Then, we draw an undirected graph $G$ with two sets of vertices: (i) $V_1$: each vertex in $V_1$ corresponds to a pair $(C_i, AP_i)$ such that $(C_i, AP_i) \in S$; and (ii) $V_2$: let $A$ be the set of all APs in the group. Each vertex in $V_2$ corresponds to an AP (say $AP_j$) in $A$ such that $AP_j$ is not in $S$.

Next, we draw an edge from a vertex $V_i \in V_1$ to a vertex $V_j \in V_2$ if and only if $AP_j$ can hear $C_i$. We set the capacity of all these edges to 1.

Next, we construct a source vertex (say $V_s$) and draw edges from $V_s$ to every vertex $V_i \in V_1$. We set the capacity of the edge from $V_s$ to $V_i$ to $i - 1$ for $1 \le i \le N - 1$, and the capacity of the edge from $V_s$ to $V_N$ to $N - 2$. Finally, we construct a termination (or sink) vertex (say $V_t$), add an edge from each vertex $V_j \in V_2$ to $V_t$, and set the capacity of these edges to 1.

We solve the Maximum Flow problem on $G$ from vertex $V_s$ to $V_t$. We say that the given schedule $S$ is satisfiable only if the flow value is at least $\frac{N^2 - N - 2}{2}$, where $N$ is the length of the schedule. Next, we prove the correctness of this reduction.
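The threshold equals the total capacity of the edges leaving the source, which follows from summing the capacities assigned in the construction:

$$\sum_{i=1}^{N-1} (i - 1) + (N - 2) = \frac{(N-1)(N-2)}{2} + (N - 2) = \frac{(N-2)(N+1)}{2} = \frac{N^2 - N - 2}{2}.$$

For example, with $N = 4$ the source edges have capacities 0, 1, 2 and 2, which sum to 5, matching $(16 - 4 - 2)/2 = 5$. Since no flow can exceed this total, reaching the threshold means that every source edge is saturated.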

Theorem B.1.1. If G has the desired max flow, then S is satisfiable.

Proof. $G$ can have a flow of at least $\frac{N^2 - N - 2}{2}$ only if the following two conditions are satisfied: (i) from every vertex $(C_i, AP_i)$ with $1 \le i \le N - 1$, there is an outgoing flow of $i - 1$; and (ii) from vertex $(C_N, AP_N)$, there is an outgoing flow of $N - 2$. This implies that for every client $C_i$ with $1 \le i \le N - 1$, there are at least $i - 1$ APs that can hear it, and that for client $C_N$, there are at least $N - 2$ APs that can hear it. These two conditions ensure that $S$ must be satisfiable.

Theorem B.1.2. If S is satisfiable, then G has the desired max flow.

Proof. If $S$ is satisfiable, then for every client $C_i$ in $S$ with $1 \le i \le N - 1$, there must be at least $i - 1$ unique APs that can hear it. This also implies that for client $C_N$, there are at least $N - 2$ unique APs that can hear it. These two conditions ensure that the max flow in the graph $G$ is at least $1 + 2 + \ldots + (N - 2)$ from clients $C_1$ to $C_{N-1}$, plus a flow of $N - 2$ for $C_N$. Thus, the total flow is at least $\frac{N^2 - N - 2}{2}$.
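The reduction in this section can be sketched as follows. This is a hedged illustration, not the implementation evaluated in the thesis: the max-flow routine is a plain Edmonds-Karp search chosen for brevity, and the names `is_satisfiable`, `hears` (a map from each client to the set of APs that can hear it) and `all_aps` are hypothetical. The sketch assumes a schedule of length $N \ge 2$.

```python
from collections import defaultdict, deque


def add_edge(cap, u, v, c):
    """Add a directed edge u -> v with capacity c, plus a zero-capacity reverse edge."""
    cap[u][v] += c
    cap[v][u] += 0


def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def is_satisfiable(schedule, hears, all_aps):
    """schedule: list of (client, receiving AP) pairs in decoding order."""
    n = len(schedule)
    if n < 2:
        raise ValueError("this sketch assumes a schedule of length >= 2")
    in_schedule = {ap for _, ap in schedule}
    free_aps = [ap for ap in all_aps if ap not in in_schedule]   # vertex set V2
    cap = defaultdict(lambda: defaultdict(int))
    for i, (client, _) in enumerate(schedule, start=1):
        source_cap = i - 1 if i < n else n - 2
        add_edge(cap, "Vs", ("V1", i), source_cap)
        for ap in free_aps:
            if ap in hears.get(client, ()):
                add_edge(cap, ("V1", i), ("V2", ap), 1)
    for ap in free_aps:
        add_edge(cap, ("V2", ap), "Vt", 1)
    required = (n * n - n - 2) // 2
    return max_flow(cap, "Vs", "Vt") >= required


schedule = [("C1", "AP1"), ("C2", "AP2"), ("C3", "AP3")]
all_aps = ["AP1", "AP2", "AP3", "AP4", "AP5"]
hears_ok = {"C1": {"AP1"}, "C2": {"AP2", "AP4"}, "C3": {"AP3", "AP5"}}
hears_bad = {"C1": {"AP1"}, "C2": {"AP2", "AP4"}, "C3": {"AP3", "AP4"}}
result_ok = is_satisfiable(schedule, hears_ok, all_aps)
result_bad = is_satisfiable(schedule, hears_bad, all_aps)
```

In the first toy topology, clients C2 and C3 are each heard by a different AP outside the schedule, so the required flow of 2 is reached. In the second, both depend on the same AP, so only one unit of flow fits through its unit-capacity sink edge.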

Bibliography

[1] IEEE Std. 802.11-2007, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, June 2007.

[2] IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. IEEE Std 1588-2008, pages c1–269, 2008.

[3] Network Time Protocol, accessed Jan. 2013. http://www.ntp.org/.

[4] GNU Radio, accessed Jan. 2014. http://gnuradio.org.

[5] Facts tagged with WiFi, accessed Mar. 2013. http://www.factbrowser.com/tags/wifi.

[6] Open-Access Research Testbed for Next-Generation Wireless Networks (ORBIT), accessed Sept. 2015. http://www.orbit-lab.org.

[7] A. Goldsmith, S. Jafar, N. Jindal, and S. Vishwanath. Capacity Limits of MIMO Channels. IEEE J. Select. Areas Commun., 21(5):684–702, 2003.

[8] Arup Acharya, Archan Misra, and Sorav Bansal. MACA-P: a MAC for Concurrent Transmissions in Multi-hop Wireless Networks. In Proc. of IEEE PERCOM, pages 505–508, 2003.

[9] Fadel Adib, Swarun Kumar, Omid Aryan, Shyamnath Gollakota, and Dina Katabi. Interference Alignment by Motion. In Proc. of ACM MobiCom 2013.

[10] Nabeel Ahmed, Srinivasan Keshav, and Konstantina Papagiannaki. OmniVoice: a Mobile Voice Solution for Small-Scale Enterprises. In Proc. of ACM MobiHoc, pages 5:1–5:11, 2011.

[11] J. G. Andrews. Interference Cancellation for Cellular Systems: A Contemporary Overview. Wireless Commun., 12(2):19–29, 2005.

[12] V Sreekanth Annapureddy, Aly El Gamal, and Venugopal V Veeravalli. Degrees of Freedom of Interference Channels with CoMP Transmission and Reception. IEEE Transactions on Information Theory, 58(9):5740–5760, 2012.

[13] Eyjolfur Ingi Asgeirsson and Pradipta Mitra. On a Game Theoretic Approach to Capacity Maximization in Wireless Networks. CoRR, 2010.

[14] Industry Association and Electronic Industries Association. CDMA2000 High Rate Packet Data Specification. TIA document. 2003.

[15] H.V. Balan, R. Rogalin, A. Michaloliakos, K. Psounis, and G. Caire. AirSync: Enabling Distributed Multiuser MIMO With Full . IEEE/ACM Transactions on Networking, 21(6):1681–1695, Dec 2013.

[16] Tarun Bansal and et al. RobinHood: Sharing the Happiness in a Wireless Jungle. In Proc. of ACM HotMobile 2014.

[17] Tarun Bansal and et al. Symphony: Cooperative Packet Recovery over the Wired Backbone in Enterprise WLANs. In ACM MobiCom 2013.

[18] Y. Bejerano and R.S. Bhatia. MiFi: a Framework for Fairness and QoS Assurance for Current IEEE 802.11 Networks with Multiple Access Points. IEEE/ACM Transactions on Networking, 14:849–862, Aug. 2006.

[19] Yigal Bejerano, Seung-Jae Han, and Li (Erran) Li. Fairness and Load Balancing in Wireless LANs using Association Control. In Proc. of ACM MOBICOM, pages 315–329, 2004.

[20] Dinesh Bharadia, Emily McMilin, and Sachin Katti. Full Duplex Radios. In Proc. of ACM SIGCOMM, 2013.

[21] Vaduvur Bharghavan, Alan Demers, Scott Shenker, and Lixia Zhang. MACAW: a Media Access Protocol for Wireless LAN’s. In Proc. of ACM SIGCOMM, pages 212–225, 1994.

[22] Bastian Bloessl, Michele Segata, Christoph Sommer, and Falko Dressler. An IEEE 802.11a/g/p OFDM Receiver for GNU Radio. In Proc. of ACM SRIF, pages 9–16, 2013.

[23] Ioannis Broustis, Konstantina Papagiannaki, Srikanth V. Krishnamurthy, Michalis Faloutsos, and Vivek Mhatre. MDG: Measurement-Driven Guidelines for 802.11 WLAN Design. In Proc. of ACM MOBICOM, pages 254–265, 2007.

[24] Viveck R Cadambe and Syed A Jafar. Interference Alignment and the Degrees of Freedom for the K User Interference Channel. IEEE Transactions on Information Theory, 2007.

[25] Viveck R Cadambe and Syed A Jafar. Interference Alignment and the Degrees of Freedom for the K User Interference Channel. IEEE Transactions on Information Theory, 2008.

[26] M. Cesana, D. Maniezzo, P. Bergamo, and M. Gerla. Interference Aware (IA) MAC: an Enhancement to IEEE 802.11b DCF. In Proc. of IEEE VTC, volume 5, pages 2799–2803, Oct. 2003.

[27] Jung Il Choi, Mayank Jain, Kannan Srinivasan, Phil Levis, and Sachin Katti. Achieving Single Channel, Full Duplex Wireless Communication. In Proc. of ACM MOBICOM, pages 1–12, 2010.

[28] Asaf Cidon, Kanthi Nagaraj, Sachin Katti, and Pramod Viswanath. Flashback: Decoupled Lightweight Wireless Control. In Proc. of ACM SIGCOMM, pages 223–234, 2012.

[29] Constantine Coutras, Sanjay Gupta, and Ness B. Shroff. Scheduling of Real-Time Traffic in IEEE 802.11 Wireless LANs. Wirel. Netw., 6:457–466, Dec. 2000.

[30] Dinesh Bharadia, Emily McMilin, and Sachin Katti. Full Duplex Radios. In Proc. of ACM SIGCOMM, 2013.

[31] Michael Dinitz. Distributed Algorithms for Approximating Capacity. In Proc. of INFOCOM, pages 1397–1405, 2010.

[32] M. Duarte and A. Sabharwal. Full-Duplex Wireless Communications Using Off-The-Shelf Radios: Feasibility and First Results. In the Forty Fourth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pages 1558–1562, Nov. 2010.

[33] Shane B. Eisenman and Andrew T. Campbell. E-CSMA: Supporting Enhanced CSMA Performance in Experimental Sensor Networks using Per-neighbor Transmission Probability Thresholds. In Proc. of IEEE INFOCOM, pages 1208–1216, 2007.

[34] Jeremy Elson, Lewis Girod, and Deborah Estrin. Fine-Grained Network Time Synchronization using Reference Broadcasts. SIGOPS Oper. Syst. Rev., 36:147–163, Dec. 2002.

[35] Ruijun Fu, Yunxing Ye, Ning Yang, and Kaveh Pahlavan. Doppler Spread Analysis of Human Motions for Body Area Network Applications. In IEEE PIMRC, pages 2209–2213, 2011.

[36] D. Gesbert, M. Kountouris, R.W. Heath, Chan-Byoung Chae, and T. Salzer. Shifting the MIMO Paradigm. Signal Processing Magazine, IEEE, 24(5):36–46, Sept 2007.

[37] R. Gold. Optimal binary sequences for spread spectrum multiplexing. IEEE Transactions on Information Theory, 13:619–621, Oct. 1967.

[38] S. Gollakota and D. Katabi. Zigzag Decoding: Combating Hidden Terminals in Wireless Networks. In Proc. of ACM SIGCOMM, 2008.

[39] Shyamnath Gollakota, Samuel David Perli, and Dina Katabi. Interference Alignment and Cancellation. In Proc. of ACM SIGCOMM 2009.

[40] Solomon W. Golomb. Shift Register Sequences. Aegean Park Press, 1981.

[41] M. Gowda, S. Sen, R. Roy Choudhury, and S. Lee. Cooperative Packet Recovery in Enterprise WLANs. In IEEE INFOCOM 2013.

[42] Aditya Gudipati, Stephanie Pereira, and Sachin Katti. AutoMAC: Rateless Wireless Concurrent Medium Access. In Proc. of ACM MobiCom, pages 5–16, 2012.

[43] Zygmunt J. Haas and Jing Deng. Dual busy tone multiple access (DBTMA) - A Multiple Access Control Scheme for Ad Hoc Networks. In IEEE Transactions on Communications, pages 975–985, 2002.

[44] Daniel Halperin, Thomas Anderson, and David Wetherall. Taking the Sting out of Carrier Sense: Interference Cancellation for Wireless LANs. In Proc. of ACM MobiCom, pages 339–350, 2008.

[45] Daniel Halperin, Wenjun Hu, Anmol Sheth, and David Wetherall. Predictable 802.11 Packet Delivery from Wireless Channel Measurements. In Proc. of ACM SIGCOMM, pages 159–170, 2010.

[46] Jun Huang, Guoliang Xing, and Gang Zhou. Unleashing Exposed Terminals in Enterprise WLANs: A Rate Adaptation Approach. In Proc. of IEEE INFOCOM, pages 2481–2489, 2014.

[47] R. Irmer, H. Droste, P. Marsch, M. Grieger, G. Fettweis, S. Brueck, H.-P. Mayer, L. Thiele, and V. Jungnickel. Coordinated multipoint: Concepts, performance, and field trial results. Communications Magazine, IEEE, 49, 2011.

[48] Syed A. Jafar. Interference Alignment: A New Look at Signal Dimensions in a Communication Network. Now Publishers, 2011.

[49] Mayank Jain, Jung Il Choi, Taemin Kim, Dinesh Bharadia, Siddharth Seth, Kannan Srinivasan, Philip Levis, Sachin Katti, and Prasun Sinha. Practical, Real-time, Full Duplex Wireless. In Proc. of ACM MobiCom, pages 301–312, 2011.

[50] Raj Jain, Dah-Ming Chiu, and W. Hawe. A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems. Technical Report, Digital Equipment Corporation, DEC-TR-301, 1984.

[51] Phil Karn. MACA: A New Channel Access Method for Packet Radio. In Proc. of the 9th ARRL Computer Networking Conference, volume 9th, pages 134–140, 1990.

[52] Anand Kashyap, Samrat Ganguly, and Samir R. Das. A measurement-based approach to modeling link capacity in 802.11-based wireless networks. In Proc. of ACM MOBICOM, pages 242–253, 2007.

[53] Sachin Katti, Shyamnath Gollakota, and Dina Katabi. Embracing Wireless Interference: Analog Network Coding. In Proc. of ACM SIGCOMM, pages 397–408, 2007.

[54] Jae-Hoon Ko, Soonmok Kwon, and Cheeha Kim. Orthogonal Signaling-Based Queue Status Investigation Method in IEEE 802.11. Comput. Commun., 34:1033–1041, Jun. 2011.

[55] Marc Kuhn, Stefan Berger, I Hammerstrom, and Armin Wittneben. Power Line Enhanced Cooperative Wireless Communications. IEEE JSAC, 24(7):1401–1410, 2006.

[56] Swarun Kumar, Diego Cifuentes, Shyamnath Gollakota, and Dina Katabi. Bringing Cross-Layer MIMO to Today’s Wireless LANs. In Proc. of ACM SIGCOMM 2013.

[57] Rafael Laufer, Theodoros Salonidis, Henrik Lundgren, and Pascal Le Guyadec. XPRESS: a Cross-Layer Backpressure Architecture for Wireless Multi-hop Networks. In Proc. of ACM MobiCom, pages 49–60, 2011.

[58] K.K. Leung and B.J. Kim. Frequency Assignment for IEEE 802.11 Wireless Networks. In Proc. of IEEE VTC, volume 3, pages 1422–1426, Oct. 2003.

[59] T. Li and et al. CRMA: Collision-Resistant Multiple Access. In Proc. of ACM MobiCom 2011.

[60] Kate Ching-Ju Lin, Shyamnath Gollakota, and Dina Katabi. Random Access Heterogeneous MIMO Networks. In Proc. of ACM SIGCOMM, pages 146–157, 2011.

[61] E. Magistretti, O. Gurewitz, and E.W. Knightly. 802.11 ec: Collision Avoidance Without Control Messages. In Proc. of ACM MobiCom, 2012.

[62] Eugenio Magistretti, Omer Gurewitz, and Edward W. Knightly. 802.11ec: Collision Avoidance without Control Messages. In Proc. of ACM MOBICOM, pages 65–76, 2012.

[63] V.P. Mhatre, K. Papagiannaki, and F. Baccelli. Interference mitigation through power control in high density 802.11 WLANs. In Proc. of IEEE INFOCOM, pages 535–543, May 2007.

[64] Kimaya Mittal and Elizabeth M. Belding. RTSS/CTSS: Mitigation of Exposed Terminals in Static 802.11-Based Mesh Networks. In Proc. of IEEE WiMesh Workshop, Sept. 2006.

[65] Allen Miu, Hari Balakrishnan, and Can Emre Koksal. Improving Loss Resilience with Multi-radio Diversity in Wireless Networks. In Proc. of ACM MobiCom, 2005.

[66] Rohan Murty, Jitendra Padhye, Ranveer Chandra, Alec Wolman, and Brian Zill. Designing High Performance Enterprise Wi-Fi Networks. In Proc. of USENIX NSDI 2008.

[67] Rohan Murty, Jitendra Padhye, Ranveer Chandra, Alec Wolman, and Brian Zill. Designing High Performance Enterprise Wi-Fi Networks. In Proc of USENIX NSDI, pages 73–88, 2008.

[68] Rohan Murty, Jitendra Padhye, Alec Wolman, and Matt Welsh. Dyson: an Architecture for Extensible Wireless LANs. In Proc. of USENIX ATC, 2010.

[69] Bobak Nazer and et al. Ergodic Interference Alignment. In Proc. of IEEE ISIT 2009.

[70] Minyoung Park, P. Gopalakrishnan, and R. Roberts. Interference mitigation techniques in 60 GHz wireless networks. Communications Magazine, IEEE, 47(12), 2009.

[71] B.W. Parkinson and J.J. Spilker. The Global Positioning System: Theory and Applications. Number v. 1 in Progress in Astronautics and Aeronautics. 1996.

[72] G. Pei and T.R. Henderson. Validation of OFDM Error Rate Model in ns-3. Boeing Research Technology, pages 1–15, 2010.

[73] G. Pei and V. S. A. Kumar. Distributed Link Scheduling under the Physical Interference Model. In Proc. of INFOCOM, Mar. 2012.

[74] H Rahul, SS Kumar, and D Katabi. MegaMIMO: Scaling Wireless Capacity with User Demand. In Proc. of ACM SIGCOMM 2012.

[75] Hariharan Rahul, Haitham Hassanieh, and Dina Katabi. SourceSync: A Distributed Wireless Architecture for Exploiting Sender Diversity. In Proc. of ACM SIGCOMM 2010.

[76] Hariharan Rahul, Haitham Hassanieh, and Dina Katabi. SourceSync: a Distributed Wireless Architecture for Exploiting Sender Diversity. In Proc. of ACM SIGCOMM, pages 171–182, 2010.

[77] S. Ramanathan. A Unified Framework and Algorithm for Channel Assignment in Wireless Networks. Wirel. Netw., 5:81–94, Mar. 1999.

[78] Boris Rankov and Armin Wittneben. Spectral Efficient Protocols for Half-Duplex Fading Relay Channels. IEEE Journal on Selected Areas in Communications, 25(2):379–389, 2007.

[79] Charles Reis, Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Measurement-Based Models of Delivery and Interference in Static Wireless Networks. In Proc. of ACM SIGCOMM, pages 51–62, 2006.

[80] Jiho Ryu, Changhee Joo, Ted Taekyoung Kwon, Ness B. Shroff, and Yanghee Choi. Distributed SINR Based Scheduling Algorithm for Multi-Hop Wireless Networks. In Proc. of MSWIM, pages 376–380, 2010.

[81] D. Saha, A. Dutta, D. Grunwald, and D. Sicker. PHY Aided MAC - a New Paradigm. In Proc. of IEEE INFOCOM, pages 2986–2990, 2009.

[82] Achaleshwar Sahai, Gaurav Patel, and Ashutosh Sabharwal. Pushing the Limits of Full-duplex: Design and Real-time Implementation. Technical Report TREE1104, Rice University, 2011.

[83] Souvik Sen, Romit Roy Choudhury, and Srihari Nelakuditi. CSMA/CN: Carrier Sense Multiple Access with Collision Notification. In Proc. of ACM MOBICOM, pages 25–36, 2010.

[84] Souvik Sen, Romit Roy Choudhury, and Srihari Nelakuditi. No Time to Countdown: Migrating Backoff to the Frequency Domain. In Proc. of ACM MobiCom, pages 241–252, 2011.

[85] Souvik Sen, Naveen Santhapuri, Romit Roy Choudhury, and Srihari Nelakuditi. Successive Interference Cancellation: A Back-of-the-Envelope Perspective. In Proc. of ACM Hotnets, 2010.

[86] Mo Sha, Guoliang Xing, Gang Zhou, Shucheng Liu, and Xiaorui Wang. C-MAC: Model-Driven Concurrent Medium Access Control for Wireless Sensor Networks. In Proc. of IEEE INFOCOM, pages 1845–1853, 2009.

[87] Mo Sha, Guoliang Xing, Gang Zhou, Shucheng Liu, and Xiaorui Wang. C-MAC: Model-Driven Concurrent Medium Access Control for Wireless Sensor Networks. In Proc. of IEEE INFOCOM, pages 1845–1853, 2009.

[88] Vivek Shrivastava, Nabeel Ahmed, Shravan Rayanchu, Suman Banerjee, Srinivasan Keshav, Konstantina Papagiannaki, and Arunesh Mishra. CENTAUR: Realizing the Full Potential of Centralized Wlans through a Hybrid Data Path. In Proc. of ACM MOBICOM, pages 297–308, 2009.

[89] Nikhil Singh, Dinan Gunawardena, Alexandre Proutiere, Bozidar Radunovic, Horia Vlad Balan, and Peter B. Key. Efficient and Fair MAC for Wireless Networks with Self-interference Cancellation. In WiOpt, pages 94–101, 2011.

[90] E. Sourour, H. El-Ghoroury, and D. McNeill. Frequency Offset Estimation and Correction in the IEEE 802.11a WLAN. In Proc. of IEEE VTC, 2004.

[91] Stanford Information Networking Group (SING). SING Datasets. http://sing.stanford.edu/srikank/datasets.html.

[92] Changho Suh, Minnie Ho, and David NC Tse. Downlink Interference Alignment. IEEE Transactions on Communications, 59(9):2616–2626, 2011.

[93] F. Tobagi and L. Kleinrock. in Radio Channels: Part II–The Hidden Terminal Problem in Carrier Sense Multiple-Access and the Busy-Tone Solution. IEEE Transactions on Communications, 23:1417–1433, Dec. 1975.

[94] S. Verdu. Multiuser Detection. Cambridge University Press, 1998.

[95] M. Vutukuru, H. Balakrishnan, and K. Jamieson. Cross-Layer Wireless Bit Rate Adaptation. In Proc. of ACM SIGCOMM, 2009.

[96] Mythili Vutukuru, Hari Balakrishnan, and Kyle Jamieson. Cross-layer Wireless Bit Rate Adaptation. In Proc. of ACM SIGCOMM, SIGCOMM ’09, pages 3–14, 2009.

[97] Mythili Vutukuru, Kyle Jamieson, and Hari Balakrishnan. Harnessing Exposed Terminals in Wireless Networks. In Proc. of USENIX NSDI, pages 59–72, Berkeley, CA, USA, 2008.

[98] K. Whitehouse, A. Woo, F. Jiang, J. Polastre, and D. Culler. Exploiting the Capture Effect for Collision Detection and Recovery. In IEEE EmNets Workshop, pages 45–52, 2005.

[99] Grace R Woo, Pouya Kheradpour, Dawei Shen, and Dina Katabi. Beyond the Bits: Cooperative Packet Recovery Using Physical Layer Information. In Proc. of ACM MobiCom 2007.

[100] Grace R. Woo, Pouya Kheradpour, Dawei Shen, and Dina Katabi. Beyond the Bits: Cooperative Packet Recovery Using Physical Layer Information. In Proc. of ACM MobiCom, pages 147–158, 2007.

[101] Kaishun Wu, Haoyu Tan, Yunhuai Liu, Jin Zhang, Qian Zhang, and Lionel Ni. Side Channel: Bits over Interference. In Proc. of ACM MOBICOM, pages 13–24, 2010.

[102] Xinzhou Wu, S. Tavildar, S. Shakkottai, T. Richardson, Junyi Li, R. Laroia, and A. Jovicic. FlashLinQ: A Synchronous Distributed Scheduler for Peer-to-Peer Ad Hoc Networks. In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 514–521, 2010.

[103] Xiufeng Xie, Xinyu Zhang, and Karthikeyan Sundaresan. Adaptive Feedback Compression for MIMO Networks. In Proc. of ACM MobiCom 2013.

[104] Vivek Yenamandra and Kannan Srinivasan. Vidyut: Exploiting Power Line Infrastructure for Enterprise Wireless Networks. In Proc. of ACM SIGCOMM, pages 595–606, 2014.

[105] Xinyu Zhang and Kang G. Shin. E-MiLi: Energy-Minimizing Idle Listening in Wireless Networks. In Proc. of ACM Mobicom, pages 205–216, 2011.

[106] S. Zhou and Z. Zhang. cMAC: A Centralized MAC Protocol for High Speed Wireless LANs. In Proc. of IEEE GLOBECOM, pages 1–6, 2011.

[107] Wenjie Zhou, Tarun Bansal, Prasun Sinha, and Kannan Srinivasan. BBN: Throughput Scaling in Dense Enterprise WLANs with Blind Beamforming and Nulling. In Proc. of ACM MobiCom, pages 165–176, 2014.
