Building Blocks for Tomorrow’s Mobile App Store

by

Justin G. Manweiler

Department of Computer Science
Duke University

Date:

Approved:

Romit Roy Choudhury, Supervisor

Jeffrey S. Chase

Landon P. Cox

Victor Bahl

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

2012

Abstract

Building Blocks for Tomorrow’s Mobile App Store

by

Justin G. Manweiler

Department of Computer Science
Duke University

Date:

Approved:

Romit Roy Choudhury, Supervisor

Jeffrey S. Chase

Landon P. Cox

Victor Bahl

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

2012

Copyright © 2012 by Justin G. Manweiler
All rights reserved

Abstract

In our homes and in the enterprise, in our leisure and in our professions, mobile computing is no longer merely “exciting”; it is becoming an essential, ubiquitous tool of the modern world. New and innovative mobile applications continue to inform, entertain, and surprise users. But, to make the daily use of mobile technologies more gratifying and worthwhile, we must move forward with new levels of sophistication. The Mobile App Stores of the future must be built on stronger foundations. This dissertation considers a broad view of the challenges and intuitions behind a diverse selection of such new primitives. Some of these primitives mitigate existing and fundamental challenges of mobile computing, especially relating to wireless communication. Others take an application-driven approach, designed to serve a novel purpose and adapted to the unique and varied challenges of their disparate domains. However, all are related through a unifying goal: to provide a seamless, enjoyable, and productive mobile experience. This dissertation takes the view that, by bringing together nontrivial enhancements across a selection of disparate-but-interrelated domains, the impact is synergistically stronger than the sum of each in isolation. Through their collective impact, these new “building blocks” can help lay a foundation to upgrade mobile technology beyond the expectations of early adopters, and into seamless integration with all of our lives.

For Jane.

Contents

Abstract iv

List of Tables xv

List of Figures xvi

List of Abbreviations and Symbols xxviii

Acknowledgements xxxi

1 Introduction 1

2 Transmission Reordering in Wireless Networks 14

2.1 Introduction ...... 14

2.2 Verifying MIM ...... 18

2.3 MIM: Optimality Analysis ...... 20

2.3.1 Optimal Schedule with Integer Program ...... 20

2.3.2 Results ...... 24

2.4 Shuffle: System Design ...... 24

2.4.1 Protocol Design ...... 25

2.4.2 Design Details ...... 33

2.4.3 Rate Control ...... 33

2.4.4 Upload Traffic ...... 34

2.4.5 Controller Placement ...... 35

2.5 Shuffle: Implementation ...... 35

2.5.1 Testbed Platform ...... 35

2.5.2 Time Synchronization and Stagger ...... 36

2.5.3 Coordination and Dispatching ...... 37

2.6 Evaluation ...... 39

2.6.1 Throughput with 2 Access Points ...... 40

2.6.2 Throughput with 3 Access Points ...... 41

2.6.3 Fairness ...... 41

2.6.4 Performance on Larger Topologies ...... 42

2.6.5 Complete Shuffle with Rate Control ...... 42

2.6.6 Simulation Results ...... 46

2.6.7 Impact of AP Density ...... 46

2.6.8 Impact of Fading ...... 47

2.7 Limitations and Discussion ...... 48

2.7.1 External Network Interference ...... 48

2.7.2 Latency ...... 48

2.7.3 Client Mobility ...... 49

2.7.4 Transport Layer Interactions ...... 49

2.7.5 Compatibility ...... 49

2.7.6 Small-scale Testbed ...... 49

2.8 Related Work ...... 50

2.8.1 Capture and MIM ...... 50

2.8.2 Spatial Reuse ...... 50

2.8.3 Enterprise Wireless LANs and Scheduling ...... 50

2.8.4 Characterizing and Measuring Interference ...... 51

2.9 Conclusion ...... 52

3 Monitoring the Health of Home Wireless Networks 53

3.1 Introduction ...... 54

3.2 RxIP Architecture ...... 60

3.3 Hidden Terminal Diagnosis ...... 62

3.3.1 Ensuring Hidden Terminals are the Cause ...... 62

3.3.2 Isolating the Hidden Terminal ...... 63

3.4 Recovery by Coordination ...... 66

3.4.1 Coping with Internet Latencies ...... 66

3.4.2 Multiple Partnerships ...... 67

3.4.3 Provable Properties of Coordination ...... 68

3.5 Additional Considerations ...... 70

3.5.1 Coping with Token Loss ...... 70

3.5.2 Address Translation ...... 71

3.5.3 Upload Traffic ...... 71

3.5.4 Incremental Deployability ...... 71

3.6 Evaluation ...... 72

3.6.1 Testbed Platform ...... 72

3.6.2 Methodology ...... 73

3.6.3 Hidden Terminal Diagnosis and Recovery ...... 73

3.6.4 Microbenchmarks ...... 77

3.6.5 Scalability of Partnership-based TDMA ...... 79

3.7 Related Work ...... 82

3.7.1 Enterprise Network Management ...... 82

3.7.2 Hidden Terminal Mitigation ...... 82

3.7.3 Network Measurement ...... 82

3.7.4 Related Techniques ...... 83

3.8 Conclusion ...... 83

4 WiFi Energy Management via Traffic Isolation 84

4.1 Introduction ...... 85

4.2 Background and Measurements ...... 89

4.2.1 Choice of Device ...... 89

4.2.2 Measurement Set-up ...... 89

4.2.3 Terminology ...... 90

4.2.4 PSM Energy Profiling ...... 91

4.2.5 Impact of Network Contention on Energy ...... 93

4.3 SleepWell Design ...... 96

4.3.1 Basic SleepWell ...... 96

4.3.2 Coping with Traffic Dynamics ...... 101

4.3.3 Seamless Beacon Re-adjustment ...... 103

4.3.4 Multiple Clients per AP ...... 104

4.3.5 Compatibility with Adaptive-PSM Clients ...... 105

4.4 Evaluation ...... 105

4.4.1 Implementation ...... 105

4.4.2 Methodology ...... 106

4.4.3 Performance Results ...... 106

4.5 Limitations and Discussion ...... 117

4.5.1 Impact of Hidden Terminals ...... 117

4.5.2 Incremental Deployability ...... 119

4.5.3 Interactive Traffic ...... 120

4.5.4 TSF Adjustment ...... 120

4.6 Related Work ...... 120

4.6.1 WiFi PSM Sleep Optimization ...... 120

4.6.2 WiFi Duty Cycling ...... 121

4.6.3 Sensor Network TDMA ...... 121

4.7 Conclusion ...... 122

5 A Matchmaking System for Multiplayer Mobile Games 123

5.1 Introduction ...... 124

5.2 Motivation and Prior Work ...... 126

5.2.1 Latency in Multiplayer Games ...... 126

5.2.2 Matchmaking in Online Games ...... 127

5.2.3 P2P Games Over Cellular Networks ...... 129

5.2.4 Cellular Network Performance ...... 130

5.2.5 Grouping ...... 131

5.3 Estimating Cellular Latency ...... 132

5.3.1 Predicting Future Latency Based on Current Latency ...... 134

5.3.2 Using One Phone to Predict the Future Latency of a Different Phone ...... 139

5.3.3 Predicting the Latency Between Phones ...... 145

5.4 Switchboard ...... 147

5.4.1 Architecture of Switchboard ...... 147

5.4.2 Client API for Lobby Browser ...... 148

5.4.3 Latency Estimator ...... 150

5.4.4 Grouping Agent ...... 152

5.4.5 Grouping Algorithm ...... 153

5.5 Evaluation ...... 155

5.5.1 Implementation ...... 155

5.5.2 Evaluation of Grouping ...... 156

5.5.3 End-to-end Evaluation ...... 161

5.5.4 Summary of Evaluation Results ...... 165

5.6 Conclusion ...... 166

6 An Object Positioning System using Smartphones 168

6.1 Introduction ...... 169

6.2 Motivation and Overview ...... 172

6.2.1 Applications beyond Tagging ...... 172

6.2.2 System Overview ...... 173

6.3 Primitives for Object Localization ...... 175

6.4 OPS: System Design ...... 181

6.4.1 Extracting a Visual Model ...... 181

6.4.2 Questions ...... 185

6.4.3 Point Cloud to Location: Failed Attempts ...... 186

6.4.4 The Converged Design of OPS ...... 188

6.5 Discussion ...... 192

6.5.1 Extending the Location Model to 3D ...... 192

6.5.2 Alternatives for Capturing User Intent ...... 193

6.6 Evaluation ...... 194

6.6.1 Implementation ...... 194

6.6.2 Accuracy of Object Localization ...... 196

6.7 Room for Improvement ...... 200

6.7.1 Live Feedback to Improve Photograph Quality ...... 201

6.7.2 Improving GPS Precision with Dead Reckoning ...... 201

6.7.3 Continual Estimation of Relative Positions with Video ...... 201

6.8 Related Work ...... 202

6.8.1 Localization through Large-Scale Visual Clustering ...... 202

6.8.2 Aligning Structure from Motion to the Real World ...... 202

6.8.3 Building Object Inventories ...... 203

6.8.4 Applying Computer Vision to Infer Context ...... 203

6.9 Conclusion ...... 203

6.10 Reference Equations ...... 204

6.10.1 Equation of Visual Trilateration ...... 204

6.10.2 Equation of Visual Triangulation ...... 205

7 Predicting Client Dwell Time in WiFi Hotspots 209

7.1 Introduction ...... 209

7.2 Natural Questions ...... 212

7.3 ToGo Prediction Engine ...... 214

7.3.1 Design Overview ...... 214

7.3.2 Components ...... 215

7.4 BytesToGo: An Application of ToGo ...... 218

7.4.1 Motivation ...... 218

7.4.2 Natural Questions ...... 219

7.4.3 Extending from ToGo ...... 221

7.5 Implementation and Evaluation ...... 223

7.5.1 Prototype Implementation ...... 223

7.5.2 ToGo Performance: Dwell Prediction Accuracy ...... 224

7.5.3 BytesToGo Performance: Offloading 3G to WiFi ...... 230

7.6 Discussion ...... 233

7.6.1 Considerations for all ToGo Systems ...... 234

7.6.2 Considerations particular to BytesToGo ...... 235

7.7 Related Work ...... 236

7.7.1 WiFi and Cellular ...... 236

7.7.2 Mobility Prediction ...... 236

7.7.3 Activity Recognition ...... 237

7.8 Conclusion ...... 237

8 Encounter-Based Trust for Mobile Social Services 238

8.1 Introduction ...... 239

8.2 Trust and Threat Model ...... 241

8.2.1 Adversarial Capabilities ...... 242

8.2.2 Adversarial Limitations ...... 243

8.3 SMILE System Design ...... 243

8.3.1 Encounter Detection ...... 245

8.3.2 Missed-Connection Reestablishment ...... 247

8.3.3 K-anonymity Preservation ...... 249

8.3.4 Implementation Considerations ...... 257

8.4 Decentralized Architecture ...... 260

8.4.1 Distributed Operation ...... 261

8.4.2 Identifier Set Selection ...... 262

8.5 Evaluation ...... 265

8.5.1 Key Advertisement Detection ...... 265

8.5.2 Craigslist Classification ...... 267

8.6 Related Work ...... 268

8.6.1 Location Proofs ...... 268

8.6.2 Location Privacy ...... 269

8.6.3 Anonymous Messaging ...... 270

8.7 Conclusion ...... 270

9 Conclusion 271

Bibliography 273

Biography 292

List of Tables

2.1 Integer programming parameters and variables ...... 22

5.1 Network parameters observed at 6 different locations in each of the three experiments – (a) Seattle, (b) Redmond, (c) Durham. In all cases, the phones were connected to AT&T Wireless with MNC 410 and MCC 310, over HSDPA with 3-4 bars of signal strength. For the Seattle experiment, one phone was left at “S-home” while the other visited each of the 6 locations. For Redmond, the stationary phone was at “M-home”, and for Durham it was “R-home”...... 142

5.2 Experimental parameters for end-to-end experiments...... 161

6.1 Optimization for Triangulation ...... 206

6.2 OPS Optimization on GPS Error ...... 207

6.3 OPS Final Object Localization ...... 208

8.1 Analytical Model Parameters ...... 250

List of Figures

1.1 Building blocks by design approach. In the first half of the dissertation, we take a bottom-up perspective, seeking to optimize existing wireless communication for enhanced performance, reliability, and energy efficiency. In the second half, we take an application-driven top-down approach, considering possible future mobile applications and the supporting primitives required to enable them. ...... 4

2.1 AP1→R1 must start before AP2→R2 to ensure concurrency. If AP2 starts first, R1 locks onto AP2 and cannot re-lock onto AP1 later. ...... 17

2.2 Testbed confirms MIM capability. Rx receives from Tx (at 5 positions) in the presence of interference (Intf). ...... 19

2.3 MIM can provide large concurrency gains. These graphs show the number of links that can meet SINR requirements with and without MIM enabled. Gains improve with increasing (a) number of clients and (b) number of APs. ...... 24

2.4 Flow of operations in the Shuffle system. Data packets arrive from the network gateway and are enqueued at an AP. The AP notifies the controller of the waiting outbound packet. The controller inserts the corresponding AP-client pair into a network-wide link queue, and eventually schedules this link as part of a concurrent batch. The AP dequeues and transmits the packet according to the controller’s prescribed schedule, and subsequently notifies the controller of all failures. The controller utilizes this feedback for loss recovery and conflict diagnosis. ...... 26

2.5 Per-link data structure maintained at the controller for scheduling transmissions. AP2 is the transmitter for link li. ...... 28

2.6 Heuristics for MIM-aware scheduling ...... 32

2.7 Illustration of a scheduled batch of packets with the staggered transmission times. AP1 starts first, followed by AP3, then AP2. ...... 33

2.8 AP-to-controller clock synchronization error and transmission deviance from the assigned schedule, relative to the local clock. AP and controller were separated by approximately 20 m of CAT-5 cable, 1 switch, and 1 hub. Margin of error ≤ 5 µs, attributable to 802.11 TSF inaccuracy. ...... 38

2.9 Concurrency gains with only two links ...... 40

2.10 Multiple Shuffle orders provide higher throughput than both TDMA and 802.11. ...... 41

2.11 Shuffle scheduling improves fairness. ...... 42

2.12 (a) Example 10 link topology in our building; (b) Throughput and (c) fairness on entire Shuffle testbed. ...... 43

2.13 Throughput for Shuffle versus TDMA using 802.11g with 6-54 Mbps rate control enabled. ...... 44

2.14 A classroom environment with 54 seats. Leaving the AP and one client fixed, we tested with a client placed on the desk in front of each chair. ...... 45

2.15 CDF of throughput for classroom test ...... 45

2.16 Performance evaluation on real and synthetic topologies. ...... 46

2.17 Throughput improvement under different channel fading conditions – Shuffle performs well under Rayleigh and Ricean fading. ...... 47

3.1 TCP download throughput contour on two floors of the same apartment. A hidden terminal is placed in an adjacent apartment (not shown). Removal of the interference provides 8-12 Mbps in the living room (versus <1 Mbps shown). ...... 56

3.2 As AP2 is moved away from the AP1→C1 link, graph shows decreasing and then increasing performance for AP1. AP2 becomes a hidden terminal at 5m, causing significant losses up to 35m away. ...... 57

3.3 As C2 moves towards its AP, it becomes less susceptible to hidden terminal interference from AP1. TCP more fully utilizes the channel, and correspondingly, C1 is severely impacted by AP2. ...... 58

3.4 Hidden terminal conditions ...... 61

3.5 Timeline of wired token exchange and wireless timeslots. AP1 purchases timeslot t5 to t6 by giving the token to AP2. AP1 may not be able to transmit at t7 (due to some other partnership, not shown). AP1 abstains from a token pass at t7, allowing AP2 to transmit. However, AP1 silences AP2 at t8 instead. ...... 67

3.6 Rotating channel access rights, established by token exchanges across multiple partnerships. ...... 68

3.7 (a) With TCP, RxIP provides a median 57% gain over 802.11 under symmetric hidden terminals. (b) RxIP extracts the majority of available gain. (c) Despite the already-symmetric conditions, RxIP further improves fairness. ...... 74

3.8 TCP throughput and fairness under asymmetric hidden terminals. (a) Coordination balances the asymmetry, closely approximating an ideal 50-50 channel share. (b) Fairness improves dramatically. ...... 76

3.9 RxIP protects the AP1-C1 link from performance degradation regardless of AP2 position. ...... 76

3.10 As C2 moves from position 0 to 20m, its link strengthens, becoming less susceptible to hidden terminal interference from AP1. TCP more fully utilizes the channel, and correspondingly, C1 is severely impacted by AP2. Coordination protects both links. ...... 77

3.11 (a) RTT between APs across an apartment complex using 1.5 Mbps cable. (b) AP-to-client delivery latency exhibits a linear relationship to the Internet RTT between partnered APs (2x AP-to-AP delay). ...... 78

3.12 (Inset) Intermediate APs relay clock offsets for time synchronization between hidden terminals. (Graph) Second-hop time synchronization error attributable to wired relay mechanism latency. ...... 79

3.13 Scalability test, 30 random 6-link topologies. CDF of (a) throughput, (b) jitter, and (c) fairness. ...... 81

4.1 Shows experimental setup with Nexus One phone connected to power meter via copper tape and DC leads. The phone is entirely powered by the power meter, using the lithium battery only as ground. The computer, connected via USB, records current and voltage at 5000 Hz. ...... 90

4.2 (a) Screenshot from Monsoon power meter; (b) Power draw over time for Pandora music streaming. ...... 92

4.3 Energy consumed under bulk data transfer and YouTube replay with varying contention (i.e., increasing number of APs in the vicinity). ...... 94

4.4 Proportion of time spent in each power level. (a) 8 MB TCP Iperf; (b) YouTube w/ Tcpreplay. ...... 95

4.5 AP1 and AP3’s traffic maps during bootstrap (AP2’s map, not shown, is identical to AP1’s). The circle denotes one BEACON INTERVAL of 100ms. The ticks on the circle denote when an AP has overheard beacons from other APs, as well as the time of its own beacon. The traffic maps clearly depend on the neighborhood. ...... 97

4.6 APs 1, 2, and 3 migrate their traffic per the SleepWell heuristic. Over time, the beacons are spread in time, alleviating contention between APs. ...... 98

4.7 SleepWell APs distributedly stagger their beacons to reduce contention. Each AP preempts its traffic to honor another AP’s schedule. ...... 100

4.8 (a, b) Two SleepWell clients converge to non-overlapping activity cycles, one sleeping when the other is active. (c) Under same experiment settings, 802.11 client stays awake for entire TCP download. ...... 108

4.9 Adjustment rounds until a SleepWell AP reaches a converged beacon placement. ...... 109

4.10 Overall energy performance of SleepWell. ...... 109

4.11 8 MB Iperf TCP download. With higher contention, SleepWell spends a larger fraction of time in light-sleep, whereas, 802.11 spends most of the time in the idle/overhear state (see Fig. 4.4a). ...... 110

4.12 Proportion of time spent in each activity level with YouTube traffic. Compare to Figure 4.4. ...... 111

4.13 (a) Iperf, (b) YouTube, (c) Pandora. CDF comparison of instantaneous power showing that SleepWell better matches the zero-contention curve. ...... 112

4.14 CDF of instantaneous power consumption, YouTube with contention from YouTube clients. ...... 113

4.15 Bulk data transfer on 4 AP/client testbed ...... 114

4.16 Performance of beacon adjustment: (a) CDF of beacon separation; (b) separation by network density; (c) CDF of proportion of an AP’s traffic that can be satisfied before the end of its beacon share. ...... 115

4.17 TCP throughput on 4 AP/client testbed. Distribution reflects per-link goodput for all links. ...... 116

4.18 Per-packet latency on 8 AP/client testbed. Latency measured as 10 ICMP pings per second on one link, 7 others contend with TCP. ...... 116

4.19 SleepWell fairness: (a) TCP Jain’s fairness on 4 AP/client testbed. Note X-intercept at 0.9; (b) Jain’s fairness for simulated beacon shares with unbounded traffic. ...... 117

4.20 SleepWell performance by AP density: (a) rounds until convergence at 90th percentile; (b) median beacon separation; (c) beacon separation at 5th percentile. ...... 118

4.21 SleepWell performance by proportion of legacy APs: (a) rounds until convergence at 90th percentile; (b) median beacon separation; (c) beacon separation at 5th percentile. ...... 119

5.1 CDF of ping latency between two phones on 3G HSDPA connectivity in Redmond, WA, either direct, or via a nearby University of Washington server, or via the best server offered by geo-distributed Bing Search. Horizontal axis cropped at 600 ms. ...... 129

5.2 CDF of ping latency between two phones on 3G HSDPA connectivity in Durham, NC, either direct, or via a nearby Duke University server, or via a distant University of Washington server, or via the best server offered by geo-distributed Bing Search. Horizontal axis cropped at 600 ms. ...... 130

5.3 Simplified architecture of a 3G mobile data network. RNC is a Radio Network Controller, and handles radio resource management. SGSN is a Serving GPRS Support Node, and it handles mobility management and authentication of the mobile device. GGSN is a Gateway GPRS Support Node, and interfaces with the general IP network that a mobile operator may have and the Internet. ...... 134

5.4 RTT from a phone in Princeville, HI on AT&T Wireless to the FRH. Each point is the median latency over 15 seconds. Graph is zoomed into a portion of the data to show detail. Data from Redmond, Seattle, Durham, and Los Angeles are visually similar. ...... 135

5.5 RTT from a phone in Redmond, WA on AT&T Wireless to the FRH. On the horizontal axis, we vary the length of time window over which we calculate the latency at the various percentiles indicated by the different lines. On the vertical axis, we show the difference in ms between two consecutive time windows at the different percentiles, averaged over the entire trace. Data from Princeville, Seattle, Durham, Los Angeles for AT&T Wireless are visually similar. ...... 136

5.6 RTT from a phone in Durham, NC on T-Mobile to the FRH. On the horizontal axis, we vary the length of time window over which we calculate the latency at the various percentiles indicated by the different lines. On the vertical axis, we show the difference in ms between two consecutive time windows at the different percentiles, averaged over the entire trace. ...... 136

5.7 For any given 15 minute time window, from how far back in time can we use latency measurements and still be accurate? The horizontal axis shows the difference in latency at the 95th percentile between a time window and a previous time window. The age of the previous time window is shown in the legend. The vertical axis shows the CDF across all the different 15 minute intervals in this trace. The horizontal axis is clipped on the right. ...... 137

5.8 CDF of Kolmogorov-Smirnov (KS) test Goodness-of-Fit P-values for successive time windows by window size (in minutes) for a phone in Redmond, WA on AT&T Wireless. Each data point represents a two-sample KS test using 100 points from each of two successive time windows. The percentage of null hypothesis rejection is shown as the intersection of a distribution with the chosen significance level. A lower percentage of rejected null hypotheses is an indication of greater stability across successive time windows. The horizontal axis is clipped on the left. For clarity, a limited set of window sizes are shown. Data from Princeville, Seattle, Durham, Los Angeles are visually similar. ...... 138

5.9 Impact of reducing the measurement sampling rate for a 15 minute window of latency from a phone in Durham, NC on AT&T Wireless to the FRH. The horizontal axis shows the difference in latency at the specified percentile between using a measurement rate of once per 1 second and using a measurement rate of once per 5 to 600 seconds as indicated. The vertical axis shows the CDF across all the different 15 minute intervals in this trace. Note that at a sampling rate of once per 90 seconds for 15 minutes, we have only 10 samples and hence we cannot calculate the 95th percentile. The horizontal axis is clipped on the right. Data from Princeville, Redmond, Seattle, Los Angeles are visually similar. ...... 139

5.10 Maps showing measurement locations in (a) the Seattle area of Washington, (b) the Redmond area of Washington, (c) the Durham and Raleigh areas of North Carolina. ...... 142

5.11 Difference in latency between a stationary phone at “S-home” and a phone placed at a variety of locations in Seattle. Each line is a CDF of ((xth percentile latency over a 15-minute interval from stationary phone at “S-home”) - (xth percentile latency over the same 15-minute interval for the other phone at the location in the legend)) computed for all possible 15-minute windows, in 1 minute increments. The xth percentile is 50th for the top graph, 90th for the middle, and 95th for the bottom. Horizontal axis is cropped on the right. ...... 143

5.12 Difference in latency between a stationary phone at “M-home” and a phone placed at a variety of locations in Redmond. Each line is a CDF of ((xth percentile latency over a 15-minute interval from stationary phone at “M-home”) - (xth percentile latency over the same 15-minute interval for the other phone at the location in the legend)) computed for all possible 15-minute windows, in 1 minute increments. For conciseness, we present only the 50th percentile graph. Horizontal axis is cropped on the right. ...... 144

5.13 Difference in latency between a stationary phone at “R-home” and a phone placed at a variety of locations in Durham. Each line is a CDF of ((xth percentile latency over a 15-minute interval from stationary phone at “R-home”) - (xth percentile latency over the same 15-minute interval for the other phone at the location in the legend)) computed for all possible 15-minute windows, in 1 minute increments. For conciseness, we present only the 50th percentile graph. Horizontal axis is cropped on the right. ...... 144

5.14 CDF of RTT between a phone in Durham, NC and a phone in San Antonio, TX. Component latencies involving the respective FRH are also included. The FRH to FRH latency is calculated by the difference of pings. The horizontal axis is clipped on the right. Note that the phone-to-phone CDF is not a perfect sum of the other three CDFs due to small variations in latency in between traceroute packets issued at the rate of once per second. ...... 146

5.15 Architecture of Switchboard ...... 148

5.16 C# client API of Switchboard as would be used in a hypothetical game called “Boom”. For brevity, base class definitions are not shown here. ...... 149

5.17 CDF of number of players in each group after grouping 50,000 players split into buckets of 1,000 players each, with a latency limit of 250ms. The top four lines show results from grouping players based on geographic proximity, while the bottom line uses latency proximity. ...... 158

5.18 CDF of number of players in each group after grouping 50,000 players split into buckets of 1,000 players each, with a latency limit of 400ms. The top four lines show results from grouping players based on geographic proximity, while the bottom line uses latency proximity. ...... 158

5.19 CDF of number of players in each group after grouping 50,000 players split into buckets of varying sizes, with a latency limit of 250ms. The “QT 500” line shows results with QT clustering on a bucket size of 500 players. The “Hier 1500” line shows results with hierarchical clustering on a bucket size of 1,500 players. ...... 160

5.20 Runtime of grouping algorithms for grouping 50,000 players split into buckets of varying sizes, with a latency limit of 250ms. The “QT” bars on the left show results with QT clustering, while the “Hier” bars on the right show results with hierarchical clustering. ...... 160

5.21 Aggregate client-to-server bandwidth by client Poisson arrival rate for Switchboard running on Azure. The first 15 minutes reflects a warming period with elevated measurement activity as the server builds an initial history. ...... 163

5.22 CDF of ICMP probes per client at different client Poisson arrival rates, as conducted by the Measurement Controller in Switchboard running on Azure. Data reflects hour-long experiments and excludes warming period. ...... 163

5.23 CDF of resulting group sizes at different client Poisson arrival rates. Grouping uses 500-client buckets. Data reflects hour-long experiments and excludes warming period. ...... 164

5.24 Client time spent in measurement and grouping. Measurement reflects the time from when a client joins a lobby until there is sufficient data for the client’s tower. Time required for grouping reflects the total time from when measurement data is sufficient until the client is placed into a viable group (one or more clustering attempts). Grouping performed with randomized buckets of up to 500 clients. ...... 165

6.1 An architectural overview of the OPS system – inputs from computer vision combined with multi-modal sensor readings from the smartphone yield the object location. ...... 173

6.2 Compass-based triangulation from GPS locations (x1, y1), (x2, y2) to object position (a, b). ...... 176

6.3 The visual angle v relates the apparent size s of an object to distance d from the observer. ...... 178

6.4 Visual Trilateration: unknown distances from GPS locations (x1, y1) and (x2, y2) to object position (a, b) are in a fixed ratio d2/d1. ...... 178

6.5 Visual Triangulation: fixed interior angle from known GPS location (x1, y1) to unknown object position (a, b) to known GPS position (x2, y2). ...... 179

6.6 Intersection of the four triangulation curves for known points (0, 0) and (10, −4), localized point (4, 8), distance ratio σ = 6√5 / 4√5 = 1.5, and internal angle γ = 2·arctan(1/2) ≈ 53°. ...... 180

6.7 OPS builds on triangulation and trilateration, each underpinned by computer vision techniques, and multi-modal sensor information. The noise from sensors affects the different techniques, and makes merging difficult. ...... 181

6.8 Two vantage points of the same object of interest. The “thought-bubbles” show the two different perspective transformations, each observing the same four feature corner points. ...... 182

6.9 Example of a 3D point cloud overlaid on one of the images from which it was created. ...... 184

6.10 Sampled tests; circle denotes object-of-interest (top), Earth view (bottom): (a) smokestack of a coal power plant; (b) distant building with vehicles in foreground; (c) stadium seats near goal post. ...... 197

6.11 CDF of error across all locations. Graph reflects four photos taken per location. 50 locations. ...... 198

6.12 OPS and triangulation error at 50 locations. Graph reflects four photos taken per location. ...... 198

6.13 Error from ground-truth GPS camera locations. X-axis shows the standard deviation of introduced Gaussian GPS errors. Bars show median error; whiskers show first and third quartiles. ...... 199

6.14 Error from ground-truth GPS camera locations. X-axis shows the standard deviation of introduced Gaussian compass errors. Bars show median error; whiskers show first and third quartiles. ...... 199

6.15 OPS error by photo resolution. Keypoint detection is less reliable below 1024x768 pixels. ...... 200

7.1 Clients at a university cafe exhibit varied dwell times, reflecting multi- ple patterns of user behavior. Some long-dwell clients study for hours whilemoremobileuserstakeamealto-go...... 213

7.2 Periodic Sensor-Feature matrices feed the SVM sub-predictors to gen- erate short-term predictions. These time-indexed predictions form a growing sequence that are then used to predict the user’s long-term dwell time behavior. Sequences from other users are used as the train- ingset...... 217

7.3 Difference between WiFi and 3G TCP throughput at different hotspots. WiFi offers almost 6.5ˆ throughput compared to 3G. . . . 220

7.4 BTG prioritizes traffic of short-dwell mobiles. However, it compen- sates long-dwell laptops by exploiting slack periods...... 221

7.5 ToGo synthesizes client sensor feedback to estimate dwell duration for associated clients. Applications such as BTG can leverage these predictions as necessary, for example, ensuring that multiplayer games will complete before one party leaves or providing prioritized access tocloudlets[163]...... 222

xxv 7.6 Cross-validation on 15 real-user traces at the Cafe. Despite only 14 SVM training points, ToGo correctly classified users within 2.5 min- utes. Additional sensors reduce prediction error during convergence...... 226

7.7 Diagram shows user behavior along a representative path. User (i) walks up to McDonald’s to examine wall-mounted menu and wait in queue line (10-60 seconds); (ii) places order, waits for food (1-2 minutes); (iii) takes condiments (2-15 seconds); (iv) sits and eats food (5-15 minutes); (v) discards trash (1-10 seconds); and (vi) exits to lobby.227

7.8 Mean priority misprediction at 3 hotspots: (a) McDonalds; (b) Library; (c) Cafe. All ToGo variants perform better than Naive. NoFeedback performs reasonably well when there is enough RSSI diversity as in a large such as library...... 228

7.9 Prediction accuracy by priority class. Dwell duration (X-axis) is different for each class (increasing by class number). Naive requires substantially longer before convergence to the correct classification. . . . 229

7.10 Relative overlap of emulated user behaviors, live experiments. . . . . 232

7.11 Performance of BTG with live traffic shaping in Cafe. Traffic shaping benefits clients with shorter dwell times...... 232

7.12 3G data saved per hour by one AP. BTG prioritization improves WiFi utilization, providing substantial 3G network savings. Gains increase with larger HD files. Note that in some cases, the RSSI-based NoFeedback variant suffices to differentiate short-dwelling users...... 233

8.1 An illustrated sequence of operations. Let H denote a cryptographic hash function and Ex(m) denote the encryption of message m with key x. Encounter keys x and y hash to the same value, leading the server to relay Ex(m) to participants in both encounters. However, only participants with key x can recover message m. A timestamp t nonce in the reply prevents replay attacks...... 244

8.2 In online missed-connections posting services (such as Craigslist), posting subjects are forced to manually browse up to hundreds of unrelated postings. By directly routing messages to encounter participants, SMILE is more efficient and less error-prone...... 245

8.3 Wireless encounter-key broadcasts provide co-located users with shared state that can later be used to prove participation in an encounter...... 246

8.4 Classification of identity confirmation checks requested, among Craigslist posts requesting some check. Most checks rely on features observable to (and thus forgeable by) third parties, such as a personal description...... 248

8.5 Estimated Craigslist encounter distance. Only ≈5% of encounters occur outside of Bluetooth range...... 258

8.6 Estimated latency from time of encounter occurrence to Craigslist post...... 259

8.7 Distributed scheme operation. During an encounter, each peer shares k identifiers and an encounter key. Messages are sent using onion routing or an anonymous remailer to preserve anonymity...... 261

8.8 Estimated encounter duration implied by Craigslist posts, by geographic locale...... 265

8.9 Encounter-key discovery. Each detection scan begins 15 seconds after the completion of the prior scan...... 267

List of Abbreviations and Symbols

3G Third Generation; standard for cellular telecommunications

802.11 IEEE standard for WLAN

μs Microsecond; one millionth of one second (1/10^6)

AC Alternating Current; compare to DC

ACK Acknowledgement; especially IEEE 802.11 ACK or TCP ACK

ACM Association for Computing Machinery

AP Access Point; defines a single BSS

BSS Basic Service Set; single AP plus associated STA

BSSID Basic Service Set Identifier; unique ID for a BSS

CAM Constant Awake Mode; STA not using PSM

CBR Constant Bitrate

CSMA Carrier Sense Multiple Access

CDF Cumulative Distribution Function

CTS Clear to Send; see RTS

DC Direct Current; compare to AC

DCF Distributed Coordination Function; compare to PCF

DHCP Dynamic Host Configuration Protocol

DOCSIS Data Over Cable Service Interface Specification

DSL Digital Subscriber Line

DTIM Delivery Traffic Indication Message

EWLAN Enterprise Wireless Local Area Network

FIFO First In, First Out

FPS Frames per Second

FTP File Transfer Protocol

Gbps Gigabits per Second; exactly one billion bits in one second

HD High Definition

HT Hidden Terminal

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

J Joule

MB Megabyte; 2^20 bytes, roughly one million bytes

Mbps Megabits per Second; exactly one million bits in one second

MCS Modulation and Coding Scheme

MD5 Message-Digest Algorithm 5

MIM Message in Message

MIMO Multiple-Input and Multiple-Output

ms Millisecond; one one-thousandth of one second (1/1000)

MTU Maximum Transmission Unit; 1500 bytes for Ethernet

mW Milliwatt; one one-thousandth of one watt (1/1000)

NAT Network Address Translation

NP Non-deterministic Polynomial-time

OS Operating System

PCF Point Coordination Function; compare to DCF

PS or PSM Power Save Mode; compare to CAM

QAM Quadrature Amplitude Modulation

RF Radio Frequency

RSSI Received Signal Strength Indication

RTS Request to Send; see CTS

RTT Round-Trip Time

RWLAN Residential Wireless Local Area Network

RX Reception

s Second

SINR Signal to Interference plus Noise Ratio

SNR Signal to Noise Ratio

SoI Signal of Interest

STA Station; client device in an IEEE 802.11 WLAN

TCP Transmission Control Protocol

TIM Traffic Indication Message

TDMA Time Division Multiple Access

TSF Time Synchronization Function

TU Time Unit; equivalent to 1.024 ms

TX Transmission

UDP User Datagram Protocol

USB Universal Serial Bus

UTW Upload Time Window

WEP Wired Equivalent Privacy

Wi-Fi Wireless Fidelity; an IEEE 802.11 WLAN

WPA Wi-Fi Protected Access

WLAN Wireless Local Area Network

Acknowledgements

I am deeply indebted to many:

To my advisor, Romit Roy Choudhury, who has been an endless source of patience, support, and encouragement. I am grateful for his commitment to my instruction, for a courteous and constructive manner of critique, and for honest and thoughtful guidance in all facets of the research process and my early career.

To the Duke University Computer Science and Electrical and Computer Engineering faculty, and especially, my committee members Jeff Chase and Landon Cox, for generosity of time and support.

To my committee member Victor Bahl, for the unique and broadening perspectives that he has brought to my work and understanding of the discipline. To many others at Microsoft Research, especially Sharad Agarwal and Ming Zhang.

To my classmates, lab-mates, and friends, for their camaraderie. To Souvik Sen, in particular, for whom no question is too foolish and no request too burdensome. To my other co-authors, Naveen Santhapuri, Peter Franklin, Xuan Bao, Puneet Jain, Srihari Nelakuditi, and Kamesh Munagala.

To the faculty of The College of William and Mary and of The National Institute of Standards and Technology, who encouraged and prepared me to pursue this degree.

To my wife, Jane, for her unconditional love, faith, and sincerity. To her family, for welcoming me with warmth, respect, and without reservation. To my family, for earnest support of all I endeavor.

1

Introduction

Mobile computing has reached true ubiquity. By 2015, analysts predict the existence of one mobile device per capita, worldwide [47]. Today, 48 million people carry a mobile phone—without even basic electrical service to their homes. Usage is currently doubling, year-over-year. While far from perfect, in our homes and in the enterprise, in our leisure and in our professions, mobile computing is no longer merely “exciting;” it has become an essential, ubiquitous tool of the modern world. Mobile devices have reached this exalted position quickly; the immediate value of mobile computing continues to spur extraordinary growth. Over the longer term, a technological metamorphosis will be required to keep pace, not only to satiate the demands of current brisk growth, but to cultivate a continued enthusiasm. Today, mobile application (app) developers capitalize on the inherent advantages of the platform: immediacy of communication, a sufficient set of sensory capabilities to provide localized and environmentally-contextualized information and interaction, and a device kept in close proximity of its user at almost all times. When effectively leveraged, these advantages synergize to form instantaneous links between a

user's physical person and digital presence; the user can interact with the meaningful content sources relevant to the human context in which the device is used. However, even in the relative infancy of mobile computing, apps are too quickly becoming repetitive. They are already exhausting the “low-hanging fruit,” and beginning to stagnate. This is evident when browsing Apple's iOS “App Store” or the Google “Marketplace” for Android devices: though there are hundreds of thousands of applications available for immediate download and installation, only a small number can truly be described as innovative. Instead, in large part, we are left with mobile applications that either seek to imitate the functionality of their desktop counterparts, albeit with a shrunken display and resource constraints, or make only meager attempts to integrate the unique capabilities of the platform. For example, while many toys and simple games make use of raw sensor inputs independently (e.g., accelerometers to control movement in virtual spaces), few leverage the potential for joint, multimodal sensing to create a comprehensive awareness of the user's environment. Further, given the personal nature of these devices, kept close-at-hand at almost all times, location-based and social-networking services often trivially include a user's GPS location as an input to search queries. However, they ignore the great potential for a deep contextual understanding of a user's true intentions, especially by leveraging machine learning on historical patterns of behavior. At the same time, such powerful tools should not be used without care; they may escalate the privacy and security pitfalls of using a mobile device. Unique mobile capabilities can also be exploited to commensurately raise the bar of intrusion, and provide a more-secure mobile platform. Few apps fully exploit the intrinsically-powerful convergence of computation, communication, and sensing on the modern mobile device.
Tomorrow’s apps must reach a greater level of sophistication to meet our myriad expectations for mobile technology.

This dissertation seeks to define novel primitives to facilitate advanced mobile application development. By providing developers a suite of useful and powerful “building blocks,” we can provide a fundamental enhancement to the user experience of mobile computing: in re-optimizing or re-architecting our networking infrastructure for greater performance and reliability; in improving the battery life of mobile devices by enhanced efficiency; in enabling new ways to inform, entertain, and surprise users with novel and compelling applications; and through protection mechanisms which treat the privacy of mobile data sources commensurate to their increasing sensitivity. Taken together, such new foundations can enable tomorrow's apps to bring the value of mobile technology beyond the expectations of early-adopters, and into seamless integration with all of our lives. Each building block can take one of two forms: either a substantial optimization to an existing and significant primitive, or an enabling technology for a new and useful capability. This dissertation takes the view that both types of innovation are crucial for advancing the complete experience of mobile computing. To a certain extent, it is only sensible to spend effort optimizing our mobile computing primitives insofar as they limit the mobile experience that our applications today can provide. If the next generation of applications is not innovative enough to create new challenges, the effort may be wasted. On the other hand, if we design applications that push the foundations to their breaking points, our designs may not be practical. Corresponding to this tradeoff, this dissertation is organized in two distinct-but-complementary content areas. First, we take a bottom-up view. The quality of underlying wireless communication is a key requirement for many (if not most) mobile applications.
In three chapters, we show how pragmatic enhancements to the wireless link layer can improve the perceived quality of network-dependent mobile applications, through improved performance, reliability, and energy efficiency. Next,

we consider a top-down approach, and examine a suite of such applications. In each of four chapters, we will present an application-driven design, leading to a critical building block that can ultimately enable a wider class of mobile applications. In particular, these building blocks will provide support for latency-sensitive operation across cellular networks; a novel understanding of localization that considers the relative positions of visible objects in a user's vicinity; a framework for learning, predicting, and adapting to human micro-mobility patterns; and a mechanism to provide greater security and privacy in mobile-social services. Figure 1.1 summarizes the two complementary design strategies, and how each building block lays a foundation for future mobile applications.

[Figure 1.1 diagram: enabling middleware for next-generation mobile applications. Part I, bottom-up enhancements for wireless network management: Shuffle, RxIP, and SleepWell (Chapters 2-4). Part II, application-driven design: Switchboard, OPS, BytesToGo, and SMILE (Chapters 5-8).]

Figure 1.1: Building blocks by design approach. In the first half of the dissertation, we take a bottom-up perspective, seeking to optimize existing wireless communication for enhanced performance, reliability, and energy efficiency. In the second half, we take an application-driven, top-down approach, considering possible future mobile applications and the supporting primitives required to enable them.

Across these two content areas, this dissertation presents a selection of building blocks which, when considered as a collective whole, can fundamentally enhance the way in which we experience mobile technology. The diversity of these

building blocks is absolutely key. This dissertation takes the view that by bringing together nontrivial enhancements across a selection of disparate-but-interrelated domains, the impact is synergistically stronger than the sum of each in isolation. Pushing on one aspect, ignorant of the rest, is not enough for a substantial impact on how mobile technology integrates with our day-to-day lives. For example, this dissertation considers improvements to wireless network performance. However, the networks we have today are roughly sufficient, or at least nearly so, for the applications we use today. Yes, we can eliminate headaches, as much of this dissertation seeks to do, but we can only go so far by addressing performance alone. At some point, for example, our battery life becomes a more compelling concern. But if, instead, we consider both, taking a balanced approach that addresses a wider cross-section of problems across the platform, the entire experience can become more satisfying. Thus, there is a synergy in taking this wider view. This dissertation takes this view further. At a certain point, improving the network becomes meaningless if it is already sufficient for the applications we wish to run. But if we continue to define novel, compelling cloud applications that demand more of our networks, those improvements can become truly valuable, and in turn, the improvements make the applications far more viable. The broad collection of enhancements described in this dissertation is reflective of, and a response to, the broad set of challenges faced in mobile computing: network performance, battery life, the instability of peer-to-peer overlays, localization, characterizing and predicting human behavior, and privacy. Instead of targeting a particular emerging class of application, it attempts to plug holes widely in the set of challenges mobile application developers face.
While developers have been able to overcome small hurdles and exploit some of the promise of mobile platforms, the industry falls far short due to the disparate challenges of the platform. By alleviating some of the important challenges, we present developers with a strengthened framework on top of which many interesting applications can be developed, and abstract

away the fundamental difficulties. This framework can then be broadly enabling; developers can work at a higher level and focus on the unique opportunities that their applications provide. The value of the mobile platform (determined by the sum of value from all of the applications we use) is only multiplied when we eliminate underlying aggravations. If we enable an application as a cloud service, certainly improving the network bandwidth to serve the application improves the user's perception of the application. If we can provide more-robust privacy solutions, we may be more willing to author social content with a system like OPS. If we can improve end-to-end game latency, we might care whether we can predict when a user is likely to move to a new environment, and potentially disrupt an ongoing game. If we can extend our battery life, we may be willing to engage longer in all such applications without fear of losing too much talk time. Again, by bringing together nontrivial enhancements across a selection of disparate-but-interrelated domains, the impact is synergistically stronger than the sum of each in isolation. Both from bottom-up and top-down perspectives, this dissertation targets the design, implementation, and experimental evaluation of novel-yet-pragmatic system-building primitives, ready to enable the next generation of mobile apps. In this view, where possible, each research prototype will exploit readily-available commercial off-the-shelf components, comply with relevant standards, be designed with careful consideration for tradeoffs between immediate practicality and clean-slate elegance, and be subjected to live, end-to-end evaluation.

Part I, Bottom-up Enhancements for Wireless Network Management

Increasingly, mobile applications need to exploit resources that exist outside of the mobile device itself. In principle, this requirement is not new. Of course, the primary function of a smartphone has always been to provide wireless communication,

for both voice and data. However, the demand increases with a growing desire to offload complex computational tasks, to provide access to external databases, to distribute live content sources such as social networks in a timely manner, and to serve increasingly high-bandwidth multimedia. As deployed today, IEEE 802.11 wireless networks (Wi-Fi) typically consist of fixed and mobile components. Mobile clients, such as laptops and smartphones, receive network connectivity through associations with one or more access points (APs). Clients face a number of challenges in self-optimizing their experience using wireless networks, including (1) a dynamic, transient perspective of the network due to user mobility; (2) an inability to communicate preferences or problems to other nodes, except through wireless links; and (3) a heavily-constrained energy profile to meet stringent battery life requirements. In contrast, wired infrastructure, especially network APs, is far less constrained. Permanent placement at key locations within the wireless network, access to wired backhaul links, and available AC power can enable APs to develop an awareness of time-varying wireless network conditions — those that would be difficult or costly for clients to deduce independently. In turn, this internal characterization of wireless conditions can enable APs to make informed decisions as to when wireless packet delivery is most suitable. Clients can benefit from (1) improved peak performance, in terms of throughput and latency; (2) greater resilience to high-loss fault conditions; and (3) reduced battery strain from network activities. The research community has shown interest in the potential for intelligent AP designs [44, 170, 139, 22, 37, 11]. Manufacturers of commodity networking equipment are also becoming aware, and have begun to partially leverage their advantaged view of network conditions [151, 46, 127].
However, many avenues for novel infrastructure-centric wireless network enhancements have been left unexplored. This dissertation will present three such systems, Shuffle, RxIP, and SleepWell. Although

they represent distinct techniques and end goals, they are related in exploiting the natural advantages APs have in effectively directing wireless operations. Moreover, each of these systems leverages unique coordination mechanisms to optimize operations at network-wide and inter-network scales. Each of these systems attempts to improve the quality of service provided to wireless clients in light of a shared, contentious wireless channel. With client and AP density rising and per-device traffic loads increasing, wireless channel access is becoming a scarce resource. Shuffle considers synchronized allocation of channel time to maximize peak performance. RxIP isolates channel access among interfering APs when problematic conditions persist. SleepWell desynchronizes client channel access to reduce the energy cost of contention. Carrier Sense Multiple Access (CSMA) wireless networks, such as WiFi, control channel access timing in a haphazard fashion, where individual nodes compete for channel access when and where it is desired. Shuffle, RxIP, and SleepWell share a notion of improving CSMA efficiency by exerting some explicit timing control, albeit each at a different granularity. Shuffle operates at the precision of microseconds, modulating the exact order in which a pair of near-synchronous, concurrent packet transmissions begin. RxIP operates on the scale of milliseconds, creating isolated time slots during which individual APs receive exclusive channel access rights. SleepWell manipulates client duty-cycling behavior on the scale of hundreds of milliseconds, ensuring that clients sleep through periods of heavy contention. From a design perspective, Shuffle, RxIP, and SleepWell may be placed on a continuum, from the heaviest-weight and most restrictive assumptions to the least invasive and most flexible in regard to interoperation with legacy devices. Shuffle is a clean-slate protocol design and assumes new wired infrastructure to orchestrate wireless operations.
RxIP uses the Internet in an attempt to bring some of the benefits of wireline control of wireless networks, as explored in Shuffle for enterprise single-

operator environments, to the context of chaotic home networks. Unlike Shuffle, it does not break existing IEEE 802.11 standards, and it interoperates with existing commodity equipment. To mitigate network instabilities, RxIP creates a coarsely-scheduled wireless channel access pattern across multiple APs. This type of network-wide scheduling can help to reduce significant energy costs to associated mobile clients. SleepWell seeks to exploit the energy gains of network-wide AP scheduling, but with light-weight, highly-interoperable coordination mechanisms. Across three chapters, we describe the design, implementation, and evaluation of these systems in detail.

Expanding Capacity by Reordering Transmissions in Wireless Networks

With a growing trend towards all-wireless enterprise networks, escalating demand for mobile data has pushed corporate WiFi networks to their breaking points. Now, mainstream enterprise WiFi networking infrastructure represents a radical departure from traditional deployments, featuring extreme centralization and active management. We leverage this new centralized architecture, and a new opportunity for cross-layer optimization, to provide a substantial increase in network capacity. Specifically, modern wireless interfaces support a physical layer capability called Message in Message (MIM). Briefly, MIM allows a receiver to disengage from an ongoing reception, and engage onto a stronger incoming signal. Wireless links that otherwise conflict (strongly interfere) with each other can be made concurrent with MIM, by initiating transmissions in a specific order. We design Shuffle, an enhanced network system to exploit the opportunity in MIM-aware reordering, yielding dramatic throughput improvements.

Monitoring the Health of Home Wireless Networks

As mobile technology becomes increasingly pervasive, the need for enterprise levels of performance and stability is no longer limited to the enterprise itself. In the home, deploying a WiFi access point (AP) or wireless router can be very hard. Untrained users typically purchase, install, and configure a home AP with little awareness of wireless signal coverage and complex interference conditions. We envision a future of autonomous wireless network management that uses the Internet as an enabling technology. By leveraging a peer-to-peer (P2P) architecture over wired Internet connections, nearby APs can coordinate to manage their shared wireless spectrum, providing a robust user experience even in the least-favorable home deployment scenarios. As a specific instance of this architecture, we design RxIP, a network diagnostic and recovery tool. We believe that RxIP is extensible to a wide variety of network enhancements, opening a rich area for future research, and helping to ensure that smarter homes of the future embed smarter networks.
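The coordination idea can be made concrete with a small sketch. The following Python fragment (our illustration only, not RxIP's actual algorithm; the conflict-map representation is a hypothetical simplification) assigns interfering APs to isolated time slots by greedy graph coloring, so that only non-interfering APs share a slot:

```python
# Illustrative sketch (our simplification): assign interfering APs to
# isolated time slots via greedy graph coloring over a conflict map
# that APs could exchange across their wired backhaul.

def assign_slots(conflicts, aps):
    """conflicts: set of frozenset({a, b}) pairs of interfering APs."""
    slot = {}
    for ap in aps:
        # Slots already claimed by APs this one interferes with.
        taken = {slot[other] for other in slot
                 if frozenset({ap, other}) in conflicts}
        s = 0
        while s in taken:
            s += 1
        slot[ap] = s
    return slot

conflicts = {frozenset(p) for p in [("A", "B"), ("B", "C")]}
slots = assign_slots(conflicts, ["A", "B", "C"])
# A and C do not interfere, so they can transmit in the same slot,
# while B is isolated from both.
print(slots)  # {'A': 0, 'B': 1, 'C': 0}
```

In a real deployment, the conflict map would have to be learned from observed interference and shared among peers over the wired Internet; the greedy coloring merely illustrates why wired coordination makes slot isolation cheap.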

Network Support for Energy Management on WiFi Smartphones

Insufficient battery life has become the scourge of many users of mobile technology. Simply, a phone which cannot remain powered on throughout its owner's day has failed to meet its most basic requirement – to provide always-accessible communication. This chapter explores a new way to improve the energy-efficiency of communication, and thus battery life, especially while using WiFi. WiFi contention among different network access points (APs) can dramatically increase a client's energy consumption. Each client may have to keep awake for long durations before its own AP gets a chance to send packets to it. As AP density increases in the vicinity, the waiting time inflates, resulting in a proportional decrease in battery life. We design SleepWell, a system that achieves energy efficiency by evading inter-AP network contention. Our prototype provides immediate,

no-change compatibility with all WiFi devices, yielding up to a doubling of battery life under real-world traffic loads for the latest Android-based smartphones.
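The energy intuition behind SleepWell can be captured in a back-of-the-envelope model (ours, not the system's actual mechanism; the fixed-length slots are an idealization): a client stays awake through every contending transmission that precedes its own AP's traffic, so staggering AP activity lets it sleep longer.

```python
# Toy model (ours): the client is awake for every busy slot up to and
# including its own AP's transmission slot; all other slots are slept.

def awake_fraction(own_slot, busy_slots, n_slots):
    """Fraction of slots the client spends awake."""
    return len([s for s in busy_slots if s <= own_slot]) / n_slots

n = 10
# Synchronized contention: three APs all active in slots 0-2, so a
# client of the third AP waits through the other two APs' traffic.
sync = awake_fraction(own_slot=2, busy_slots=[0, 1, 2], n_slots=n)
# Staggered (SleepWell-style) activity: the client's AP is alone in
# slot 7, so the client sleeps through slots 0-6.
staggered = awake_fraction(own_slot=7, busy_slots=[7], n_slots=n)
print(sync, staggered)  # 0.3 0.1
```

The gap between the two numbers grows with AP density, which is exactly the regime where the chapter reports the largest battery-life gains.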

Part II, Application-driven Design of New Supporting Primitives

Today's embodiment of mobile computing, the smartphone, is the ultimate convergent IT platform. It combines the power of computation, communication, and sensing embedded in situ within the daily life of its owner. Correspondingly, the field of mobile computing requires a consideration of the challenges of the many different domains that can exploit the flexibility of the device. While many challenges are common across much of mobile computing, such as the need for highly-reliable wireless networking and battery-saving designs, it can be instructive to consider those challenges specific to the individual needs of tomorrow's sophisticated applications. In the second half of this dissertation, we consider the unique pitfalls across a disparate set of novel applications. An application-driven perspective can shed light upon the corresponding primitive techniques required to support these applications, and ultimately, enable wider classes of related applications.

Interactive Multiplayer Mobile Gaming over Cellular 3G

Both in the home and out in our daily activities, mobile devices are increasingly exploited as a convenience for recreation. Consequently, there has been a recent explosion of single-player and turn-based multiplayer games on mobile phones. Although such games are popular on other platforms, supporting interactive multiplayer gaming on mobile phones, especially over cellular networks, is a difficult problem. End-to-end performance is highly variable, and the quality of a game experience is closely tied to tight latency requirements. We build Switchboard, a scalable service to quickly characterize the dynamic properties of each cellular link and then provide

matchmaking support for mobile games – assigning players to viable game sessions so that end-to-end latency requirements are met.
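The matchmaking idea can be sketched as a simple grouping problem. The following Python fragment (our simplification; Switchboard's actual algorithms and parameters differ) greedily groups players so that every estimated player-to-player RTT within a session stays under the game's latency budget:

```python
# Illustrative matchmaking sketch (ours, not Switchboard's algorithm):
# greedily build sessions in which all pairwise RTT estimates fit the
# game's end-to-end latency budget.

def matchmake(rtt_ms, budget_ms, session_size):
    """rtt_ms[i][j]: estimated RTT between players i and j."""
    sessions, unassigned = [], list(range(len(rtt_ms)))
    while unassigned:
        seed = unassigned.pop(0)
        session = [seed]
        for p in list(unassigned):
            if all(rtt_ms[p][q] <= budget_ms for q in session):
                session.append(p)
                unassigned.remove(p)
            if len(session) == session_size:
                break
        sessions.append(session)
    return sessions

rtt = [[0, 40, 200, 45],
       [40, 0, 210, 50],
       [200, 210, 0, 220],
       [45, 50, 220, 0]]
# With a 100 ms budget, players 0, 1, and 3 can share a session; player
# 2 (a poor cellular link) must wait for better-matched peers.
print(matchmake(rtt, budget_ms=100, session_size=4))  # [[0, 1, 3], [2]]
```

The hard part in practice, and the focus of the chapter, is producing those RTT estimates scalably for highly variable cellular links; the grouping itself is comparatively easy.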

Accurate Localization for Distant Objects, a Multi-sensory Approach

Of course, mobile phones are not only useful for the exploration of strictly virtual spaces. A wide class of augmented reality applications exploit local sensing and content from the Internet to provide extrasensory views of the user's surrounding (real) world. A key requirement is awareness of location. We design OPS, a system that can provide accurate localization, not only for the user, but for buildings or other objects in the distance. Unlike existing “landmark-recognition” systems, OPS leverages multimodal sensing and techniques borrowed from computer vision to infer the location of less-significant, undocumented objects. OPS will enable true augmented reality, with users able to create, query, and display content regarding anything within their visible surroundings.
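The underlying geometry can be illustrated with a minimal triangulation sketch (ours, not OPS's actual pipeline, which must also cope with noisy compass and GPS readings): two bearing rays, taken from two observer positions toward the same distant object, are intersected on a local flat-earth plane.

```python
import math

# Triangulation sketch (illustrative geometry only, not OPS itself).

def locate(p1, bearing1_deg, p2, bearing2_deg):
    """p1, p2: (x, y) observer positions in meters; bearings in degrees
    measured clockwise from north (+y). Returns the ray intersection."""
    def direction(b):
        r = math.radians(b)
        return (math.sin(r), math.cos(r))
    (x1, y1), (dx1, dy1) = p1, direction(bearing1_deg)
    (x2, y2), (dx2, dy2) = p2, direction(bearing2_deg)
    # Solve p1 + t*d1 = p2 + u*d2 for t with 2x2 cross products.
    denom = dx1 * dy2 - dy1 * dx2
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
    return (x1 + t * dx1, y1 + t * dy1)

# Object 100 m north of the origin, observers 50 m apart on the x-axis.
x, y = locate((0, 0), 0.0, (50, 0), -26.565)  # atan(-50/100) ≈ -26.565°
print(round(x, 1), round(y, 1))  # 0.0 100.0
```

Chapter 6's compass-error experiments (Figure 6.14) quantify how quickly this kind of intersection degrades as the bearing inputs become noisy.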

Sensing on Smartphones to Learn, Predict, and Adapt to Natural Human Behavior

Sensing on modern smartphones enables high-resolution measurement of the user's behavior. One possibility is to predict how long someone will remain at a WiFi hotspot, her dwell time. We design a framework for collecting and learning from relevant sensor data to provide live predictions of dwell time. Many new applications can be enabled. APs may aggressively prioritize clients that will soon depart, enabling downloads to complete. Multiplayer games can be established such that all peers are predicted to stay associated to a WiFi hotspot long enough for the game to finish. Shopkeepers may be able to classify their customers as they browse the aisles, enabling targeted discounts.
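As a toy illustration of the learning pipeline (our own stand-in; the actual framework trains SVM sub-predictors over periodic sensor-feature matrices), even a nearest-centroid classifier over two made-up features separates short-dwell from long-dwell users:

```python
# Toy dwell-class predictor (ours, not the system's SVM machinery).
# Each user is summarized by two features: mean RSSI (dBm) and
# accelerometer variance; all training values below are made up.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def predict(features, centroids):
    """Return the label of the nearest class centroid."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: d2(features, centroids[label]))

# Short-dwell users keep moving (high accel variance, weaker RSSI);
# long-dwell users settle in (low variance, stable RSSI).
train = {
    "short-dwell": [(-70, 0.9), (-75, 0.8), (-68, 1.0)],
    "long-dwell":  [(-55, 0.1), (-60, 0.2), (-58, 0.15)],
}
centroids = {label: centroid(pts) for label, pts in train.items()}

print(predict((-59, 0.12), centroids))  # long-dwell
print(predict((-72, 0.95), centroids))  # short-dwell
```

The real system must additionally handle time-indexed sequences of such predictions and train on traces from other users, as Figure 7.2 describes.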

Encounter-based Trust for Mobile Social Services

Such detailed measurement of a user's behavior is not without risk. Security and privacy must be treated as first-class design requirements for mobile systems. In conventional mobile social services, participants typically trust a centralized server to manage their location information, and trust between users is based on existing social relationships. Unfortunately, these assumptions are not secure or general enough for many mobile social scenarios. We design SMILE, a privacy-preserving “missed-connections” service in which the service provider is not relied upon to preserve data confidentiality and users are not assumed to have pre-established social relationships with each other. Instead, trust is founded solely on anonymous users' ability to prove to each other that they shared an encounter in the past.
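The trust bootstrapping can be sketched in a few lines of Python (our illustration; the identifier names and the toy XOR keystream are ours, and a real deployment would use an authenticated cipher such as AES-GCM):

```python
import hashlib

# Sketch of SMILE-style encounter keys (names and the toy XOR "cipher"
# are ours, not the system's actual construction).

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream from iterated SHA-256; illustration only.
    out, block = b"", key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def encrypt(key: bytes, msg: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(msg, keystream(key, len(msg))))

decrypt = encrypt  # XOR is its own inverse

# During an encounter, co-located devices overhear the same broadcast key k.
k = hashlib.sha256(b"encounter nonce").digest()

# Later, a user posts under the *hash* of k; the server can match
# participants without ever learning the key or the message.
rendezvous_id = hashlib.sha256(k).hexdigest()
ciphertext = encrypt(k, b"were you at the cafe at noon?")

# Only a party that witnessed the encounter (and so knows k) can decrypt.
print(decrypt(k, ciphertext).decode())
```

Because the server sees only the hash and the ciphertext, it can relay messages between hash-matching parties (as in Figure 8.1) while remaining untrusted with their contents.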

2

Transmission Reordering in Wireless Networks

Modern wireless interfaces support a physical layer capability called Message in Message (MIM). Briefly, MIM allows a receiver to disengage from an ongoing reception, and engage onto a stronger incoming signal. Links that otherwise conflict with each other can be made concurrent with MIM. However, the concurrency is not immediate, and can be achieved only if conflicting links begin transmission in a specific order. The importance of link order is new in wireless research, motivating MIM-aware revisions to link scheduling protocols. This chapter identifies the opportunity in MIM-aware reordering, characterizes the optimal improvement in throughput, and designs a link layer protocol for enterprise wireless LANs to achieve it. Testbed and simulation results confirm the performance gains of the proposed system.

2.1 Introduction

Physical layer research continues to develop new capabilities to better cope with wireless interference. One development in the recent past is termed Message in Message (MIM). Briefly, MIM allows a receiver to disengage from an ongoing signal reception, and engage onto a new, stronger signal. If the ongoing signal was not

intended for the receiver (i.e., interference), and if the new signal is the actual signal of interest (SoI), then re-engaging onto the new signal is beneficial. What would have been a collision at a conventional receiver may result in a successful communication with MIM-capable hardware. For a better understanding of MIM, we contrast it with the traditional definition of collision. More importantly, we differentiate MIM from the existing notion of physical layer capture. Collision was widely interpreted as follows: A signal of interest (SoI), however strong, cannot be successfully received if the receiver is already engaged in receiving a different (interfering) signal. Most simulators adopt this approach, pronouncing both frames corrupt [6, 164]. If, on the other hand, the SoI arrives before the interference, and satisfies the required SINR, the signal can be successfully decoded. The figure below shows the two cases.

[Figure: a conventional receiver succeeds when the SoI arrives before the interference and meets the SINR requirement; a later-arriving SoI collides at any SINR.]

Physical layer capture was later understood through the systematic work in [98, 141]. The authors showed that capture allows a receiver to decode a later-arriving SoI, provided the starts of both the SoI and the interference fall within a preamble time window. The figure below illustrates this. While valuable in principle, the gain from capture is limited because the 802.11 preamble persists for a short time window (20 µs in 802.11a/g/n). If the SoI arrives more than 20 µs late, both frames will be corrupt. Message in Message (MIM) is empowering because it enables a receiver to decode an SoI even if the SoI arrives after the receiver has already locked on to the interference [101]. Of course, the required SINR is higher for re-locking onto the

[Figure: with capture, a later-arriving SoI succeeds at a high SINR requirement if it arrives within the preamble window; outside the window, it collides at any SINR.]

new signal. Conversely, if the SoI arrives earlier than the interference, the reception with MIM-capable hardware is the same as traditional reception. The following figure illustrates the MIM advantage.

[Figure: capture succeeds at a high SINR requirement only within the preamble window; MIM succeeds at a high SINR requirement even after the receiver has locked onto the interference.]

To summarize, unlike traditional receivers, an MIM-capable receiver can decode a strong signal of interest even if it arrives later than the interference. Of course, the required SINR to decode the later packet is relatively higher (≈10 dB) compared to when it arrives earlier (≈4 dB). What makes MIM feasible? An MIM receiver, even while locked onto the interference, “simultaneously searches” for a new (stronger) preamble. If a stronger preamble is detected (based on a high correlation of the incoming signal with the known preamble), the receiver unlocks from the ongoing reception, and re-locks onto this new one. The original signal is now treated as interference, and the new signal is decoded. The ability to extract a new signal, even if at a higher SINR, can be exploited to derive performance gains. We motivate the opportunity with an example.


Figure 2.1: AP1→R1 must start before AP2→R2 to ensure concurrency. If AP2 starts first, R1 locks onto AP2 and cannot re-lock onto AP1 later. (R1's SINR from AP1 is 5 dB; R2's SINR from AP2 is 11 dB.)

Link Layer Opportunity

Consider the example in Figure 2.1. For R1, AP1 is the transmitter, while AP2 is the interferer (and vice versa for R2). With MIM receivers, observe that the two links can be made concurrent only if AP1→R1 starts before AP2→R2. Briefly, since AP2→R2 supports a higher SINR of 11 dB, it can afford to start later. In that case, R2 will begin receiving AP1's transmission first, and later re-lock onto AP2's new signal, which is more than 10 dB stronger than AP1's. In the reverse order, however, R1 will lock onto AP2's signal first, and will not be able to re-lock onto AP1, because AP1's signal is not 10 dB stronger than AP2's (it is only 5 dB stronger). Therefore, R1 will experience a collision. As a generalization of this example, MIM-aware scheduling protocols need to initiate weaker links first, and stronger links later. Appropriate ordering of the links can improve spatial reuse. In a larger network, choosing the appropriate set of links from within a collision domain, and determining the optimal transmission order, is a non-trivial research problem. IEEE 802.11 is unable to ensure such orderings, failing to fully exploit MIM-capable receivers. Perhaps more importantly, graph coloring based scheduling approaches are also inapplicable, because graph coloring assumes symmetric conflicts between links. MIM link conflicts are asymmetric (i.e., they depend on relative

order), and may not be easily expressed through simple abstractions. To address this problem, we propose Shuffle, an MIM-aware link layer solution that reorders transmissions to extract performance gains. Our main contributions are: (1) Identifying the opportunities with MIM. We use MIM-enabled Atheros 5213 chipsets, running the MadWiFi driver, to verify that transmission order matters. (2) Analyzing the optimal performance possible with MIM. We show that MIM-aware transmission scheduling is NP-hard, and derive upper bounds on its performance using integer programming. CPLEX results show that the optimal gain from MIM is substantial, and hence worth pursuing. (3) Designing an MIM-aware transmission scheduling system, Shuffle, for enterprise wireless LANs. Links within the same collision domain are suitably reordered to enable concurrency. A measurement-based protocol engine coordinates the overall operation and copes with failures. (4) Implementing and deploying Shuffle within our university building. Testbed results demonstrate practicality and consistent performance improvements over 802.11 and order-unaware TDMA. Additional QualNet simulations show the scalability of the Shuffle system to larger networks.
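The order-dependent SINR thresholds behind the Figure 2.1 example admit a compact decision rule. The following sketch is our illustration (not part of Shuffle's implementation); the function name and the exact 4 dB/10 dB constants are assumptions taken from the thresholds discussed in this chapter:

```python
# Sketch: can an MIM-capable receiver decode its signal of interest (SoI)?
# Thresholds follow the chapter: ~4 dB if the SoI arrives before the
# interference (SoI-First), ~10 dB if the receiver must re-lock (SoI-Last).

SF_THRESHOLD_DB = 4.0   # SoI arrives first
SL_THRESHOLD_DB = 10.0  # SoI arrives last; receiver must unlock and re-lock

def mim_decodable(sinr_db: float, soi_first: bool) -> bool:
    """True if an MIM receiver decodes the SoI at this SINR and arrival order."""
    threshold = SF_THRESHOLD_DB if soi_first else SL_THRESHOLD_DB
    return sinr_db >= threshold

# Figure 2.1: R1 sees AP1 at 5 dB over AP2; R2 sees AP2 at 11 dB over AP1.
# Order AP1->R1 first, AP2->R2 second: both links succeed.
order_ok = mim_decodable(5.0, soi_first=True) and mim_decodable(11.0, soi_first=False)
# Reverse order: R1 must re-lock onto a signal only 5 dB stronger -> collision.
reverse_ok = mim_decodable(5.0, soi_first=False) and mim_decodable(11.0, soi_first=True)
```

Running both orders reproduces the asymmetry: the weaker-first order succeeds, the reverse does not.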

2.2 Verifying MIM

We validate the existence of MIM capabilities in commodity hardware using a testbed of Soekris [174] embedded PCs, equipped with Atheros 5213 chipsets running the MadWiFi driver [5]. The experiment consists of two transmitters with a single receiver placed at various points in between (Figure 2.2), subjecting the receiver to varying SINRs. To ensure continuous transmissions from the transmitters, we modify the MadWiFi driver to disable carrier sensing, backoff, and the inter-frame spacings. To time-stamp transmissions, a collocated monitor is placed at each transmitter. Each monitor is expected to receive all packets from its collocated transmitter, while the in-between receiver is expected to experience some collisions.

Figure 2.2: Testbed confirms MIM capability. Rx receives from Tx (at 5 positions) in the presence of interference (Intf). Delivery ratio is plotted for SoI-First (SF) and SoI-Last (SL) arrival orders at each receiver position, from nearest the SoI transmitter towards the interferer.

Merging time-stamped traces from the two monitors and the receiver, we were able to determine the relationship between transmission order and collision. Figure 2.2 shows delivery ratios for different orders of packet arrival, at different positions of the receiver. For all these positions, the interference was strong, i.e., in the absence of the SoI, we verified that the interfering packets were received with a high delivery ratio. Observe that when the receiver is very close to the transmitter (positions 1, 2, and 3), it achieves a high delivery ratio independent of the order of reception: the SINR is large enough that both the SoI-first (SF) and SoI-last (SL) cases are successful. However, when the receiver moves away from the transmitter (positions 4 and 5), the SINR is sufficient only for the SF case, not the SL case. Hence, at position 4, only 4% of the late-arriving packets get received, as opposed to 68% of the early-arriving packets. This validates

the existence of MIM capability on commercial hardware, and additionally confirms that enforcing the correct order among nearby transmissions can be beneficial.

2.3 MIM: Optimality Analysis

A natural question to ask is: how much throughput gain is available from MIM? Characterizing the optimal gain will not only guide our expectations, but is also likely to offer insights into MIM-aware protocol design. Towards this end, we first prove that MIM-aware scheduling is NP-hard, and then use integer programming methods to characterize the performance bounds for a large number of topologies. We compare the results against an MIM-incapable model.

Theorem 1. Optimal MIM Scheduling is NP-hard.

Proof. Consider the problem of optimal link scheduling with MIM-capable nodes. An optimal schedule consists of a link selection and a corresponding MIM-aware ordering that together maximize the network throughput. Assume that a polynomial time algorithm exists to compute the optimal MIM link schedule from known network interference relationships. Conventional (no-MIM) link scheduling is a known NP-complete problem, reduced from Maximum Independent Set [153]. If our assumption were true, it would be possible to find the optimal MIM-incapable link schedule in polynomial time, simply by setting the SoI-Last SINR threshold to infinity in our algorithm (i.e., ensuring that later-arriving signals are never decoded). This contradiction proves that optimal MIM-aware link scheduling is NP-hard.

2.3.1 Optimal Schedule with Integer Program

To quantify the performance gains from MIM, we model wireless networks with MIM-capable and MIM-incapable receivers, and compare their optimal throughput over a variety of topologies. The networks consist of multiple access points (APs),

each associated with a number of clients. Each transmission produces an interference footprint derived from a path loss exponent of 4. With MIM-capable receivers, the SoI-First (SF) SINR requirement is 4 dB, while the SoI-Last (SL) requirement is 10 dB [101]. With MIM-incapable receivers, the SINR requirement for reception is uniformly 4 dB, and later-arriving packets cannot be received. We construct linear (binary integer) programs to compute the maximum number of concurrent links meeting the required SINR thresholds; the linear program also produces the transmission order. Fairness is not considered in this analysis. To make our model solvable within reasonable execution time, we make the following simplifying assumptions: (1) all clients are associated with the AP offering the strongest signal; (2) a frame is always pending on every AP-to-client link; (3) a single data rate r is used throughout the network. In Section 2.4.2 we consider MIM scheduling with rate control. Let a and b be arbitrary nodes in a wireless network and N be the set of all nodes. We define the boolean relation range(a → b) = true ⟺ b is within the transmission range of a. Let L denote the set of wireless links l_ab such that range(a → b) = true.

Let S_l denote the SoI strength received on link l = l_ab (at node b, from an active transmission by node a), measured in units of power. Similarly, let I(m_cd → l_ab) denote the interference received on link l (at node b) due to a concurrent transmission on link m (from node c). Table 2.1 summarizes the parameters and variables. Under an assumption of additive multiple interference and non-fading channels, the maximum link concurrency of a given wireless network under an MIM-aware MAC can be found using the integer programming formulation presented below.

Maximize:   Σ_{l∈L} x_l

Table 2.1: Integer programming parameters and variables

Parameter   Meaning
N           Set of all nodes.
L           Set of all links l_ab s.t. range(a → b).
S_l         Signal strength on link l.
I(m → l)    Interference on link l from link m.
τ_SF        SoI-First capture threshold.
τ_SL        SoI-Last capture threshold.

Variable    Value   Meaning
x_l         1       Link l (a → b) in use.
            0       Otherwise.
y_lm        1       Link l starts before link m.
            0       Otherwise.

Subject to:

For all a ∈ N:
    Σ_{b∈N | l_ab∈L} x_l_ab  +  Σ_{b∈N | l_ba∈L} x_l_ba  ≤  1          (2.1)

For all l, m ∈ L, l ≠ m:
    x_l + x_m − 1  ≤  y_lm + y_ml  ≤  min(x_l, x_m)                    (2.2)

For all l, m, n ∈ L, l ≠ m, l ≠ n, m ≠ n:
    x_l + x_m + x_n − 2  ≤  y_lm + y_mn + y_nl  ≤  2                   (2.3)

For all l ∈ L:
    Σ_{m∈L | m≠l} y_ml · I(m → l)  ≤  S_l / 10^(τ_SL/10)               (2.4)

    Σ_{m∈L | m≠l} (y_ml + y_lm) · I(m → l)  ≤  S_l / 10^(τ_SF/10)      (2.5)

The aggregate network throughput may then be computed as r · Σ_{l∈L} x_l.

Theorem 2. Any 0/1 solution to the above integer program satisfies the following:

1. The x_l = 1 variables encode the active links.

2. The y_lm = 1 variables encode a total ordering on the active links, where m is made active after l. (Constraints (2) and (3)).

3. The set of active links, along with their ordering, satisfies the interference constraints, and is hence feasible. (Constraints (4) and (5)).

The optimal solution to the integer program is therefore precisely the optimal solution of interest.

Proof. Consider constraints (2) and (3). Suppose first that all the x_l = 1. Then, constraints (2) and (3) are equivalent to: y_lm + y_ml = 1, and 1 ≤ y_lm + y_mn + y_nl ≤ 2. We interpret the variable y_lm as follows: y_lm = 1 if m follows l in the ordering, and 0 otherwise. Note that the constraints exactly encode the following information: in any ordering, for every l, m, either l appears after m or the other way around; and for every l, m, n, it cannot happen that l follows m, m follows n, and n follows l. It is shown in the literature that these constraints are necessary and sufficient to encode a complete ordering.

Now suppose the x_l are not all 1. In that case, y_lm = 1 only if m follows l in the ordering and both l and m are active, so that x_l = x_m = 1. The constraints (2) and (3) are binding only when all the corresponding x variables are 1, i.e., all the corresponding links are active; for these links, constraints (2) and (3) encode a total ordering. The constraints (4) and (5) encode the interference constraints. For any link l, the only y_ml that contribute to constraint (4) are those with y_ml = 1, which are precisely the m that are active and precede l. Further, the left-hand side of the


constraint is non-zero only if l itself is active. A similar reasoning shows the validity of constraint (5).

Figure 2.3: MIM can provide large concurrency gains. These graphs show the number of links that can meet SINR requirements with and without MIM enabled. Gains improve with increasing (a) number of clients and (b) number of APs.

2.3.2 Results

We used CPLEX [84] to solve many instantiations of the integer program. In Figure 2.3, we present results for topologies of grid-aligned access points and randomly-placed clients. All clients associate with the AP from which they receive the strongest signal. Each data point is the arithmetic mean of 15 trials. The results show that the ideal concurrency gains with MIM-capable receivers can be large, providing sound motivation for designing and implementing MIM-aware protocols.
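For very small topologies, the optimum can also be found by brute force, which makes the role of transmission order explicit. The sketch below is our illustration, not the CPLEX formulation above: it enumerates link subsets and start orders under the 4 dB/10 dB thresholds, and the two-link example mirrors Figure 2.1 (all power values are hypothetical, normalized to unit interference):

```python
# Brute-force optimal MIM-aware scheduling on tiny topologies (exponential,
# consistent with the NP-hardness result). A link must meet the SoI-First
# threshold against ALL concurrent interference, and the SoI-Last threshold
# against interference already on the air when it started.
from itertools import combinations, permutations

SF_LIN = 10 ** (4.0 / 10)    # SoI-First threshold, linear scale
SL_LIN = 10 ** (10.0 / 10)   # SoI-Last (re-lock) threshold, linear scale

def feasible(ordering, signal, intf, mim=True):
    """signal[l]: SoI power at l's receiver; intf[(m, l)]: power received at
    l's receiver from link m's transmitter."""
    for i, l in enumerate(ordering):
        before = sum(intf[(m, l)] for m in ordering[:i])
        total = sum(intf[(m, l)] for m in ordering if m != l)
        if total > 0 and signal[l] / total < SF_LIN:
            return False
        if before > 0:
            if not mim:
                return False          # a conventional receiver never re-locks
            if signal[l] / before < SL_LIN:
                return False
    return True

def max_concurrency(links, signal, intf, mim=True):
    """Largest number of links admitting some feasible start order."""
    for k in range(len(links), 0, -1):
        for subset in combinations(links, k):
            if any(feasible(p, signal, intf, mim) for p in permutations(subset)):
                return k
    return 0

# Two-link example mirroring Figure 2.1: SINRs of 5 dB and 11 dB.
links = [1, 2]
signal = {1: 10 ** 0.5, 2: 10 ** 1.1}
intf = {(1, 2): 1.0, (2, 1): 1.0}
with_mim = max_concurrency(links, signal, intf, mim=True)      # both links fit
without_mim = max_concurrency(links, signal, intf, mim=False)  # must serialize
```

As in Figure 2.3, enabling MIM admits the larger schedule; the feasible order is the weaker link first.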

2.4 Shuffle: System Design

We propose Shuffle, an MIM-aware link layer solution that reorders transmissions to improve concurrency. Shuffle targets enterprise WLAN (EWLAN) environments, such as universities, airports, and corporate campuses [21, 12]. In EWLANs, multiple access points (APs) are connected to a central controller through a high speed wired

backbone (Fig. 2.1 and Fig. 2.7). The controller coordinates the operations of the APs, and the APs follow the controller's instructions for transmitting packets to their clients. The rationale for targeting EWLAN architectures is two-fold. (1) EWLANs are becoming popular in single-administrator environments [127, 139, 11, 12, 45]. Developing this platform on sound physical and link layer technologies can further drive its proliferation. (2) MIM-aware scheduling is hard, and a systematic approach to solving it should perhaps start from a more tractable system. An EWLAN presents a semi-centralized platform, amenable to experimentation. Exploiting MIM capability on this architecture is itself a rich, unexplored research area that could lay the foundation for extending MIM-aware schemes to decentralized systems.

2.4.1 Protocol Design

We first sketch the three main operations of Shuffle. Figure 2.4 illustrates their interactions.

1. Conflict Diagnosis: Shuffle characterizes the interference relationships between links. In the steady state, links that have been concurrent in the past are scheduled concurrently, while those in conflict (across both orders) are serialized. Upon link failures, the interference relationships are appropriately revised. Over time, this continuously learned knowledge base becomes an interference map against which future transmissions may be scheduled.

2. Packet Scheduling: From the learned interference relationships, an MIM-aware scheduler (running at the controller) computes batches of concurrent links and their relative transmission order.

3. Schedule Execution: After scheduling batches of concurrent transmissions, the controller notifies the relevant APs when these transmissions should occur.


Figure 2.4: Flow of operations in the Shuffle system. Data packets arrive from the network gateway and are enqueued at an AP. The AP notifies the controller of the waiting outbound packet. The controller inserts the corresponding AP-client pair into a network-wide link queue, and eventually schedules this link as part of a concurrent batch. The AP dequeues and transmits the packet according to the controller’s prescribed schedule, and subsequently notifies the controller of all failures. The controller utilizes this feedback for loss recovery and conflict diagnosis.

The APs maintain precise time synchronization with the controller so that expected transmission orderings may be accurately executed.

(1) Conflict Diagnosis

Scheduling algorithms require knowledge of link conflicts. MIM increases the difficulty of inferring conflicts because it introduces a dependency on transmission order. Shuffle overcomes this difficulty by admitting some inaccuracy in the interference map. The main idea is to speculate that some permutations of links are concurrent, maintain their delivery ratios over time, and use these delivery ratios to infer conflict relationships. The learned relationships can be used to speculate better, and to schedule future transmissions. In the steady state, learning aids scheduling, which in turn aids learning, thereby sustaining a reasonably updated interference map. Of course, packet losses may happen when the interference map becomes inconsistent with the time-varying network conditions. Shuffle copes with these losses through retransmissions.

Speculating and Verifying Concurrency

While bootstrapping, the central controller assumes (optimistically) that any set of links formed by distinct APs may be scheduled concurrently. Upon link failure, detected by per-client acknowledgments, APs request the controller to reschedule the lost packet. The controller revisits the unsuccessful schedule to determine all the active APs in that schedule; it reduces the delivery ratio of the failed link against each of these APs. When no rescheduling request is received, the controller assumes successful delivery, and appropriately updates the delivery ratio for each link, against all interfering APs. For example, when link l_i, belonging to a schedule S, is successful, the controller gains confidence that l_i is truly concurrent with all other APs in S. More precisely, it gains confidence that link l_i can sustain earlier-arriving interference from APs that started before l_i, and later-arriving interference from APs that started after l_i. The delivery ratios for each link-AP pair are maintained for both orders, and are updated appropriately. Figure 2.5 shows the data structures maintained at the controller. When the delivery ratio between link l_i and an interfering AP falls below a threshold, the controller will either enforce schedules where l_i starts first, or (depending on the severity) will pronounce l_i and the interfering AP to be in complete conflict. Link-AP pairs in complete conflict are scheduled in separate batches thereafter.

Reviving Concurrency

Link-AP pairs that are currently in complete conflict may become concurrent in the future. Unless such concurrency is revived, the network may degenerate to a very conservative (serialized) schedule. To this end, Shuffle uses a “forgetting” mechanism.


Figure 2.5: Per-link data structure maintained at the controller for scheduling transmissions. AP2 is the transmitter for link l_i.

Over time, the controller assumes that conflicting link sets and orderings may have become concurrent and artificially improves recorded delivery ratios. When delivery ratios rise above the threshold requirements, previously conflicting link sets and orderings are attempted anew.
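The per-link bookkeeping and the forgetting mechanism can be sketched together. The class below is our illustration; the EWMA weight, the conflict threshold, and the method names are assumptions, not values from Shuffle:

```python
# Sketch of the controller's per-(link, interfering AP, order) state: a
# delivery ratio per ordering, demotion to "complete conflict" when both
# orderings fail, and a periodic "forgetting" nudge that revives them.

ALPHA = 0.1              # EWMA weight for the newest observation (assumed)
CONFLICT_THRESHOLD = 0.5  # assumed demotion threshold

class LinkState:
    def __init__(self):
        # ratios["before"][ap]: delivery ratio when ap started before this link
        self.ratios = {"before": {}, "after": {}}

    def update(self, ap, ap_started_first, delivered):
        order = "before" if ap_started_first else "after"
        old = self.ratios[order].get(ap, 1.0)  # optimistic bootstrap
        sample = 1.0 if delivered else 0.0
        self.ratios[order][ap] = (1 - ALPHA) * old + ALPHA * sample

    def conflicts_with(self, ap):
        """Complete conflict: BOTH orderings fall below the threshold."""
        return all(self.ratios[o].get(ap, 1.0) < CONFLICT_THRESHOLD
                   for o in ("before", "after"))

    def forget(self, rate=0.01):
        """Artificially improve recorded ratios so conflicts get re-tried."""
        for order in self.ratios:
            for ap in self.ratios[order]:
                self.ratios[order][ap] = min(1.0, self.ratios[order][ap] + rate)
```

After repeated losses in both orders, `conflicts_with` serializes the pair; repeated `forget` calls eventually revive it for another attempt.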

Opportunistic Learning

To update the interference map more frequently, Shuffle takes advantage of opportunistic overhearing. For instance, a client C3 that overhears a packet from AP5 at time t_i can piggyback this information in an ACK packet that it sends in the near future. The controller has a record of which other APs were transmitting at t_i. Assuming AP7 was, the controller can immediately deduce that link (AP5 → C3) can be concurrent with a transmission from AP7. The exact order for this concurrency can also be derived, since the controller also remembers the relative transmission order between AP5 and AP7 at past time t_i. Continuous overhearing of packets and piggybacking in ACKs can considerably increase the refresh rate of the interference map, reducing convergence time and facilitating better scheduling.

(2) Packet Scheduling

Given the interference map of the network, the MIM-aware scheduler selects an appropriate batch of packets from the queue and prescribes their order of transmission. To maximize throughput, it should schedule the largest batch of packets that can be delivered concurrently, without starving any client. As noted earlier, optimal MIM-aware scheduling is NP-hard, and graph coloring approaches are not applicable because MIM conflicts are asymmetric in nature. Thus, new algorithms are required for effective MIM-aware scheduling. In this section, we consider packet scheduling at a fixed rate. In Section 2.4.2, we discuss how the Shuffle controller incorporates active bit rate control into its scheduling decisions.

Feasibility of a Schedule

An MIM-aware concurrent link schedule S consisting of ordered links l_1 through l_n may be considered feasible if and only if, for all l_i ∈ S: all AP(l_i) are unique; all Client(l_i) are unique; l_i can sustain earlier-arriving interference from all APs starting before l_i; and l_i can sustain later-arriving interference from all APs starting after l_i. Given Q, a network-wide queue of all packets waiting for download transmission, one possible brute-force scheduling approach would be to generate all B ⊂ Q such that B is feasible. Assuming a fixed bitrate, the highest-throughput schedule would be the feasible subset of Q with maximum cardinality. While this approach may be plausible with small queues, it is imperative that we develop more computationally efficient heuristics for wider applicability. To this end, we present two sub-optimal but practical (greedy) heuristics.
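The feasibility conditions above can be transcribed directly. This sketch is ours: `ap()` and `client()` extract link endpoints, and `can_sustain(l, m, order)` is a hypothetical oracle standing in for the learned interference map:

```python
# Feasibility test for an MIM-ordered schedule: unique APs, unique clients,
# and every link tolerant of both earlier- and later-arriving interference.

def feasible_schedule(links, ap, client, can_sustain):
    """links is an MIM-ordered list (earliest starter first)."""
    aps = [ap(l) for l in links]
    clients = [client(l) for l in links]
    if len(set(aps)) != len(aps) or len(set(clients)) != len(clients):
        return False
    for i, l in enumerate(links):
        if not all(can_sustain(l, m, "earlier") for m in links[:i]):
            return False  # interference already on the air when l starts
        if not all(can_sustain(l, m, "later") for m in links[i + 1:]):
            return False  # interference arriving after l has locked on
    return True
```

The brute-force scheduler in the text would call this test on every ordered subset of Q.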

Greedy Algorithm

A simple greedy algorithm is to consider the packets in FIFO order for inclusion in the batch. Initially, the batch B is set to empty. Then, each packet is attempted in turn, to check whether it is feasible to add it to the batch. Since the ordering of packets in the batch may affect its feasibility, a packet may need to be tried at different positions. Once a feasible ordering is found, the packet is inserted at that position in the batch. While this greedy scheme may not achieve optimal concurrency, it protects clients against starvation. In every round, the first packet in the queue is guaranteed to be scheduled; hence, every conflicting packet progresses by at least one position in the queue, and is guaranteed to be transmitted within a bounded number of batches. A reasonable fairness is also achieved through this simple scheme. The worst-case time complexity of the basic greedy algorithm is O(n²), where n is the number of packets in the queue. Pseudocode is presented in Algorithm 1.

Algorithm 1 Greedy
1: Let Q be the first L packets in the queue.
2: B := ∅
3: for all packets p ∈ Q do
4:   for j = 0 to |B| do
5:     if noConflict(p, j, B) then
6:       Add p to B in position j
7: Return B
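Algorithm 1 can be made concrete as follows. This is our sketch: `no_conflict` is abstracted as a predicate over a candidate ordering, and the "must start before" pair set is a hypothetical stand-in for Shuffle's interference-map lookup:

```python
# Runnable sketch of the basic greedy batch builder (Algorithm 1).

must_start_before = {("A", "C")}  # hypothetical: A's link is weaker, so A first

def no_conflict(batch):
    """Feasible iff every ordering constraint is respected within the batch."""
    pos = {p: i for i, p in enumerate(batch)}
    return all(pos[a] < pos[b] for (a, b) in must_start_before
               if a in pos and b in pos)

def greedy(queue, no_conflict):
    """Try each packet, in FIFO order, at every position of the batch."""
    batch = []
    for p in queue:
        for j in range(len(batch) + 1):
            candidate = batch[:j] + [p] + batch[j:]
            if no_conflict(candidate):
                batch = candidate
                break  # feasible position found; move to the next packet
    return batch

# Even though C is dequeued first, the batch is ordered so that A starts first.
batch = greedy(["C", "A"], no_conflict)  # -> ["A", "C"]
```

Note that the first queued packet always enters the (initially empty) batch, which is the starvation-protection property argued above.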

Least-Conflict Greedy.

A conflict-oriented greedy metric may offer higher concurrency. The basic idea, with pseudocode given in Algorithm 2, is as follows. Each packet in the queue is checked for the type of pair-wise conflict it has with every other packet in the queue (line 3), and is assigned a score based on these conflicts. For example, while computing the conflict score of packet P_i, it is compared with every other packet P_j in the queue. If P_i and P_j are found to be concurrent, irrespective of their temporal order, then P_i's conflict score remains unchanged. However, if P_i must begin earlier than P_j, then the conflict score for P_i is incremented by one (line 5). If P_i can begin later, then again, P_i's conflict score remains unchanged.

Algorithm 2 Least-Conflict Greedy(Q)
1: for all packets p ∈ Q do
2:   score[p] := −age(p)
3: for all packets p, q ∈ Q do
4:   if source(p) ≠ source(q) ∧ dest(p) ≠ dest(q) ∧ p.mustStartBefore(q) then
5:     score[p] := score[p] + 1
6: Sort Q by increasing score
7: Return Greedy(Q)

The controller computes the conflict score for each packet, and sorts the packets in increasing order of this score (line 6). Then, the controller runs the basic greedy algorithm on this order of packets (line 7). The intuition is that packets with fewer conflicts will be inserted early in the batch, potentially accommodating more concurrent links. The time complexity of the Least-Conflict Greedy algorithm is also O(n²). Without the hard scheduling guarantee of first-packet inclusion in every batch, clients may encounter unfairness, and even starve. To cope with this, an aging factor is introduced along with the conflict scores (line 2). Packets that experience prolonged queuing delay receive a proportional score reduction; over time, such a packet attains a low score, and hence will certainly be scheduled by the controller. Overall, this method is likely to achieve better concurrency than the simple greedy scheme, at the expense of some unfairness.
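A self-contained sketch of Algorithm 2, again ours: the `must_start_before` pair set and the `age` map are illustrative stand-ins for Shuffle's interference map and queuing-delay credit, and a minimal copy of the basic greedy routine is included so the block runs on its own:

```python
# Sketch of Least-Conflict Greedy: score packets by how many others they
# constrain (minus an aging credit), sort by increasing score, then run the
# basic greedy insertion on that order.

def make_feasible(must_start_before):
    def ok(batch):
        pos = {p: i for i, p in enumerate(batch)}
        return all(pos[a] < pos[b] for (a, b) in must_start_before
                   if a in pos and b in pos)
    return ok

def greedy(queue, no_conflict):          # Algorithm 1, repeated for line 7
    batch = []
    for p in queue:
        for j in range(len(batch) + 1):
            candidate = batch[:j] + [p] + batch[j:]
            if no_conflict(candidate):
                batch = candidate
                break
    return batch

def least_conflict_greedy(queue, must_start_before, age=None):
    age = age or {}
    score = {p: -age.get(p, 0) for p in queue}        # line 2: aging credit
    for p in queue:
        for q in queue:
            if p != q and (p, q) in must_start_before:
                score[p] += 1                          # line 5: p constrains q
    ordered = sorted(queue, key=lambda p: score[p])    # line 6
    return greedy(ordered, make_feasible(must_start_before))  # line 7
```

With constraints "A before B" and "A before C", packet A scores 2 and is tried last, yet greedy insertion still places it first in the batch so that all three packets fit.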

Comparison with Optimal.

Figure 2.6 compares the concurrency of the proposed heuristics to that of the optimal schedules with and without MIM (derived from the integer program). Least-Conflict Greedy scheduling achieves near-optimal concurrency and provides a consistent improvement over the naive heuristic.

Figure 2.6: Heuristics for MIM-aware scheduling. Concurrent links achieved by Greedy and Least-Conflict Greedy, compared against the optimal schedules with and without MIM, as (a) the number of clients sharing 25 APs and (b) the number of APs shared by 25 clients increase.

(3) Schedule Execution

The controller repeatedly runs the scheduling heuristic on the queue of wireless-bound packets, and selects batches of ordered packets. Packets are actually queued at the respective APs, while the controller is only aware of <AP, packet-destination> link identifier tuples. For each link in the batch, the controller notifies the corresponding AP of the precise duration of stagger. By maintaining tight time synchronization with the controller (discussed in Section 2.5.2), APs are able to execute the staggered transmission schedule, illustrated in Figure 2.7. In this example, transmissions are staggered in the order AP1→C13 before AP3→C32 before AP2→C21. Backoff durations and RTS/CTS handshakes are not necessary because the scheduler accounts for link conflicts based on the interference map. Of course, transmission losses will still occur for a variety of unpredictable reasons; loss detection and recovery are therefore necessary.

Loss Detection and Recovery

Shuffle requires client acknowledgments for loss recovery and delivery ratio estimation. Shuffle schedules periodic upload time windows (UTWs), reserved for ACKs and other client upload traffic. At the expiration of a UTW, the AP can deduce reception failures for the packets transmitted in the preceding download period. For each failed reception, the AP places the packet on a high-priority retransmission queue, and sends a negative acknowledgment (NACK) to notify the controller of the loss. The controller adjusts the corresponding delivery ratios and schedules a retransmission. The AP retransmits the failed packet prior to any new packet to the same client, reducing out-of-order packet delivery.

Figure 2.7: Illustration of a scheduled batch of packets with staggered transmission times. AP1 starts first, followed by AP3, then AP2.

2.4.2 Design Details

Practical considerations arise while translating the Shuffle protocol into a functional system.

2.4.3 Rate Control

In the preceding discussion on packet scheduling, we assumed for simplicity that all network links operate at a fixed transmission bitrate. In reality, a link's tolerance to an interference source depends on its operating data rate and channel quality. A practical approach must jointly consider link concurrency decisions with rate control. In view of implementation complexity, Shuffle adopts a simple strategy based on the recent delivery ratios of a link at different data rates. The approach is adapted from the popular SampleRate protocol [32], as follows.

Shuffle maintains independent rate control state for each link-interferer pair. With knowledge of the delivery ratios at each link, the controller runs a rate control algorithm RateControl(l, i) to select the best rate for link l in the presence of interfering AP i. Observe that this rate is the best known rate at which link l and AP i have been successful in the recent past; not all rates may have been attempted recently, hence this is only a heuristic. With more APs in the concurrent batch, the rate assigned to the link is conservatively chosen as the minimum among the best known rates against each AP. Once a batch of links has been formed, the controller sorts the rates in increasing order. Lower rates imply weaker links, suggesting that it is beneficial to start them earlier. Shuffle staggers each of the links to match the sorted order of the rates. Where two links share the same bitrate, they are staggered in order of increasing delivery ratio (offering the weaker link a greater chance of success). As time progresses, the controller gradually increases the rates of links that attain high delivery ratios. When delivery ratios fall, the rates are reduced [32].
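The rate-assignment and stagger rule can be sketched as follows. All names and numbers here are hypothetical: the rate table stands in for the SampleRate-style per-(link, interferer) estimates that the controller maintains:

```python
# Sketch: each link's batch rate is the minimum of its best known rates
# against the other APs in the batch; links are then staggered lowest rate
# first, with ties broken by lower delivery ratio (weaker link starts first).

BEST_RATE = {  # hypothetical best known rates (Mbps) per (link, interferer)
    ("L1", "L2"): 12, ("L1", "L3"): 24,
    ("L2", "L1"): 48, ("L2", "L3"): 48,
    ("L3", "L1"): 48, ("L3", "L2"): 48,
}
MAX_RATE = 54.0  # assumed fallback when a link has no interferer in the batch

def assign_rates(batch, best_rate, delivery_ratio):
    rates = {l: min((best_rate[(l, m)] for m in batch if m != l),
                    default=MAX_RATE)
             for l in batch}
    order = sorted(batch, key=lambda l: (rates[l], delivery_ratio[l]))
    return rates, order

ratio = {"L1": 0.9, "L2": 0.6, "L3": 0.8}
rates, order = assign_rates(["L1", "L2", "L3"], BEST_RATE, ratio)
# L1 is constrained to 12 Mbps and so starts first; L2 and L3 tie at 48 Mbps
# and are ordered by delivery ratio: L2 (0.6) before L3 (0.8).
```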

2.4.4 Upload Traffic

The controller must account for client-to-AP transmissions in its schedules. For loss detection and upload traffic, the controller frequently reserves a network-wide upload time window (UTW). During each UTW, clients contend for channel access using traditional CSMA (for simplicity). APs notify their clients about the periodicity and duration of UTWs through beacons. Thus, the controller may dynamically adapt the UTW schedule once per beacon interval, in response to changes in the relative bidirectional traffic load. Given that UTWs may be scheduled frequently, each on the order of a few packets, the division of time into upload and download windows is not expected to substantially impact latency. Since Shuffle achieves time synchronization on the order of 20 µs (Section 2.5.2), such division is practical.

2.4.5 Controller Placement

A number of placement options exist for installing the controller into the network. It may be collocated with the network gateway, allowing it to create MIM-aware schedules as packets pass through the gateway. In reality, proprietary router software and administrative restrictions may impose practical constraints on collocation. To circumvent this, we propose to plug the controller directly into the wired network (as an independent device, or perhaps as a module in one of the APs). Because of our lightweight scheduling heuristics, we find relatively low CPU utilization for the controller process (≈20%). Decoupling the controller from the gateway may provide higher flexibility and easier maintenance: if necessary, the controller can be reprogrammed with a better scheduling protocol and plugged back into the network. Advantages in time synchronization and retransmissions also arise as a by-product. However, the APs may need to be “thicker” than when the controller regulates the flow of packets.

2.5 Shuffle: Implementation

2.5.1 Testbed Platform

We evaluated our fully-functional Shuffle implementation on a testbed consisting of laptops running Linux kernel version 2.6.27 and equipped with Atheros-chipset D-Link DWA-643 ExpressCard interfaces, Soekris embedded PCs running Metrix Pyramid Linux with Atheros 5213 chipset Mini PCI interfaces, and a high-performance Lenovo server. One of the laptops served as the controller, while the others served as APs and clients. Soekris devices were used as additional clients. The server functioned as a high-volume data source, representing the network gateway. Shuffle’s functional logic (including conflict diagnosis, MIM-aware transmission scheduling, and loss recovery) is implemented through element extensions to the Click Modular Router [136].

Our tests assume the wireless link to be the bottleneck for all flows. Thus, in the steady state, our gateway module injects CBR UDP traffic (in 1500-byte datagrams) to each AP at a rate just exceeding the maximum theoretical wireless bandwidth. This ensures that APs are always backlogged with wireless-bound traffic. To implement Shuffle and TDMA schedule execution, we customized the MadWiFi 802.11 (madwifi-hal-testing, revision 3879) driver. By modifying the MadWiFi txcont configuration command, a driver ioctl call, we can selectively disable hardware carrier sense, virtual carrier sense, backoff, and DIFS/EIFS/SIFS intervals on the wireless interfaces. This allows Shuffle to schedule its own stream of packets without 802.11-specific timing constraints. To allow precise transmission timing, we provide a mechanism inside the MadWiFi driver that transmits a packet on the basis of the 802.11 Timing Synchronization Function (TSF) clock. For synchronization between APs and controllers, we modified the Sky2 (v1.22) Ethernet driver to include the 802.11 TSF timestamp in Ethernet packets. With extensive optimization, we have been able to achieve synchronization on the order of 20 μs. We report the relevant details next.

2.5.2 Time Synchronization and Stagger

To enforce transmission staggers on the order of preamble durations (tens of μs), we need equally precise time synchronization between AP and controller. The 802.11 TSF clock is used for synchronizing all stations in a BSS. To synchronize APs with the Shuffle controller, we insert 802.11 TSF timestamps into Ethernet packets by modifying the Sky2 Ethernet driver. These timestamped control packets are exchanged bidirectionally between the controller and APs. When a controller receives a TSF-timestamped packet from the AP, it computes the offset between the timestamp and its local TSF clock. This offset includes wire propagation delays, Ethernet switching latencies, processing time, and the clock difference between the controller and the AP. The same offset is also computed at the AP, and exchanged between the two parties. The AP averages the two offsets and deduces an estimate of the actual instantaneous difference between the controller’s clock and its own TSF. Propagation delays and processing latencies in the Ethernet driver are reasonably symmetric; hence, the clock difference estimation can be accurate on the order of microseconds. The clock difference is cached and exposed to Click through a sysctl interface.

Fig. 2.8 presents the empirical CDF of the AP/controller synchronization error achieved by our implementation. We estimate this error by spatially collocating the AP and the controller, which then get synchronized by the same TSF clock on their wireless interfaces. The TSF clock now acts as the reference clock, allowing us to quantify our synchronization precision over the AP/controller wired connection. We consistently achieved a median synchronization error of 20 μs. Since the Atheros chipset TSF implementation is accurate to ±5 μs (verified in a separate experiment by comparing the packet reception times at multiple TSF-synchronized receivers from a single transmitter), we believe that our total margin of error is within 25 μs.

Upon receiving a packet from the controller, APs busy-wait on their TSF clocks to transmit the packet at the scheduled instant. We measure the inconsistency between the scheduled time of transmission, ts, and the actual time it was transmitted, ta. This measurement was performed by assigning an AP to transmit packets with a precise spacing of 20 ms. At a collocated receiver, we measure error as 20 ms minus the observed inter-packet arrival times. The dashed line in Figure 2.8 plots the CDF of this deviation. The mean timing error is around 4 μs.
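The bidirectional offset computation above is, in effect, an NTP-style symmetric-delay estimate. A minimal sketch, with hypothetical function and timestamp names (units in microseconds):

```python
def clock_difference(t_ap_send, t_ctrl_recv, t_ctrl_send, t_ap_recv):
    """Estimate the controller-minus-AP clock difference from one
    timestamp exchange in each direction. Assumes symmetric wire and
    processing delay, which then cancels out of the subtraction."""
    offset_at_ctrl = t_ctrl_recv - t_ap_send   # one-way delay + clock difference
    offset_at_ap = t_ap_recv - t_ctrl_send     # one-way delay - clock difference
    return (offset_at_ctrl - offset_at_ap) / 2.0
```

With a 30 μs one-way delay and the controller's clock 100 μs ahead, the two offsets are 130 μs and -70 μs, and the estimate recovers exactly 100 μs; asymmetry in the delay translates directly into estimation error.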

Figure 2.8: AP-to-controller clock synchronization error and transmission deviance from the assigned schedule, relative to the local clock. AP and controller were separated by approximately 20 m of CAT-5 cable, 1 switch, and 1 hub. Margin of error ≤ 5 μs, attributable to 802.11 TSF inaccuracy.

2.5.3 Coordination and Dispatching

For every wireless-bound packet, the AP places the packet on a per-client queue and sends a notification to the controller. Once MIM-scheduling selects the packets to schedule, the controller broadcasts the schedule in the form of ⟨AP, client, start time⟩ tuples. Observe that the controller does not specify which exact packets must be transmitted – it only specifies the links that must be activated. Upon receiving a schedule, the AP dequeues a packet to the specified client and passes it to MadWiFi, along with the exact local TSF clock at which transmission must start. The MadWiFi driver busy-waits on the TSF clock, and hence can transmit the packet at the precise time. Transmissions continue until the controller schedules an upload window, at which point the clients respond with batch ACKs. The batch ACK contains a bit vector that marks the failed transmissions in the preceding download window. The AP places the failed packets on a highest-priority retransmission queue, and informs the controller. The highest-priority queue ensures that the AP will not transmit any new packet to the same client before all retransmissions have been satisfied, reducing out-of-order delivery. In the subsequent download window, the controller accounts for the failed packets, and generates new ordered schedules. The process continues.

2.6 Evaluation

Our testbed evaluation aims to demonstrate the feasibility of Shuffle on commodity hardware, and to characterize the performance improvements with MIM-aware scheduling. Comparison with the IEEE 802.11 MAC confirms gains from centralized scheduling. To highlight the gains attributable to order-awareness, independent of centralized scheduling, our evaluation focuses on comparison with a capture-aware TDMA (running on MIM hardware, but without imposed ordering). We begin our analysis with results using a fixed bitrate, showing how correct ordering improves delivery probability on weaker links. Next, we compare Shuffle and TDMA operating with full 802.11g rate control. We summarize our main findings below.

• We begin with 4 simple 2-AP topologies. Shuffle outperforms 802.11 by about 40% and TDMA by 20% (Fig. 2.9). Shuffle’s Jain Fairness Index is close to 1, while that for 802.11 and TDMA are around 0.95 and 0.93, respectively.

• Incorrect order of transmissions considerably degrades performance. In 2-AP topologies, the difference in throughput between the correct and wrong order is almost 30% (Fig. 2.9).

• The importance of order is more pronounced in 3-AP topologies (Fig. 2.10 and Fig. 2.11). More concurrency opportunities offer higher gains with Shuffle – up to 100% over 802.11, and 20% over TDMA. Fairness improves too. Results from 10-link topologies also attain around 70% gain over 802.11, and 20% over TDMA (Fig. 2.12).

• Fully-functional Shuffle including 802.11g bitrate control shows 29% gains over TDMA throughput in a favorable topology (Fig. 2.13) and yields 17% gain when different client positions are systematically tested (Fig. 2.14 and Fig. 2.15).

Figure 2.9: Concurrency gains with only two links. (Normalized throughput by schedule – Shuffle, TDMA, Reverse, and 802.11 – for 2-AP topologies A through D.)

2.6.1 Throughput with 2 Access Points

To understand the primitives of MIM-aware scheduling, we begin with topologies of 2 APs, each associated with a single client. We characterized the interference relationships, as coordinated by the Shuffle controller, finding the proper stagger order for maximal concurrency. To understand the ramifications of incorrect ordering, we forced the controller to schedule transmissions in correct and incorrect orders. For fairness towards 802.11, we disabled RTS/CTS and ensured that the topologies under test did not include hidden terminals. Figure 2.9 presents the results. Evident from the graphs, MIM-aware transmission reordering consistently yields higher throughput than both 802.11 and order-unaware TDMA scheduling. When ordered correctly, strong links allow weaker links to start first, and then extract their own signal of interest from the channel (recall the notion of re-locking). In the absence of explicit ordering in TDMA, concurrent packets may naturally achieve “good” and “bad” link orderings due to clock synchronization error. For some packets, the “right” AP will transmit first, and for others, it will start too late and fail. Thus, for any pair of links, we expect a TDMA schedule to result in the correct order (and thereby gain) approximately half of the time. Our results support this intuition. Fairness, computed as Jain’s Fairness Index, also improves. We discuss this more later.
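The half-the-time intuition is easy to check with a toy Monte Carlo. All names and parameters here are hypothetical, for illustration only:

```python
import random

def weak_link_success_rate(trials, p_correct_order=0.5, seed=42):
    """Under order-unaware TDMA, synchronization jitter leaves the
    stagger order to chance: the weak link succeeds only when the
    'right' AP happens to start first. Shuffle's explicit stagger
    would make every trial use the correct order."""
    rng = random.Random(seed)
    wins = sum(rng.random() < p_correct_order for _ in range(trials))
    return wins / trials
```

Over many packet pairs, the weak link's success rate converges to roughly one half, matching the measured TDMA behavior.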

Figure 2.10: Multiple Shuffle orders provide higher throughput than both TDMA and 802.11. (Normalized throughput by link initiation order – A-B-C, A-C-B, B-A-C, B-C-A, TDMA, C-A-B, 802.11 – ordered by throughput.)

2.6.2 Throughput with 3 Access Points

The notion of ordering becomes more complex with 3 clients, each associated with a distinct AP. Figure 2.10 shows the throughput comparison. Since more concurrent links are feasible, Shuffle outperforms 802.11 and TDMA by larger margins. However, of greater interest is the sensitivity of performance to the different transmission orders. The variation in throughput between different orders is evidently large, indicating that gains from MIM reordering may not be extracted blindly. Use of an incorrect order lowers throughput below that of a TDMA schedule. Interestingly, even suboptimal orderings provide gains over 802.11. This is an attribute of overly conservative carrier sense mechanisms in 802.11, leading to exposed terminal problems [184]. Shuffle overcomes these problems through centralized scheduling and overlapping transmissions.

2.6.3 Fairness

MIM-aware scheduling does not degrade fairness among clients (recall that our scheduling algorithms account for fairness and starvation). Shuffle improves fairness over 802.11 and TDMA. In Fig. 2.11, we characterize these gains using Jain’s Fairness Index. The 802.11 backoff mechanism preferentially treats links which experience fewer losses. Thus, 802.11 exacerbates the already disproportionate bandwidth allocation to stronger links. The Shuffle controller attempts to ensure that sufficient transmission opportunities are extended to all links, reducing this effect.

Figure 2.11: Shuffle scheduling improves fairness. (Jain’s Fairness Index by link initiation order for links A, B, C, compared with TDMA and 802.11.)
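Jain's Fairness Index for n throughputs x₁..xₙ is (Σxᵢ)² / (n Σxᵢ²): it equals 1 for a perfectly equal allocation and falls toward 1/n as one link dominates. A direct implementation:

```python
def jain_index(throughputs):
    """Jain's Fairness Index: 1.0 means perfectly fair; 1/n means a
    single link captures all of the bandwidth."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))
```

For instance, three equal links score 1.0, while four links with all traffic on one link score 0.25.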

2.6.4 Performance on Larger Topologies

We tested Shuffle on larger topologies with 3 APs, each connected to up to 4 clients. One of the large topologies, with 10 links, is illustrated in Figure 2.12(a). For this experiment, equal traffic is generated for each client. Based on their interference relationships, not all scheduled batches can consist of 3 concurrent links. As a result, Shuffle sometimes schedules batches of 2 concurrent links (especially in view of fairness). Figure 2.12(b) compares the throughput between Shuffle, TDMA, and 802.11. Performance improvements in this and other topologies are reasonable and consistent. Fairness among the links is also observed to be high, as illustrated in Figure 2.12(c).

2.6.5 Complete Shuffle with Rate Control

We evaluate the benefits of MIM-aware ordering with rate control enabled. Our transmission bitrate control mechanism is similar in principle to SampleRate and


Figure 2.13: Throughput for Shuffle versus TDMA using 802.11g with 6-54 Mbps rate control enabled. (Empirical CDF of Shuffle % throughput gain vs. TDMA over 50 trials, 2-link topology.)

number of failed transmissions, and the amount of time scheduled on the channel, independently for Shuffle and TDMA. Given that Shuffle and TDMA run identical algorithms for centralized scheduling and rate control, with the one exception of imposed ordering through stagger, we believe this to be a highly fair comparison. Since 802.11 is not compatible with centralized scheduling, with this testing methodology, results for 802.11 are not reported in this section.

In Fig. 2.13, we present results from one topology consisting of two mutually-interfering links, similar to that presented in Figure 2.1. One link is strong and relatively unaffected by the interferer. The other link is far more susceptible. With Shuffle, the weaker link successfully maintains a higher data rate than it can under TDMA. We plot a CDF of our results over 50 trials (the system starting from a ground state for each trial) to show that the Shuffle conflict inference mechanism can reliably deduce the proper ordering. The mean throughput gain from Shuffle is 29%. Although the potential for gains with MIM ordering is topology dependent, it is not highly sensitive. As depicted in Fig. 2.14, we deployed a two-AP topology.

Figure 2.14: A classroom environment with 54 seats. Leaving the AP and one client fixed, we tested with a client placed on the desk in front of each chair.

Figure 2.15: CDF of throughput for classroom test. (Empirical CDF of aggregate throughput in Mbps, Shuffle vs. TDMA.)

One AP was positioned to serve a classroom and another just outside, acting as a strong interference source. We collocated a receiver with the outside AP, creating a strong link. By systematically moving the other client to each of the 54 seating positions, we created a diverse set of channel conditions. Fig. 2.15 shows these results. Mean throughput is 32 and 27 Mbps for Shuffle and TDMA, respectively (a Shuffle gain of 17%).

Figure 2.16: Performance evaluation on real and synthetic topologies. (Throughput improvement over 802.11 (%) for Shuffle vs. NoMIM: (a) across topologies TOPO1-TOPO4; (b) as the number of APs grows up to 50.)

2.6.6 Simulation Results

We performed QualNet simulations to evaluate performance in larger topologies. MIM capabilities were carefully modeled into the PHY and MAC layers of the simulator. The EWLAN controller was assumed to have a processing latency of 50 μs, and the wired backbone was assigned a 1 Gbps data rate. We used 802.11a with transmission power 19 dBm, a two-ray propagation model, transmission rate 12 Mbps, and a PHY-layer preamble duration of 20 μs. Figure 2.16(a) presents throughput comparisons for topologies taken from Duke University buildings with different numbers of APs on the same channel; each AP was associated with around 6 clients. As a special case, the second topology has APs associated with 20 clients, resembling a classroom setting. Shuffle consistently outperforms NoMIM, confirming the potential of MIM-aware reordering.

2.6.7 Impact of AP Density

To understand Shuffle’s scalability in high AP density environments, we generated synthetic topologies in an area of 100 m × 150 m. We placed an increasing number of APs (ranging from 5 to 50) at uniformly random locations in this region. Each AP is associated with 4 clients, and the controller transmits CBR traffic at 1000 pkts/sec

to each of the clients. Figure 2.16(b) presents results of this setting. It shows that Shuffle offers consistently better throughput than NoMIM regardless of the density of APs.

Figure 2.17: Throughput improvement under different channel fading conditions – Shuffle performs well under Rayleigh and Ricean fading. (Throughput improvement over 802.11 (%) for K = 0, 1, 5 across topologies TOPO1-TOPO4.)

2.6.8 Impact of Fading

The earlier results were obtained without accounting for channel fading. But the impact of channel fading can be severe, and the Shuffle system needs to adapt to it over time. To evaluate our opportunistic rehearsal mechanisms, we simulate Ricean fading with varying K factors, and log-normal shadowing. Figure 2.17 shows the percentage throughput improvement of Shuffle over 802.11 for different values of K. For K = 0 (Rayleigh fading), the fading is severe and the improvements are less than at higher values of K. Still, the improvements are considerable, indicating Shuffle’s ability to cope with time-varying channels. The improvements were verified to be a consequence of opportunistic rehearsals; when opportunistic rehearsal was disabled, performance degraded.
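For reference, a Ricean envelope sample with Rice factor K (the ratio of line-of-sight to scattered power) can be drawn as below; K = 0 degenerates to Rayleigh fading, the severe case in Fig. 2.17. This is a textbook sketch, not the QualNet channel model.

```python
import math, random

def ricean_sample(K, omega=1.0, rng=None):
    """Draw one Ricean fading envelope with Rice factor K and mean
    power omega. K = 0 is Rayleigh (no line-of-sight component)."""
    rng = rng or random
    los = math.sqrt(K * omega / (K + 1))        # deterministic LOS amplitude
    sigma = math.sqrt(omega / (2 * (K + 1)))    # per-axis scatter std dev
    x = los + rng.gauss(0.0, sigma)
    y = rng.gauss(0.0, sigma)
    return math.hypot(x, y)
```

Averaged over many samples, the envelope's mean power equals omega for any K; larger K concentrates that power in the deterministic component, which is why fading is milder at K = 1 and K = 5 than at K = 0.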

2.7 Limitations and Discussion

We discuss some limitations of the Shuffle implementation, and identify avenues for further work.

2.7.1 External Network Interference

We assume that all WiFi devices are associated to the same enterprise network. Put differently, no other WiFi transmission occurs that is not accounted for by the central controller. In reality, electronic devices such as microwaves may interfere in the 2.4 GHz band. Wireless devices from “neighboring” networks may interfere at the periphery of a Shuffle deployment. Shuffle’s packet loss recovery mechanism will be able to cope with sporadic interference. However, if the losses are frequent, carrier sensing may need to be selectively enabled for the peripheral APs, limiting the Shuffle controller’s ability to schedule for those clients.

2.7.2 Latency

Shuffle introduces some end-to-end delivery latency. When a packet is received at an AP, it cannot be forwarded to the intended client until the AP notifies the controller and receives a scheduled slot for transmitting the packet. Assuming no queuing at the AP or client, the added latency is only due to propagation, switching, and processing of two control messages. As a design alternative, this latency may be eliminated if the controller is collocated with the network gateway, so that schedules may be forwarded to APs in tandem with the outbound packet. While this provides no direct improvement for retransmissions of lost packets, recall that retransmissions get higher priority than new packets to the same client. This is expected to make the retransmission delay tolerable.

2.7.3 Client Mobility

As a client moves, interference relationships between links may change dramatically. While Shuffle’s concurrent link selection, rate control, and transmission ordering mechanisms do adapt to changes in channel conditions, we have not yet characterized convergence time for continuous-mobility scenarios.

2.7.4 Transport Layer Interactions

We have not yet characterized TCP interaction behavior with Shuffle scheduling. A potential point of concern is division of time into upload and download periods, possibly impacting TCP round-trip time estimation and ACK timeouts. However, we believe that upload periods may be scheduled frequently enough (every few download packets) to limit this effect.

2.7.5 Compatibility

Shuffle is not immediately compatible with existing deployments. Clients must be protocol compliant so as to remain silent during download periods and provide ACKs during upload windows.

2.7.6 Small-scale Testbed

We tested our Shuffle implementation on topologies consisting of up to three concurrent links. Though more experiments with larger topologies would be desirable to confirm the scalability of Shuffle, our simulation results indicate that Shuffle scales well.

2.8 Related Work

2.8.1 Capture and MIM

Theoretical models have been proposed to explain physical layer capture [59]. The first empirical evidence of capture was presented in [98]. The recent study in [101] quantifies SINR threshold requirements for 802.11a networks under different packet arrival timings. Capture awareness has been used for collision resolution in [185]. BER models for capture were proposed in [38].

2.8.2 Spatial Reuse

Schemes like [97, 85] make use of power control and carrier-sense tuning to achieve improved spatial reuse. Prior work has considered RTS/CTS variants to schedule non-conflicting links [130]. However, most existing deployments do not use RTS and CTS [184], and even those that do fail to exploit concurrency well. In CMAP [184], the authors propose a distributed scheme which makes use of partial packet decoding to determine if a concurrent transmission is possible. This distributed approach makes use of the delivery ratios of concurrent transmissions to determine whether they can be successful. CMAP can benefit from MIM-capable hardware, but is not MIM-aware. In contrast, our work explicitly orders transmissions to take advantage of MIM. In our earlier work [161], we made a case for reordering transmissions. In this chapter, we presented the details of the integer programming formulation, rehearsal, scheduling, and transmission mechanisms. The most significant addition is the implementation and evaluation of Shuffle on a testbed.

2.8.3 Enterprise Wireless LANs and Scheduling

Enterprise wireless LAN architectures are increasingly popular for improving throughput, monitoring, and management. SMARTA [11] utilizes a centralized server to build a conflict graph and fine-tune the APs’ transmit power and channel selection mechanisms. Several scheduling mechanisms for single- and multihop radio networks, like [153], were proposed in the context of EWLANs. Our controller-AP interaction is similar to the one proposed in a recent work [138, 140]. The speculative scheduling solution in [12, 170] proposed a conflict-graph-based centralized scheduling mechanism to improve spatial reuse. Our conflict graphs are based on asymmetric link conflicts, where conflicts change based on transmission ordering. Police [87] builds a conflict graph for both uplinks and downlinks in an EWLAN. This conflict graph can be dynamically updated and used for allocating airtime for links. Shuffle will benefit even more with such a scheme. DIRC [109] uses the conflict-graph-based approach in EWLANs for improving spatial reuse with directional antennas. We need to study the significance of MIM and ordering with directional antennas.

2.8.4 Characterizing and Measuring Interference

In [182], the authors analyze the effects of combined interference and suggest that an additive interference mechanism, like the one used in QualNet [164], is a very close approximation. This assertion is further supported in [113]. An O(n²) algorithm for estimating link-state interference in multihop wireless networks was proposed in [144], and a linear-order algorithm that takes capture into account was presented in [102]. These measurement schemes aim to estimate the pairwise interference between links with few measurements. In [155], the authors show how signal strength conditions vary transiently in real networks, and they quantify the effects of received signal strength on delivery probability. In this work, we use a hybrid approach of measuring individual link RSSI values to prune the initial set of concurrent links, and a decision-making scheme based on concurrent delivery ratios similar to [184].

2.9 Conclusion

Message in Message (MIM) support in modern wireless cards allows a receiver to disengage from an ongoing reception and lock onto a new, stronger signal. The rewards from this physical-layer capability cannot be fully realized unless link-layer protocols are explicitly designed with MIM-awareness. Specifically, we have shown that links which conventionally conflict with each other may be made concurrent if they are initiated in a specific order. We then presented Shuffle, a system that reorders transmissions to improve spatial reuse. Theoretical analysis has shown that the optimal improvements with MIM can be significant. A functional testbed validated that MIM-awareness is practical, while results of experimental evaluation confirm consistent performance improvements.

3 Monitoring the Health of Home Wireless Networks

Deploying home access points (APs) is hard. Untrained users typically purchase, install, and configure a home AP with very little awareness of wireless signal coverage and complex interference conditions. We envision a future of autonomous wireless network management that uses the Internet as an enabling technology. By leveraging a P2P architecture over wired Internet connections, nearby APs can coordinate to manage their shared wireless spectrum, especially in the face of network-crippling faults. As a specific instance of this architecture, we build RxIP, a network diagnostic and recovery tool, initially targeted towards hidden terminal mitigation. Our stable, in-kernel implementation demonstrates that APs in real home settings can detect hidden interferers and agree on a mutually beneficial channel access strategy. Consistent throughput and fairness gains with TCP traffic and in-home micro-mobility confirm the viability of the system. We believe that using RxIP to address other network deficiencies opens a rich area for further research, helping to ensure that smarter homes of the future embed smarter networks. In the near term, with the wireless and entertainment industries poised for home-centric wireless gadgets, RxIP-type home management systems will become increasingly relevant.

3.1 Introduction

The Enterprise WLAN (EWLAN) network architecture has gained rapid popularity in single-administrator environments, such as universities, airports, and corporate campuses. In EWLANs, multiple wireless access points (APs) are connected to a central controller through a high-speed wired backbone. The controller assimilates a centralized view of the network, facilitating coordination that would be difficult over the wireless channel alone. The overhead of coordination is offloaded to the out-of-band wired infrastructure, freeing the wireless spectrum for productive data communication. Deployment experiences show reduced hidden/exposed terminals [44, 170], greater spatial reuse [170, 120], smarter association [139], and a host of other enhancements to the end-user experience [22, 37, 11]. These techniques have proven practical, with commercial systems available from Aruba, Cisco, and Meru [127].

Unlike EWLANs, residential wireless networks (RWLANs) do not share a common, centralized infrastructure. Each residential AP is typically purchased, installed, and configured by the resident, without any type of interconnection to its neighbors [13]. The advantages of EWLANs are apparently unavailable. This chapter investigates the feasibility of using the Internet as a wired backbone to coordinate residential APs. By exchanging their globally-routable IP addresses through wireless beacons, APs can be made to communicate with neighboring APs over the wired Internet. This out-of-band communication channel can emulate some of the EWLAN advantages in residential settings, and yet preclude the need for a central controller. While numerous possibilities emerge, our first step is to narrow our exploration of this architecture to a specific application. We develop RxIP (Prescription: IP), a network diagnostic and recovery tool, targeted at hidden terminals.

Motivation and Measurements

The increasing availability of fiber-to-the-home and 100 Mbps+ cable access (DOCSIS 3.0) is transforming wireless networks [75, 18]. The bottleneck is no longer the ISP, but instead the wireless network itself. High-bandwidth multimedia applications within the home are further reducing the slack: Apple TV, HD streaming, Apple Time Capsules, SONOS music systems, etc., are demanding significantly more capacity. When neighboring apartments simultaneously run these applications, the interference floor will far exceed what we have experienced in the past. While physical layer technology may keep up with this demand, we argue that atypical, unanticipated scenarios will be unavoidable. A network-wide health-monitoring framework, similar to those in enterprise WLANs, would be valuable to provide sustained stability.

Hidden terminal impact on network stability

Hidden terminal problems have been reported to impact network stability, especially in view of TCP [145, 95], the dominant component of residential traffic [114]. The problem manifests as severe packet loss, to the extent that 802.11 link-layer retries do not ensure successful packet delivery. TCP experiences these losses and assumes severe network congestion. In response, it overzealously reduces its congestion window, yielding intolerably poor performance. Measurements confirm hidden terminal impact in enterprise networks [45, 170]. Typical residential settings exacerbate the problems; APs may be located far from the wireless devices, and may often be placed on or near the floor, increasing multipath, weakening links, and lowering bitrates [145]. As bitrates drop, channel occupancy inflates, and correspondingly the probability of packet collisions increases.


Figure 3.3: As C2 moves towards its AP, it becomes less susceptible to hidden terminal interference from AP1. TCP more fully utilizes the channel, and correspondingly, C1 is severely impacted by AP2. (TCP throughput in Mbps as a function of C2's position, 0-25 m from C1.)

Existing recovery mechanisms are insufficient

Enabling 802.11 RTS/CTS hidden terminal protection (for long packets) substantially reduces overall throughput [90]. ZigZag decoding [68] has been shown to be an effective hidden terminal mitigation system in software radio testing. However, without a practical implementation, it is not readily deployable in WLANs with commodity hardware and legacy clients. Centaur [170] successfully isolates hidden terminal traffic, but is only relevant to EWLANs with a low-latency backbone and central controller. Residential networks do not have such backbones, and suffer from the lack of coordination among chaotically placed APs [145]. Interference-aware, planned deployment is not an option, as lay users must be able to install these devices with plug-and-play simplicity. The onus is on network solutions to provide stable performance under strict constraints.

Out-of-band opportunities are available

We observe that the wired Internet already connects the majority of home APs, and this opportunity can be utilized to coordinate their operations out-of-band. Of course, challenges naturally arise, including coping with Internet latencies, time synchronization, accurate fault diagnosis, and mechanisms for quick recovery. We systematically address these complications, moving towards a more stable RWLAN. While the efficiency of an ideal (enterprise-like) deployment may not be possible in unplanned networks, coordination-enabled, automatic network refinements may bring a network with unacceptable performance to adequacy. Besides, these solutions are practical without any client-side modifications. By remaining invisible to devices and users, the system can retain the necessary plug-and-play simplicity.

Our contributions are three-fold:

1. We identify the Internet as a viable control plane for coordinating wireless APs in home networks. While the core idea is not entirely novel [13], we believe that our application and implementation in the context of residential networks enables new opportunities.

2. We develop RxIP, a distributed hidden terminal diagnostic and recovery service. Internet-based coordination enables cooperative mitigation among neighbor APs under TCP traffic and in-home mobility.

3. We implement RxIP as a Click Router kernel module and experimentally characterize its performance with testbeds of up to 12 nodes. Results show a median throughput improvement of 57% over 802.11 (with RTS/CTS turned off) in symmetric hidden terminals, while also improving fairness.

3.2 RxIP Architecture

This section presents a high-level overview of the system, followed by an outline of the underlying components. The design details are presented thereafter.

RxIP APs periodically announce their Internet IP addresses through wireless beacons. Neighboring APs overhear these beacons and relay them one additional hop to ensure that they are received by potential hidden terminals. When APs learn about the presence of a new neighbor, they send a wireline probe to the specified IP address, establishing a control channel over the Internet.

To mitigate hidden terminal problems, RxIP relies on this direct, AP-to-AP coordination. The main idea is that APs monitor their wireless performance and periodically cross-check with nearby APs over the Internet. Bloom filters efficiently maintain the history of transmission timestamps at each AP, facilitating timing analysis for hidden terminal diagnosis. An observed correlation between two APs’ transmission times (matched over the Internet) and collision rates (observed over wireless) raises suspicion of a hidden terminal scenario. Once the suspicion is confirmed, hidden APs establish pair-wise partnerships to relieve the effects. Mitigation happens through a hybrid TDMA/CSMA schedule, implemented via token exchanges. The token exchange mechanism is designed to scale for complex interference relationships, ensuring that hidden APs never transmit during the same timeslot. The latency in Internet-based coordination is addressed by scheduling transmissions slightly in advance. APs that are not affected by hidden terminals continue their operations unaltered.

Relative to unassisted CSMA, performance improves due to reduced collisions, fewer TCP disruptions, and higher bitrates. Moreover, Internet-based coordination frees up wireless bandwidth for productive data communication. We prove that coordination is correct and efficient in Section 3.4 and present experimental results in Section 3.6.

Incentives

Since residential APs typically do not share a common administrative domain, they can be incentivized into protocol compliance only by service improvements for their own clients (we assume that APs may be selfish, but are non-malicious). Our distributed scheduling mechanism allows each AP to individually select precisely those peers with which it wishes to serialize its transmissions. Serialization is only required when both parties agree. Through consensus-only serialization and peer monitoring to disincentivize cheating, we attempt to maintain incentive-compatibility for all APs.

In Figure 3.4, we consider the two cases that warrant coordination. In a symmetric hidden terminal (AP1 and AP2), each AP appreciably interferes with its peer’s client. Losses are roughly equitable; coordination provides immediate gains for both APs. In an asymmetric hidden terminal (AP3 and AP4), one AP has an advantaged position. The weaker link (AP4→C4) experiences excessive loss, leading to disproportionately-reduced congestion windows for TCP flows. The disparity between the strong and weak link becomes severely exaggerated. In such cases, the strong link may still consider coordination incentive-compatible if there is an expectation of a role reversal in the future. This may occur over long timescales with client mobility throughout the home and environmental changes (e.g., a closed door), or in the short term due to stochastic fluctuations in the wireless channel.


Figure 3.4: Hidden terminal conditions.

Time Synchronization

In lieu of a global clock, RxIP APs maintain logical time synchronization with their coordination partners. Through periodic beacon reception and hardware timestamping, we maintain microsecond-granularity precision on a pairwise basis. When an 802.11 beacon is overheard, APs subtract the beacon TSF timestamp from the local TSF time of beacon reception, determining a clock offset. This passive technique does not interfere with existing AP-to-client 802.11 TSF synchronization, allowing the AP to maintain control over existing time-sensitive operations within its own BSS. Two-hop synchronization with hidden terminals is also feasible: for each of an AP’s one-hop (directly-synchronized) peers, the AP uses the Internet to forward its synchronization offsets to all other one-hop peers. In Section 3.6.4, we evaluate synchronization precision.
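The offset arithmetic above can be sketched in a few lines. This is an illustrative sketch, not the driver implementation; function names are assumptions, and propagation delay is ignored as in the text.

```python
# Sketch: pairwise logical clock offsets from overheard beacon TSF
# timestamps, composed across two hops via an intermediate AP.

def one_hop_offset(beacon_tsf_us: int, local_rx_tsf_us: int) -> int:
    """Offset such that peer_time + offset ~= local_time
    (local TSF at reception minus the beacon's TSF timestamp)."""
    return local_rx_tsf_us - beacon_tsf_us

def two_hop_offset(offset_relay_to_a: int, offset_b_to_relay: int) -> int:
    """Compose offsets via a relay AP in range of both endpoints:
    (clock_R - clock_B) + (clock_A - clock_R) = clock_A - clock_B."""
    return offset_relay_to_a + offset_b_to_relay

def to_local_time(peer_timestamp_us: int, offset_us: int) -> int:
    """Convert a peer's timestamp into local logical time."""
    return peer_timestamp_us + offset_us
```

Because each one-hop offset is measured passively from beacons, the two-hop composition only requires forwarding one integer per peer over the wired control channel.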

3.3 Hidden Terminal Diagnosis

In this section, we divide the details of hidden terminal diagnosis into two subtasks: (1) ensuring that hidden terminals are the cause of performance degradation; and (2) isolating the particular hidden terminal at fault.

3.3.1 Ensuring Hidden Terminals are the Cause

Performance fluctuations are common in wireless networks, and may not always be attributable to hidden terminals. In light of this, we suggest checking for two conditions that provide early evidence.

(i) Due to channel reciprocity, most AP-to-client links should exhibit a rough symmetry in their upload and download characteristics. The symmetry may be observable in the bitrates selected by 802.11 (e.g., for downlink DATA and uplink TCP ACKs), and even in the delivery ratio of packets in each direction. However, hidden terminals are likely to induce stronger asymmetry. Downlink traffic to client C1 (Figure 3.4a) may suffer due to hidden terminal AP2, while the uplink transmissions from C1 to AP1 may retain a high delivery ratio/bitrate. Observing asymmetry could be a sign of nearby hidden terminals.

(ii) The received signal strength (RSSI) of client-transmitted packets, overheard at a neighboring AP, may be another indicator. In Figure 3.4a, AP2 may overhear C1’s ACKs with reasonably high RSSI, but may not overhear AP1’s DATA transmissions. Again, assuming rough channel symmetry, C1 is also likely to experience a strong RSSI from AP2, indicating the possibility of hidden terminals. Of course, it is important to ensure that AP1 is not within carrier-sensing range of AP2 (in which case they are not hidden). For this, AP2 can check whether it overheard AP1’s wireless beacons in the past. If AP2 discovers that it has received AP1’s IP address only through a two-hop relayed beacon (not from overhearing), then the evidence for a hidden terminal is stronger.

The above two conditions may not be conclusive; each test may incur false positives. Residential environments may exhibit inherent channel asymmetry [145]. APs formerly outside mutual carrier-sense range (during beacon transmission) may no longer be hidden, due to channel variation. Even if the cause of performance degradation is indeed due to hidden terminals, an affected AP needs to accurately identify the culprit. Thus, triggered by the above symptoms, we propose a refined analysis, targeted to concretely isolate the specific hidden terminal.
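The two early-evidence conditions can be combined into a simple trigger. The sketch below is illustrative only: the function name, thresholds, and input representation are assumptions, not values from our implementation.

```python
# Illustrative trigger for the refined (Bloom-filter-based) analysis:
# combine (i) uplink/downlink asymmetry with (ii) a loud-but-unheard
# neighbor. Thresholds here are hypothetical placeholders.

def suspect_hidden_terminal(downlink_delivery: float,
                            uplink_delivery: float,
                            neighbor_ack_rssi_dbm: float,
                            neighbor_ip_learned_via_relay_only: bool,
                            asym_threshold: float = 0.3,
                            rssi_threshold_dbm: float = -75.0) -> bool:
    # (i) Strong uplink but weak downlink breaks channel reciprocity.
    asymmetric = (uplink_delivery - downlink_delivery) > asym_threshold
    # (ii) The neighbor overhears our client's ACKs loudly, yet we only
    # learned its IP via a two-hop relayed beacon (never overheard it).
    loud_but_unheard = (neighbor_ack_rssi_dbm > rssi_threshold_dbm
                        and neighbor_ip_learned_via_relay_only)
    return asymmetric and loud_but_unheard
```

A positive result here is only a suspicion; it triggers the challenge-response timing analysis described next rather than any scheduling change.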

3.3.2 Isolating the Hidden Terminal

An RxIP AP records timestamps for each packet it has transmitted in the recent past, allowing peers to determine when it has transmitted concurrently. A fixed-size Bloom filter serves as an efficient data structure for this purpose. When hidden terminal conditions are suspected, an AP initiates a challenge-response protocol with peer APs over the Internet. Each AP queries its peers with a suitably chosen timestamp (the choosing scheme will be discussed soon). Timestamps have millisecond granularity, effectively slotting time into approximately packet-sized intervals. The peers convert the timestamp to local time (using up-to-date logical time synchronization), consult their Bloom filters, and report back whether they performed a concurrent transmission at that time. APs maintain a saturating counter for each peer AP. For each received report, one of the following four cases results.

1. When an AP suffers a loss and a concurrent transmission is reported, the AP increases the counter for the peer by a large increment (collision).

2. When an AP transmits a packet successfully and a concurrent packet is reported, the AP decreases the counter by a large increment (no collision).

3. When the AP suffers a loss and no concurrent transmission is reported, the AP decreases the counter by a small increment (no concurrency).

4. When an AP transmits a packet successfully and no concurrent packet is reported, the AP decreases the counter by a small increment (no concurrency).

When a counter saturates high, the AP deems the peer to be a hidden terminal. If a counter desaturates for a peer with which no partnership is active, that peer is no longer considered a hidden terminal. Once a partnership is formed, counter desaturation reflects an expected alleviation of hidden terminal effects. To account for dynamic network conditions, especially those caused by client mobility, partnerships may be periodically disabled to check whether the hidden terminal condition still exists.
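The four-case update rule above can be sketched as a saturating counter. The increment sizes and saturation bounds below are illustrative assumptions; the dissertation does not specify exact values.

```python
# Per-peer saturating counter for hidden terminal diagnosis.
# LARGE/SMALL increments and [LOW, HIGH] bounds are hypothetical.

LARGE, SMALL = 8, 1
LOW, HIGH = 0, 64

def update(counter: int, local_loss: bool, peer_concurrent: bool) -> int:
    if local_loss and peer_concurrent:        # case 1: likely collision
        counter += LARGE
    elif not local_loss and peer_concurrent:  # case 2: harmless concurrency
        counter -= LARGE
    else:                                     # cases 3 & 4: no concurrency
        counter -= SMALL
    return max(LOW, min(HIGH, counter))       # saturate at both ends

def is_hidden_terminal(counter: int) -> bool:
    return counter >= HIGH
```

Saturating at both ends keeps the diagnosis responsive: a long quiet history cannot build up so much negative evidence that a genuine hidden terminal takes arbitrarily long to detect.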

Bloom Filter Operations

To answer challenge-response probes, our timestamp data structure needs two operations, ADD (to insert a new timestamp) and CHECK (to test if a queried timestamp has been inserted previously). Because of its constant-time efficiency, a Bloom filter is particularly well suited to this purpose. The Bloom filter is maintained as a pair of bit arrays, initialized to 0. In a rotating fashion, one array is designated as CHECK/ADD and the other as CHECK-ONLY. During an ADD or CHECK, a timestamp is run through an MD5 hash, producing a 128-bit digest. This digest is split into 8 values, simulating 8 independent hash functions. Each of the values serves as an index into the bit array. In an ADD, the corresponding bits in the CHECK/ADD array are set. In a CHECK, “yes” is returned if all 8 bits are set in either array, “no” otherwise. Once the CHECK/ADD array becomes saturated after many ADD operations, the CHECK-ONLY array is reset and swapped with the CHECK/ADD array. In our implementation, a pair of 4096-bit (512-byte) arrays provides a false positive probability bounded by ≈0.05.
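The rotating filter pair described above can be sketched as follows. This is an illustrative reimplementation in Python, not the in-kernel Click code; the explicit `rotate()` trigger and class name are assumptions.

```python
import hashlib

# Sketch of the rotating Bloom-filter pair: two 4096-bit arrays, with an
# MD5 digest split into eight 16-bit values simulating 8 hash functions.

BITS = 4096

def _indices(timestamp_ms: int):
    digest = hashlib.md5(str(timestamp_ms).encode()).digest()  # 128 bits
    # Split into eight 16-bit values; each indexes into the bit array.
    return [int.from_bytes(digest[i:i + 2], "big") % BITS
            for i in range(0, 16, 2)]

class TimestampFilter:
    def __init__(self):
        self.check_add = bytearray(BITS // 8)   # CHECK/ADD array
        self.check_only = bytearray(BITS // 8)  # CHECK-ONLY array

    def add(self, ts_ms: int):
        for i in _indices(ts_ms):
            self.check_add[i // 8] |= 1 << (i % 8)

    def _in(self, arr, idxs):
        return all(arr[i // 8] & (1 << (i % 8)) for i in idxs)

    def check(self, ts_ms: int) -> bool:
        idxs = _indices(ts_ms)
        return self._in(self.check_add, idxs) or self._in(self.check_only, idxs)

    def rotate(self):
        # On CHECK/ADD saturation: discard the old CHECK-ONLY array,
        # demote CHECK/ADD to CHECK-ONLY, start a fresh CHECK/ADD.
        self.check_only = self.check_add
        self.check_add = bytearray(BITS // 8)
```

The rotation means each timestamp remains queryable for roughly two filter lifetimes, which suffices because probes only concern the recent past.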

Preventing Misbehavior

In scenarios as in Figure 3.4b, AP4 may have an incentive to trick AP3 into believing that AP4 is a hidden terminal for AP3’s client, C3. In reality, only AP3 is a hidden terminal to AP4’s client; we call this an asymmetric hidden terminal condition. AP3 can prevent deception by AP4 through careful selection of challenge-response probes. Importantly, this is possible even while AP3 simultaneously experiences hidden terminal losses from other APs. Specifically, AP3 should choose its probing timestamps from both successful and failed transmissions (in a roughly-equal mix). Unless AP4 can guess which of the probes are from failed packets, it will not know when to “lie” that it was also transmitting concurrently. Random guesses are likely to cancel out on average, leaving AP3’s saturating counter for AP4 unaffected. Thus, it is in AP4’s best interest to respond truthfully to AP3’s probes, ultimately allowing AP3 to make a correct determination.
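The probe-selection rule can be sketched as below. The function and its parameters are illustrative assumptions; the point is only that probes are drawn in a roughly-equal mix from both outcomes and shuffled.

```python
import random

# Sketch of unbiased probe selection: mix timestamps of failed and
# successful transmissions so a peer cannot tell which probes it could
# profitably lie about.

def choose_probes(failed_ts, success_ts, n=10, rng=random):
    """Return n probe timestamps, roughly half from each outcome,
    shuffled so that ordering reveals nothing about the outcome."""
    half = n // 2
    probes = (rng.sample(failed_ts, min(half, len(failed_ts)))
              + rng.sample(success_ts, min(n - half, len(success_ts))))
    rng.shuffle(probes)
    return probes
```

Since a dishonest peer sees only opaque timestamps, any strategy it adopts is equivalent to guessing on a balanced mix, and its guesses cancel in the saturating counter.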

3.4 Recovery by Coordination

RxIP mitigates hidden terminals through Internet-based coordination. The idea is to rotate channel access rights between the hidden interferers such that no two interferers transmit concurrently. Importantly, APs may experience multiple hidden interferers, together resulting in an interference graph. This section describes how RxIP coordinates APs over this interference graph, ensuring deadlock-free operation, high channel utilization, and robustness to Internet latencies and dropped packets.

Once a pair of APs diagnose a hidden terminal fault, they may respond by establishing a channel token, to be passed back and forth. As in many existing token-based schemes, such as JazzyMac [142] in the wireless domain, token passes serve as a scheduling mechanism. At any time, only one AP is the token bearer. The other AP, which does not hold the token, is free to transmit indefinitely. Unlike traditional token-based access control, the token bearer does not have the channel access right. Instead, it has the right to “purchase” a transmission timeslot on demand. In that sense, tokens are like money: a token bearer can buy a timeslot by giving the token to its counterpart. The counterpart then becomes the token bearer, and is able to purchase a subsequent timeslot with the same token. In this manner, interfering APs may reserve alternating timeslots in the future. Under certain circumstances, the token bearer may choose not to purchase the next timeslot. Instead, the token bearer holds on to the token and sends an abstain notification. The counterpart can then transmit during the “abstained” slot. Figure 3.5 shows a simple two-AP exchange.
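The purchase/abstain mechanics can be walked through with a toy simulation (illustrative code, not the Click implementation). The action sequence in the test reproduces the purchase, purchase, abstain, purchase timeline of Figure 3.5.

```python
# Toy two-AP token walk-through, mirroring the "tokens are like money"
# analogy. Slot bookkeeping and names are illustrative.

def run_schedule(actions):
    """actions: per-slot choice by the current token bearer, either
    'purchase' (spend token, transmit; token moves to the counterpart)
    or 'abstain' (keep token; counterpart transmits this slot).
    Returns the transmitter for each slot."""
    bearer, other = "AP1", "AP2"   # AP1 starts as the token bearer
    schedule = []
    for act in actions:
        if act == "purchase":
            schedule.append(bearer)        # bearer buys the slot...
            bearer, other = other, bearer  # ...and hands over the token
        else:  # "abstain"
            schedule.append(other)         # counterpart uses the slot
    return schedule
```

Note that the two APs never transmit in the same slot by construction: each slot is consumed by exactly one of them, which is the serialization property the partnership exists to provide.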

3.4.1 Coping with Internet Latencies

Every round of token-exchange on the Internet reserves the channel for some wireless transmissions in the near future. When the future time comes, the owners of the reserved timeslot transmit their data packets. One issue is that token exchanges incur Internet-scale latencies, and if they are not fast enough, they may not be able to “stay ahead” of the actual wireless transmissions. To avoid this possibility, we choose our timeslot durations to slightly exceed the average token passing time (i.e., half of the RTT between APs). As we validate experimentally, longer timeslots do not impact any AP’s long-term bandwidth share or aggregate throughput. However, delivery latency is adversely affected. TDMA schemes are known to incur higher latencies under light traffic conditions (unlike CSMA, a TDMA transmitter must wait for its turn) [156]. This increased latency correlates with the timeslot duration. Results in the next section show latency with varied, realistic slot durations.

Figure 3.5: Timeline of wired token exchange and wireless timeslots. AP1 purchases timeslot t5 to t6 by giving the token to AP2. AP1 may not be able to transmit at t7 (due to some other partnership, not shown). AP1 abstains from a token pass at t7, allowing AP2 to transmit. However, AP1 silences AP2 at t8 instead.

3.4.2 Multiple Partnerships

Token passing becomes non-trivial when APs simultaneously need to partner with multiple hidden terminals. A channel access token is associated with each partnership. To transmit during a particular timeslot, an AP must satisfy the following requirement for every partnering AP: it either purchases that timeslot by expending the channel token to that partner, or receives an abstain notification for that slot from the partner. This ensures that all partners will remain silent for that slot. Importantly, this implies that an AP needs to gather all of its tokens, and spend them simultaneously to purchase the timeslot. Figure 3.6 illustrates the interactions between pairwise exchanges, and how each AP fairly receives its channel access rights. The movement of tokens between partnered APs schedules cyclical non-overlapping timeslots.

Figure 3.6: Rotating channel access rights, established by token exchanges across multiple partnerships.
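The per-slot transmit requirement across multiple partnerships reduces to a simple predicate. This is an illustrative sketch; the set-based representation is an assumption for clarity.

```python
# Sketch of the multi-partnership transmit rule: an AP may use a slot
# only if, for every partner, it either spends that partnership's token
# or holds the partner's abstain notice for the slot.

def may_transmit(partners, tokens_held, abstains_received):
    """partners, tokens_held, abstains_received: sets of partner IDs
    (tokens and abstains apply to the slot in question)."""
    return all(p in tokens_held or p in abstains_received
               for p in partners)
```

The all-or-nothing form is what forces an AP to gather every token before purchasing a slot: a single partner that has neither been paid nor abstained can veto the transmission.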

3.4.3 Provable Properties of Coordination

Each RxIP partnership is an agreed contract between a pair of APs. We may express these terms as axioms, and use them to prove desirable properties of our system.

1. APs that receive the token from a token bearer may not transmit.

2. Token bearers that keep a token may not transmit.

3. An AP may transmit precisely when it receives no tokens and spends all held tokens.

4. The token bearer in a new partnership must be the bearer in all of its partnerships.

Based on these axioms, we have proven the following properties.

1. Protocol operation is deadlock-free.

2. An AP waits no more timeslots between transmissions than the number of coordination partnerships in which it is engaged.

3. A partnership between a pair of APs only induces silence if one is actually allowed to transmit.

4. Token passing implements optimal graph coloring for connected bipartite partnership graphs.

Theorem 3. Protocol operation is deadlock-free.

Proof. Let G(V, E) denote the directed coordination partnership graph, where V is the set of APs and E is the set of coordination partnerships. Let e ∈ E denote a directed edge from the token bearer to the non-bearer in a partnership. By the enforced partnership establishment procedure, the requested AP in a new partnership, v ∈ V, must have only outgoing edges. v cannot be in a cycle, thus the new partnership could not have created a cycle. Similarly, after a token pass, v has only incoming edges. Thus, v cannot be in a cycle, and so the corresponding partnerships cannot create a cycle. Since neither partnership establishment nor token passing can create cycles in G, G is constructed and maintained acyclic. An acyclic graph must contain some vertex v with only outgoing edges. In G, this corresponds to an AP that is the token bearer in all partnerships. This AP may pass its tokens and transmit. Since the graph remains acyclic across token passes, some other AP must now hold all of its tokens.
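The invariant behind this proof can be exercised with a small simulation (illustrative code, not from the dissertation): represent each partnership as an edge from bearer to non-bearer; transmitting spends all held tokens, reversing the transmitter's outgoing edges; a "source" node (only outgoing edges) always exists.

```python
# Simulate token passing on a partnership graph and confirm that some
# AP can always act (no deadlock). Topology below is a made-up example.

def source_nodes(nodes, edges):
    """APs bearing the token in all their partnerships (no incoming edges)."""
    targets = {b for (_, b) in edges}
    return [n for n in nodes if n not in targets]

def transmit(node, edges):
    """node spends all held tokens: every edge node->x becomes x->node."""
    return {(b, a) if a == node else (a, b) for (a, b) in edges}

nodes = ["AP1", "AP2", "AP3"]
edges = {("AP1", "AP2"), ("AP1", "AP3")}  # AP1 bears both tokens
for _ in range(4):
    ready = source_nodes(nodes, edges)
    assert ready, "deadlock would mean no source node exists"
    edges = transmit(ready[0], edges)     # one ready AP acts per step
```

When several source nodes exist they are not partnered with each other and could act concurrently; the loop picks one per step only for simplicity.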

Theorem 4. An AP waits no more timeslots between transmissions than the number of coordination partnerships in which it is engaged.

Proof. An AP cannot be silenced except through receipt of an additional token. Tokens are not lost between transmissions. Thus, each partner can contribute at most one additional silenced timeslot. Therefore, an AP’s inter-transmission latency is bounded by its number of coordination partnerships, measured in timeslots.

Theorem 5. A partnership between a pair of APs only induces silence if one is actually allowed to transmit.

Proof. An AP A is silenced when it receives a token from partner B. That partner passes the token only if it is allowed to transmit in all other partnerships. Thus, a partnership between A and B only induces silence if one of the two nodes is actually allowed to transmit.

Theorem 6. Token passing implements optimal coloring for connected bipartite graphs.

Proof. The proof follows by induction. A base case of 1 node is trivially satisfied. Assume a bipartite partnership graph of k nodes implements an optimal coloring. The graph can be partitioned into L ⊆ G and R ⊆ G such that L ∩ R = ∅, all nodes in L transmit simultaneously, all nodes in R transmit simultaneously, and ∀l ∈ L, r ∈ R, l and r are never concurrent. Partner an additional (k+1)-th node, v. W.l.o.g., add v to L. All partners of v fall in R, thus v may transmit concurrently with all nodes in L.

3.5 Additional Considerations

3.5.1 Coping with Token Loss

Tokens can be lost due to a number of pitfalls: APs may fail or become disconnected; packets may be lost or incur arbitrary delays and reordering; and non-compliant behavior can cause deadlock. APs continually monitor their partnerships for deadlock scenarios. All partnerships that could be at fault are temporarily severed and formed anew using the correct establishment procedure. Meanwhile, regular CSMA provides a natural fallback.

3.5.2 Address Translation

Network address translation (NAT) may appear to impose some difficulty on partnership establishment, since each AP must effectively act as an Internet-accessible server. In residential deployments, however, the AP itself typically serves as a NAT device and has a globally-routable IP on its gateway interface. In rare scenarios with an independent NAT device or multiple APs per home, UPnP (Universal Plug-and-Play) allows automatic configuration for NAT port forwarding.

3.5.3 Upload Traffic

In establishing TDMA schedules, we have not provided explicit scheduling for upload traffic. While this could be achieved with our architecture, complete scheduling would mandate client modification. Moreover, for download TCP traffic, there is greater benefit to protecting TCP data (received at the client) than ACKs (at the AP). TCP cumulative ACKs are highly redundant, as each ACK packet acknowledges every preceding received byte since the start of the session. TCP is only affected when multiple, consecutive ACKs are lost. Thus, hidden terminals among APs are more damaging than among clients for download flows. Given the predominance of download traffic in home networks (85% of residential traffic [114]), the potential gains from upload scheduling seem less compelling.

3.5.4 Incremental Deployability

In RWLANs, nearly all APs represent independent administrative domains. Thus, a practical system must be incrementally deployable. Our solution requires no changes to 802.11 clients. CSMA contention mechanisms still operate normally; simply, no partnerships are established with non-compliant APs. At worst, performance degrades to traditional 802.11.

3.6 Evaluation

We take a systems-oriented approach in evaluating RxIP. Our prototype implementation provides the full functionality of our scheme, including (1) automated AP peer discovery; (2) precise two-hop time synchronization; (3) hidden terminal inference using link asymmetry, peer feedback, and Bloom filter-based transmission timing analysis; and (4) maintenance of hybrid TDMA/CSMA schedules using token passing. Our evaluation consists of three main analyses:

1. We characterize the ability of our system to automatically detect, isolate, and recover from hidden terminal scenarios.

2. We use a series of microbenchmarks to quantify important performance attributes of our design and implementation, including time synchronization and the ability to cope with Internet latencies.

3. We subject our system to larger (6-AP) topologies with an inflated number of hidden terminals. Performance gains over 802.11 reflect the robustness of our coordination-based TDMA and its ability to adapt to adverse network conditions.

3.6.1 Testbed Platform

We evaluated our system on a testbed of laptops serving as APs and clients. Laptops were configured with Linux kernel 2.6.24.7, Core 2 Duo CPUs, and Atheros-chipset D-Link DWA-643 ExpressCard WLAN interfaces. For some UDP experiments, Soekris embedded PCs, configured with Metrix Pyramid Linux and Atheros 5213-chipset MiniPCI interfaces, served as supplementary clients. We implemented our system through in-kernel element extensions to the Click Modular Router. For precise TDMA schedule execution, we modified the MadWiFi 802.11 driver to provide Click interfaces to (1) access the TSF clock; (2) block the transmission queue and buffer waiting packets; and (3) transmit buffered packets and re-enable the transmission queue. We use 802.11b/g, as there is not yet reliable 802.11n Linux driver support for our hardware. To consider the effectiveness of our approach under realistic bitrate conditions, all nodes use the popular SampleRate [32] loss-based bitrate selection heuristic.

3.6.2 Methodology

Our tests assume the wireless link to be the bottleneck. We compare our system against standard 802.11 DCF using Iperf, a widely-distributed network benchmarking tool. Only AP-to-client (download) traffic is generated. However, TCP results reflect the interaction of bidirectional traffic. Throughput, fairness, and jitter results are as directly measured by Iperf. Virtual carrier sense (RTS/CTS) is disabled for all tests.

3.6.3 Hidden Terminal Diagnosis and Recovery

We test system effectiveness in (i) symmetric hidden terminal conditions, (ii) asymmetric hidden terminal conditions, (iii) with varied interferer positions, and (iv) under mobility of the interferer’s client. RxIP provides stable performance across adverse hidden terminal conditions.

Symmetric Hidden Terminals

We show that RxIP substantially improves performance for both links in symmetric hidden terminals. For these tests, two APs are placed outside of mutual carrier sense range, creating the hidden terminal. Each AP has a single associated client,

placed symmetrically in between, providing similar AP-to-client and interferer-to-client channel qualities for each link. A third AP serves as a relay for time synchronization. APs rely on automated hidden terminal inference mechanisms to request and accept partnerships. While the resulting topology exhibits typically-symmetric performance characteristics, channel fluctuations, exacerbated by TCP congestion window throttling, occasionally break symmetry. When one link suffers a period of disproportionate loss, it cuts its TCP congestion window by an excessive margin. The other link, benefiting from the now-clearer channel, experiences a loss reduction and correspondingly increases its window.

Figure 3.7: (a) With TCP, RxIP provides a median 57% gain over 802.11 under symmetric hidden terminals. (b) RxIP extracts the majority of available gain. (c) Despite the already-symmetric conditions, RxIP further improves fairness.

Figure 3.7 (a,b) presents our results with TCP download traffic. In these symmetric conditions, we find a mean 53% (median 57%) throughput gain from coordination, with 91% of links experiencing an improvement. Despite the already-symmetric topological construction, fairness improves by a mean 8%. This is expected, as hidden terminals render 802.11 backoff ineffective.

Asymmetric Hidden Terminals

In an asymmetric hidden terminal, when the strong link agrees, coordination can provide both links stable performance (Figure 3.8). In asymmetric conditions, one AP’s link suffers such severe losses that TCP fails to saturate the link, receiving only negligible throughput. The other AP gains a clear channel. Given this extreme condition, the advantaged AP may still be willing to enter a partnership if there is an expectation of future role reversal (e.g., from client mobility, discussed later). Coordination in asymmetric hidden terminals may be expected to decrease aggregate network throughput, in exchange for far greater fairness and longer-term stability. Bandwidth formerly monopolized by a high-rate link is partially reallocated to the weaker link. However, this effect is lessened in practice, as coordination may reduce losses on both links.

We test asymmetric hidden terminals as in the symmetric case, except that APs are configured to participate in partnerships if there is a gain for either peer. We conduct these tests in an apartment complex. In one apartment, we position an AP at the cable point-of-presence in a study and a client in the common room (the weak link, Figure 3.1). We place a second AP and client in an adjacent apartment space (a large shared lobby area), serving as a strong link. In Figure 3.8, we see how coordination redistributes channel access to closely match an ideal 50-50 share. Compared to the symmetric case, we see greater efficiency, as partnerships are entered freely, reducing the number of losses during fault detection.


Figure 3.8: TCP throughput and fairness under asymmetric hidden terminals. (a) Coordination balances the asymmetry, closely approximating an ideal 50-50 channel share. (b) Fairness improves dramatically.

Interfering AP Location

RxIP coordination prevents hidden terminal losses, irrespective of the interfering AP’s location. Figure 3.9 shows stable performance at all interferer locations, as opposed to the extreme highs and lows of 802.11-based wireless coordination alone.


Figure 3.9: RxIP protects the AP1-C1 link from performance degradation regardless of AP2’s position.


Figure 3.10: As C2 moves from position 0 to 20m, its link strengthens, becoming less susceptible to hidden terminal interference from AP1. TCP more fully utilizes the channel, and correspondingly, C1 is severely impacted by AP2. Coordination protects both links.

Client Mobility

RxIP alleviates TCP-imposed instability caused by the neighbor’s client mobility. Figure 3.10 shows client C2’s movement dramatically affecting throughput for the other hidden terminal link (AP1→C1). This may seem counter-intuitive, but as losses impact TCP, channel occupancy inflates, and other links are correspondingly affected. With RxIP, coordination protects both links, ensuring stable performance at all client locations. At C2 positions 0-6m (X-axis), AP1 sacrifices some channel access time to AP2 (an asymmetric hidden terminal with AP1 as the stronger link). In exchange, AP1 is protected when the AP2 link strengthens (i.e., C2 moves to positions 6-20m).

3.6.4 Microbenchmarks

Internet Latency

RxIP is compatible with Internet-scale latencies, as shown through emulation of realistic RWLAN conditions. By artificially delaying all coordination traffic, we match the link characteristics of 768 Kbps upload broadband connections for each AP with varied AP-to-AP RTTs. We select our timeslot duration conservatively, at 1.25x the one-way (half-RTT) imposed latency between partnered APs, plus 5ms (for non-emulated delays). APs schedule token passes in advance by twice the timeslot duration. We deploy a 3-AP topology and enforce that all APs partner together. We validate that throughput is stable across all artificially-varied Internet RTTs (drawn from apartment-complex measurements, Figure 3.11a). We report AP-to-client delivery latency as the metric of interest in Figure 3.11b. For reference, residential measurements in [57] show a median last-hop delay of ≈7ms/13ms for cable/DSL. We observe a mean RTT of 21.5ms.

Figure 3.11: (a) RTT between APs across an apartment complex using 1.5Mbps cable. (b) AP-to-client delivery latency exhibits a linear relationship to the Internet RTT between partnered APs (2x the AP-to-AP delay).
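The slot-sizing rule used in these tests is a small calculation; the sketch below restates it directly (function names are illustrative).

```python
# Timeslot sizing for the emulation, as stated in the text:
# 1.25x the one-way (half-RTT) latency between partnered APs,
# plus 5 ms for non-emulated delays; token passes are scheduled
# two slot durations in advance.

def timeslot_ms(ap_to_ap_rtt_ms: float) -> float:
    return 1.25 * (ap_to_ap_rtt_ms / 2.0) + 5.0

def schedule_lead_ms(ap_to_ap_rtt_ms: float) -> float:
    return 2.0 * timeslot_ms(ap_to_ap_rtt_ms)
```

For a 40ms AP-to-AP RTT, this yields a 30ms slot, comfortably exceeding the 20ms average token passing time, so the wired exchange stays ahead of the wireless schedule.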

Two-hop Time Synchronization

Beacon timestamps allow APs to maintain s-granularity synchronization with one- hop neighbors. To maintain time synchronization to a hidden terminal with a tenuous or nonexistent wireless link, we utilize an intermediate AP, within one-hop range of both APs individually. By combining two direct synchronization clock offsets, an AP


Figure 3.12: (Inset) Intermediate APs relay clock offsets for time synchronization between hidden terminals. (Graph) Second-hop time synchronization error attributable to wired relay mechanism latency.

derives a logical synchronization across two hops. To evaluate the precision of two-hop time synchronization, we deploy a three-AP topology with all APs in single-hop range. For this test, we use an exceedingly-long 500ms beacon interval, increasing staleness to strain our system. To determine the loss of accuracy imposed by the addition of a second hop, we compare the synchronization offsets determined by one-hop and two-hop synchronization mechanisms. We find a mean difference of 1.5µs with a standard deviation of 1.2µs and a max of 5µs (Figure 3.12). We expect this to be representative of additive error across each hop of a multi-hop synchronization. Therefore, in a typical hidden terminal scenario, we anticipate mean total error to be bounded by 5 × 2 = 10µs. Thus, our synchronization facilities are more than sufficient for hidden terminal analysis, using timing to find concurrent packets.
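The two-hop offset derivation and its additive error bound can be sketched as follows (an illustrative Python fragment; the function names are ours):

```python
def two_hop_offset_us(offset_a_to_b_us, offset_b_to_c_us):
    """Combine the one-hop clock offsets that hidden terminals A and C each
    maintain with intermediate AP B into a logical two-hop offset A-to-C."""
    return offset_a_to_b_us + offset_b_to_c_us

def worst_case_error_us(per_hop_max_error_us, hops=2):
    """Error is additive across hops, so the bound scales with hop count."""
    return per_hop_max_error_us * hops
```

With the measured 5µs per-hop maximum, a two-hop path is bounded by 10µs, as computed above.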

3.6.5 Scalability of Partnership-based TDMA

A real-world AP is unlikely to encounter enough hidden terminals to necessitate many concurrent partnerships. Indeed, it is difficult to create such a scenario with the limited number of nodes available in our testbed. However, we wanted to evaluate the scalability and robustness of our system under such an adverse environment. To this end, we modified our APs to disable carrier sense and deployed them in dense topologies. By creating an extreme proportion of hidden terminals (artificially), these tests necessitated many partnerships, providing greater system strain. Under these conditions, the reported performance results are not in any way intended to be representative of a deployed system. Instead, performance enhancements reflect the ability of the system to adapt to more complex partnership formation.

Methodology

With carrier sense disabled, bidirectional traffic, including both TCP and link-layer acknowledgments, induces many collisions irrespective of external interference. Therefore, we consider unidirectional flows without link-layer ACKs (broadcast UDP traffic with MTU-sized datagrams). Since effective rate control is difficult without per-packet feedback, we use a fixed 12 Mbps bitrate. Transmission timing analysis for hidden terminal detection is similarly not possible (the AP cannot isolate which packets may have collided). Instead, APs rely only on peer RSSI feedback regarding occasional client upload packets. For regular 802.11, we leave carrier sense enabled, and topologies under test have few, if any, natural hidden terminals. Thus, we consider the extent to which coordination mechanisms can be as effective as 802.11 in scheduling channel access.

6-link Testbed Benchmarks

We deployed 6 APs and 6 clients in 30 distinct topological configurations within our university facility. APs and clients were randomly dispersed in varied dense configurations. Figure 3.13a presents a CDF of per-link throughput (a 1.8x mean aggregate throughput gain over 802.11). Figure 3.13b shows a 2.5ms improvement in mean jitter. Finally, Figure 3.13c shows that fairness is not negatively impacted by the coordination approach: we achieve a mean Jain's fairness index of 0.78, compared to 0.76 for 802.11.

Figure 3.13: Scalability test over 30 random 6-link topologies. CDFs of (a) throughput, (b) jitter, and (c) fairness.

With fairness and jitter improvements simultaneous to appreciable throughput gains, these tests reflect the ability of coordination-based TDMA to efficiently partition channel access. While we expect that throughput gains are primarily attributable to reduced exposed terminals, and are thus not representative of a deployed

system, they reflect positively on the robustness of the design and implementation of our distributed TDMA approach.
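Jain's fairness index, reported above, is the standard metric (sum x)^2 / (n · sum x^2) over per-link throughputs; a minimal Python sketch (the function name is ours):

```python
def jains_index(throughputs):
    """Jain's fairness index over per-link throughputs.
    Equals 1.0 when all links receive identical throughput, and 1/n
    when a single link captures all of it."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))
```

For instance, four links at equal rates score 1.0, while one link monopolizing the channel among four scores 0.25.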

3.7 Related Work

3.7.1 Enterprise Network Management

Centralized EWLAN management has been considered in the context of fault diagnosis [44], protocol extensibility [138], security enhancements such as detecting rogue APs [22], AP channel assignment and power control [11], client association [139], and client localization [37]. Centaur [170] and Shuffle [120] consider conflict-based per-packet link scheduling, allowing hidden terminal mitigation via scheduling. RxIP accomplishes similar timing-based isolation for hidden terminals, but our Internet-based architecture allows deployment within RWLANs without shared infrastructure or a low-latency interconnect.

3.7.2 Hidden Terminal Mitigation

Substantial prior work has considered hidden terminal detection and recovery [104, 90]. While ZigZag decoding [68] has been shown to be effective in USRP testing, it cannot support legacy hardware. RxIP may be readily deployable in WLANs with commodity hardware clients.

3.7.3 Network Measurement

[57, 75, 114, 149] characterize the performance of residential broadband. [145] presents an extensive measurement study of home wireless network performance. [13] suggests that these networks may be dense and prone to user misconfiguration. [45, 170] characterize hidden terminal losses. [95] recognizes the exacerbated impact of hidden terminals on TCP.

3.7.4 Related Techniques

Z-MAC [156] considers hybrid TDMA/CSMA for sensor networks, suggesting gains deriving from reduced contention irrespective of hidden terminal presence. JazzyMac [142] inspires our in-advance token-based establishment of TDMA schedules. SPIE [173] uses Bloom filters for scalable per-packet state.

3.8 Conclusion

This chapter considers the Internet as a medium for AP-to-AP coordination of the wireless channel. Although similar in principle to existing approaches, we believe our application to the residential domain expands opportunities previously reserved for the enterprise. As implemented in our Click Router prototype, RxIP APs may (1) detect the presence of a hidden terminal, (2) isolate the cause to a particular peer AP, and (3) mitigate hidden terminal performance losses by establishing an interference-aware hybrid TDMA/CSMA schedule. By peer-to-peer negotiation of the wireless channel, traditionally-centralized techniques for enterprise wireless networks may now be extended to the home as well. Immediately, residential deployments can benefit from fault diagnosis/recovery, improved coverage, and optimized frequency assignments. Extension of this platform leaves a rich area open for exploration.

4 WiFi Energy Management via Traffic Isolation

WiFi continues to be a prime source of energy consumption in mobile devices. This chapter observes that, despite a rich body of research in WiFi energy management, there is room for improvement. Our key finding is that WiFi energy optimizations have conventionally been designed with a single AP in mind. However, network contention among different APs can dramatically increase a client's energy consumption. Each client may have to stay awake for long durations before its own AP gets a chance to send packets to it. As AP density increases, the waiting time inflates, resulting in a proportional decrease in battery life. We design SleepWell, a system that achieves energy efficiency by evading network contention. The APs regulate the sleeping windows of their clients so that different APs are active/inactive during non-overlapping time windows. The solution is analogous to the common wisdom of going late to the office and coming back late, thereby avoiding the rush hours. We implement SleepWell on a testbed of 8 laptops and 9 Android phones, and evaluate it over a wide variety of scenarios and traffic patterns. Results show a median gain of up to 2x when WiFi links are strong; when links are weak and the network density is high, the gains can be even greater.

4.1 Introduction

Energy management in mobile devices continues to be a relevant problem. The problem is becoming pronounced, especially with the always-connected usage model of modern devices. Smartphones, for instance, are rapidly becoming the convergent platform for a large variety of network applications, including emails, music, videos, games, web browsing, and picture sharing [61, 62, 24]. In addition, background applications are continuously running push-based alert services [166], location-based notifications [175], and periodic sensor updates [131]. This growth in network traffic is beginning to impose a heavy demand on the phone battery, to the extent that some users are already expressing dissatisfaction [35]. The inability to cope with the energy demands can be serious, and may even hinder the steady growth of the mobile computing industry.

WiFi network communication is a predominant source of energy consumption. This has been well known for many years, and a rich body of research has addressed the problem in various ways. For example, WiFi Power Save Mode (PSM) [7] is one of the early protocols that attempts to turn off the device whenever beneficial. While WiFi energy efficiency has progressively improved since PSM (with the most recent NAPman protocol [159] offering substantial gains), we find that there is still opportunity for improvement. We describe this opportunity by first describing the core ideas in PSM and NAPman, and then identifying their respective deficiencies.

Consider the scenario in which a WiFi AP intends to communicate with a battery-operated mobile client. With WiFi PSM, the client periodically wakes up to listen to advertisements from the AP. The advertisements include client identifiers for which the AP has queued packets. If a client C learns that the AP has packets for C, it wakes up the entire radio; otherwise, it continues sleeping in the low-power state. Importantly, waking up the radio incurs a high energy cost, and hence, it is unproductive if the client downloads only a few packets after waking up. Therefore, to amortize the wake-up cost, PSM clients are made to wake up less frequently, permitting multiple packets to queue up at the AP. Of course, such queuing introduces latency in PSM packet delivery. Nevertheless, since a large number of mobile applications (email, buffered video, push updates) are reasonably tolerant to latency, PSM correctly takes advantage of it. In current Nexus One phones running Android, the WiFi PSM mode wakes up on the order of every 300ms to download bursts of packets. This is a judicious design decision, with proven energy benefits.

Recently, the authors of NAPman [159] showed the possibility of improvements to PSM. The core observation is that multiple clients (associated to the same AP) may wake up after an AP advertisement, each expecting to receive its respective burst of packets. However, since the AP can transmit only one packet at a time (in a round-robin manner to each client), every client must remain awake for a longer duration to receive its packets. This is a source of energy wastage, and NAPman mitigates it through virtualized APs. Briefly, the key idea is to make each client believe that it is associated to a different AP, and thus have their wake-up windows staggered over time. The ideas from NAPman offer energy gains, while also improving the fairness among PSM and non-PSM clients.

We observe that NAPman improves PSM in the case where an isolated AP is connected to multiple clients. In reality, multiple APs are within the wireless vicinity, and this strongly impacts the energy consumption of individual clients. Specifically, when a PSM client wakes up to download its own burst of packets, it has to share the channel with all other clients of all other APs in the vicinity. In homes or dense office areas, it is not unusual to overhear 5 to 10 other APs.
Since the APs are likely to share the channel fairly between them, it is possible that a client remains awake almost 5 times longer than it would if there were no contention with other APs. Thus, the energy wastage during network activity can be 5 times greater, and even more if other APs

have multiple clients associated to them. We believe that PSM and NAPman can be significantly improved if the energy wastage from network contention is alleviated. Mitigating network contention from the energy perspective is a relatively unexplored space, especially in the face of emerging applications and usage patterns. SleepWell is tasked to investigate and solve this problem.

The core idea in SleepWell is simple.1 Briefly, since APs are always powered on, they monitor ongoing wireless traffic from nearby APs. Since PSM creates periodic bursts of traffic, each AP tracks the periodicity of other APs, and dynamically reschedules its own period to minimally overlap with others. Reduced overlap reduces competition, allowing each client to download its own packets uninterrupted, and to sleep when the channel is occupied by other transmissions. This bears resemblance to a distributed TDMA scheme, but executed with energy efficiency in mind.

The main design challenges in SleepWell arise from: (1) distributedly scheduling these traffic bursts to achieve quick convergence, (2) ensuring clients do not get disassociated during dynamic rescheduling, and (3) preserving channel utilization, latency, and fairness, even under traffic variation and node churn. SleepWell addresses these systematically, while requiring no software changes at the client. By carefully modifying the timestamps (as a part of the WiFi clock synchronization process), the SleepWell AP regulates the client's sleep and wake-up schedules. The client remains unaware of the changes in its own duty cycle; neither does it get disassociated. 802.11a/g/n standard-compatibility remains intact.

We have implemented SleepWell on a testbed of 8 laptops and 9 Nexus One phones running the Android OS. Performance results show that energy reductions vary between 38% and 51% across a variety of real online applications, including YouTube, Pandora and Last.fm Internet radio, and TCP bulk data transfer (e.g.,

1 We rejected a number of involved designs, thereby trading off some performance for standard-compliance and scalability.

FTP). Moreover, as link quality degrades, i.e., each packet is transmitted at lower bit rates (taking longer time), the relative energy gains improve. In light of these results, we believe that SleepWell may be an effective solution for the future, not only to sustain a demanding suite of applications, but also to improve “immunity” to increasingly dense WiFi environments. Our main contributions may be summarized as follows.

• Characterize the problem of network contention and its impact on energy consumption. Through measurements, we show that the energy wastage is severe, especially with high device densities in the environment.

• Design a lightweight, standard-compatible system running at the AP, that isolates traffic to reduce contention. The system requires no changes to the client, and can quickly adapt to changing traffic conditions and node churn.

• Implement and evaluate the system on a testbed of 8 laptops (acting as APs) and 9 Nexus One phones as clients. Promising performance improvements provide confidence that SleepWell can be an important step towards energy management in WiFi-enabled mobile devices.

The rest of this chapter expands on each of these contributions. We motivate the SleepWell design through measurements in Section 4.2, followed by the system design in Section 4.3. The system implementation and evaluation are presented in Section 4.4, while limitations and future work are discussed in Section 4.5. Section 4.6 surveys the related work, and the chapter concludes with a brief summary in Section 4.7.

4.2 Background and Measurements

We motivate SleepWell through measurements. In this section we (1) discuss our choice of platform and measurement set-up, (2) introduce the terminology for PSM operation, (3) profile the PSM behavior of a state-of-the-art smartphone, and (4) present measurement results suggesting that network contention has a dramatic impact on energy consumption, and correspondingly, battery life.

4.2.1 Choice of Device

For the experimentation platform, we planned on choosing a state-of-the-art mobile device that would satisfy three conditions: (1) provide accessible battery contacts to connect the device to the power monitor; (2) have up-to-date WiFi hardware, including 802.11n; (3) be supported with chipsets/drivers that optimize for device energy. Natural candidates were the Apple iPhone (version 4 supports 802.11n), highest-end HTC/Android phones, or Windows Mobile smartphones. The iPhone is unsuitable for testing due to its self-enclosed battery design [159]. The Android-based Google Nexus One seemed attractive. In particular, documentation for the Nexus One's Broadcom BCM4329 802.11a/b/g/n chipset claims “technologies to reduce active and idle power consumption”, including an on-chip power management module. We also observed that the Android OS performs adaptive PSM, intelligently switching between different power modes based on (1) whether the screen is on; (2) traffic load; (3) the beacon interval, etc. Finally, when compared with a Windows Mobile 6.5 HTC phone, both network performance and energy efficiency were better with the Nexus One. In light of these observations, we selected the Android Nexus One as our platform of choice.

4.2.2 Measurement Set-up

To measure energy consumption in the Nexus One phones, we used two power monitors from Monsoon Solutions [133]. The probes from a monitor were connected to

Figure 4.1: Experimental setup with a Nexus One phone connected to the power meter via copper tape and DC leads. The phone is entirely powered by the power meter, using the lithium battery only as ground. The computer, connected via USB, records current and voltage at 5000 Hz.

a hand-engineered copper-wire extension of the Nexus One lithium-ion battery, as shown in Figure 4.1. We sanity-checked this set-up by comparing our basic measurements with hardware data-sheets and other surveys in the literature [159].

4.2.3 Terminology

To ground our discussion of WiFi power consumption, it is necessary to consider how a modern, power-optimized device uses 802.11 networks from an energy perspective. We first review relevant terminology.

WiFi Power Save Mode (PSM): A suite of polling-based power optimizations specified by the IEEE 802.11 standard and incorporated in all WiFi implementations [7].

Constant Awake Mode (CAM): When a PSM-capable WiFi device temporarily disables PSM to minimize latency for interactive traffic.

Adaptive PSM: Traffic-aware switching between PSM and CAM to balance energy and interactivity requirements. While not specified by the 802.11 standard, adaptive PSM is common among modern smartphones.

Time Unit (TU): A period of 1024 microseconds (µs), or 1.024ms. According to the 802.11 standard, beacon intervals are expressed in TUs. For simplicity, we use TU and ms interchangeably.

Beacon Interval: Fixed time duration between two successive AP beacons, typically 100ms (APs continuously transmit these beacons).

Traffic Indication Message (TIM): Virtual “bitmap” embedded in every AP beacon. Indicates which PSM clients should poll to receive queued unicast download packets.

Listen Interval: How often a client chooses to wake up to listen for one AP beacon. Listen intervals are an exact multiple of the beacon interval.

PS-Poll: Client notification to its AP that it is awake and ready to receive a queued packet. Issued immediately after a client recognizes its own ID in a TIM.

More Data Flag: Flag embedded in unicast download data packets that specifies whether more data packets are queued at the AP for a PSM client. Once this flag is set to false, the client may immediately return to sleep.

4.2.4 PSM Energy Profiling

We survey the energy behavior of PSM as it dwells at (or transitions between) different power levels in response to network activities. The power values are taken with the screen off, WiFi associated to a nearby AP, Bluetooth/GSM/3G radios disabled (airplane mode), and minimal background application activity. While our analysis is primarily grounded in our Nexus One measurements, we observed similar behavior on an older Windows Mobile device (albeit with different exact power draw).


Figure 4.2: (a) Screenshot from Monsoon power meter; (b) Power draw over time for Pandora music streaming.

Figure 4.2 shows the anatomy of a Nexus One PSM client, tasked to stream music from the Pandora service (Figure 4.2(a) shows a screenshot from the power meter, while Figure 4.2(b) zooms into a time segment of the measurement). At the beginning, the radio is in PSM Deep-Sleep, at ≈10mW. In this mode, clients are only able to wake up and receive scheduled beacons from an associated AP (wakeups shown by spikes that reach up to ≈250mW). Once a client receives a TIM advertisement notifying pending traffic (around t=1.2s), it transitions to the highest power level and sends a PS-Poll to retrieve a queued packet. This transition from deep-sleep to the High Power state incurs an additional wake-up energy cost (≈600mW). The client then receives a burst of data packets and responds with ACKs (around t=1.25s), all of which also incur high energy.

Now, while waiting for the next packet, the client cannot power down because the radio hardware continues to “overhear” packets from other APs/clients. Thus, the client dwells in this Idle/Overhear state, periodically transitioning to high power

for transmitting/receiving its own packets. The cost of overhearing (≈400mW) is less than receiving because overheard packets are dropped at the radio, saving computation energy higher up in the protocol stack. Nonetheless, the cost of overhearing is far greater than PSM Light-Sleep. Once the queued packets have been downloaded, the client does not go back to deep-sleep, to amortize the wake-up/shutdown cost over multiple packets. Instead, the client transitions to PSM light-sleep (≈120mW) in anticipation of efficiently waking up for subsequent bursts. In this state, it continues to periodically wake up and receive AP beacons, but is “deaf” to contending traffic. Later, when all transmissions are over and the AP has turned off the MORE DATA flag, the client shuts down to deep-sleep. Shutting down to deep-sleep (not shown) also incurs a high energy cost, similar to waking up.

Observe that these dwell-times and transitions between energy levels fundamentally define the energy-efficiency of the system. Where appropriate, we will show how SleepWell alters the PSM behavior so that it dwells longer in lower-energy states, while remaining as agile with respect to upload/download packets.
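As a rough illustration of why dwell times dominate, the measured power levels above can be folded into a simple energy model (a back-of-the-envelope Python sketch; the state names, function, and example dwell times are ours, with power values taken from the approximate figures quoted above):

```python
# Approximate per-state power draw from our Nexus One measurements (mW).
POWER_MW = {
    "deep_sleep": 10,
    "light_sleep": 120,
    "idle_overhear": 400,
    "high_power": 600,
}

def energy_joules(dwell_seconds):
    """Total energy for a workload, given per-state dwell times in seconds."""
    return sum(POWER_MW[state] * t for state, t in dwell_seconds.items()) / 1000.0

# The same 2s workload: contended (long overhear dwell) vs. isolated.
contended = energy_joules({"high_power": 0.4, "idle_overhear": 1.4, "light_sleep": 0.2})
isolated = energy_joules({"high_power": 0.4, "idle_overhear": 0.2, "light_sleep": 1.4})
```

Shifting 1.2s of dwell from idle/overhear to light-sleep nearly halves the energy for the same traffic, which is precisely the shift SleepWell targets.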

4.2.5 Impact of Network Contention on Energy

Energy consumption is a function of a large number of parameters (hardware, traffic, bit rates, mobility, topology, density, etc.). Measuring over all permutations of this parameter space is difficult. We have narrowed the space down to a smaller set of common-case scenarios, and report measurements from them.

Methodology

We used a combination of Dell and Lenovo laptops as APs. Varied topological configurations were used across multiple experimental trials; the configurations were made to mimic measured link qualities from the Engineering building at Duke University.


Figure 4.3: Energy consumed under bulk data transfer and YouTube replay with varying contention (i.e., increasing number of APs in the vicinity).

As a result, most of the Nexus One phones operated between 48 and 72 Mbps (recall that the Nexus One supports 802.11n bitrates). To generate realistic traffic, we used a software tool called Tcpreplay [183]. This tool allowed us to record packets for any arbitrary Internet download, and later replay the packet arrival sequence and timing within our local testbed. This allowed for repeatable experiments across different types of traffic, including Pandora and Last.fm music streaming and YouTube videos. We also generated synthetic TCP traffic using Iperf, representative of bulk data transfer such as an FTP or HTTP download.

Results

Figure 4.3 shows the variation of total energy consumption with increasing network contention (all flows performing TCP downloads). With an increasing number of AP-client links, and correspondingly elevated channel saturation, PSM clients are forced to stay awake in the idle/overhear mode (≈400mW) for longer proportions of time. Thus, the energy required to complete the same network workload increases.


Figure 4.4: Proportion of time spent in each power level. (a) 8 MB TCP Iperf; (b) YouTube w/ Tcpreplay.

Figure 4.4 zooms into the results and breaks down the proportion of time spent at each energy level. We also separate out the traffic patterns: an 8 MB TCP bulk data transfer (measured with Iperf) and a YouTube session via Tcpreplay.2 To present an unbiased result, Figure 4.4(a) captures 90s of bulk download measurement, even though all transfers completed before 90s. This accounts for the system-wide deep-sleep energy consumed after a transfer completes. For YouTube (Figure 4.4(b)), we highlight a 60s portion of the trace, covering periods of buffering and playback. There was no network activity during playback; hence, clients were in deep-sleep by the end of the trace. Even with a fair balance between wake-up and sleep, we see that network contention forces a client to spend a much lower fraction of time in the efficient PSM sleep modes. SleepWell is designed to evade network contention, returning clients to light-sleep mode as frequently as possible.

2 Results from Pandora and Last.fm music streaming (not shown) are consistent with YouTube, albeit with some variations in energy levels due to the lower packet injection rate at the server.

4.3 SleepWell Design

The SleepWell design firmed up after multiple rounds of testing and modifications. To convey some of the rationale in the final design, we first describe a basic version of SleepWell under the following assumptions.

1. All APs have saturated traffic.

2. Each AP has one client.

3. All APs are running SleepWell.

Thereafter, we relax the assumptions, and modify SleepWell to be applicable to real-world networks.

4.3.1 Basic SleepWell

SleepWell has three main modules: (1) traffic monitoring, (2) traffic migration, and (3) traffic preemption.

(1) Traffic Monitoring

At bootstrap, each AP behaves similarly to standard 802.11: when their respective PSM clients wake up, the APs contend for the channel and send packets to them. However, a SleepWell AP also listens for ongoing beacons, and identifies which other APs are within its collision domain (we consider hidden terminals later in Section 4.5). Observe that beacons are transmitted at the base rate, and hence are audible over the carrier sensing zone of an AP [184]. Each AP assimilates this information into a traffic map that captures when each of its contending APs starts its beacon interval. The maps can clearly be different at different APs, depending on the AP's neighborhood. Figure 4.5 shows an example topology, and the corresponding maps assimilated by AP1 and AP3 (AP2's map is identical to AP1's). Since PSM


Figure 4.5: AP1 and AP3’s traffic maps during bootstrap (AP2’s map, not shown, is identical to AP1’s). The circle denotes one BEACON INTERVAL of 100ms. The ticks on the circle denote when an AP has overheard beacons from other APs, as well as the time of its own beacon. The traffic maps clearly depend on the neighborhood.

transmission bursts will immediately follow a beacon, these bursts are likely to overlap, forcing APs to waste energy due to traffic contention. SleepWell aims to avoid this contention through traffic migration.

(2) Traffic Migration

Given n other contending APs in the traffic map, each AP computes its fair share of the channel. The fair share is expected to be at least 1/(n+1) of the beacon interval. Each AP also computes its actual share of the channel as the time from its beacon to the immediate next (in the clockwise direction). If an AP's actual share is less than its fair share, and assuming that the AP has saturated traffic, the AP is said to be unsatisfied. Now, each unsatisfied AP looks into its traffic map and finds the largest inter-beacon interval, not including its own beacon; denote the start and end points of this interval as Tstart and Tend. If this interval is at least twice the AP's fair channel share, then the AP moves its own beacon to the mid-point of this interval. However, if the interval is shorter, the AP migrates its beacon to a time T such that Tend − T equals 1/(n+1) of the beacon interval. Essentially, every SleepWell AP greedily migrates its traffic, claiming at least its fair share from the largest available interval. If this migration encroaches on another


Figure 4.6: APs 1, 2, and 3 migrate their traffic per the SleepWell heuristic. Over time, the beacons are spread in time, alleviating contention between APs.

AP’s traffic and fair share, the other AP should also attempt to migrate. On the other hand, if there is more time available, the AP shares the excess equally with the AP which owns the now-preceding beacon. We present an example to better capture the operation. Consider Figure 4.5. All SleepWell APs use a 100ms beacon interval (denoted

100 by a circle). For AP1’s network view, the fair share is 3 “ 33.33ms, and the length of the largest segment is 84ms (i.e., Tstart “ 16 and Tend “ 0). Assume AP1’s beacon is currently at 70. Thus, AP1 moves its PSM beacon from 70 to 58, the mid-point of the largest segment (Figure 4.6(a)). AP3 observes the new position of

100 AP1 and prepares to make its own move. AP3’s fair share is 5 “ 20ms, and largest segment is 39 (i.e., Tstart “ 61 and Tend “ 0). Since the largest segment is less than twice of the fair share, AP3 migrates from 16 to 80 (Figure 4.6(b)), thus claiming its fair share, and forcing AP5 to move from time 61. Similarly, AP2 observes AP1 at 58 and AP3 at 80, and moves to the mid-point of 80 and 58, which is 19 (Figure 4.6(c)). Observe that from AP1 and AP2’s perspectives, the traffic map begins to exhibit more uniformity in beacon separation. AP3’s neighbors also perform the same operation (not shown), making AP3’s traffic map uniform as well.
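The migration heuristic can be sketched as follows (a minimal Python illustration of the rule as described; the function and constant names are ours, and the second test topology assumes neighbor beacon positions consistent with the Figure 4.5/4.6 example):

```python
BEACON_INTERVAL_MS = 100.0

def migrate_beacon(neighbor_beacons_ms, interval=BEACON_INTERVAL_MS):
    """New beacon time for an unsatisfied AP, given the beacon times of its
    n contending neighbors on the 0..interval circle."""
    n = len(neighbor_beacons_ms)
    fair = interval / (n + 1)  # fair share: 1/(n+1) of the beacon interval
    ts = sorted(t % interval for t in neighbor_beacons_ms)
    # Find the largest inter-beacon gap among neighbors, with wrap-around.
    best_start, best_len = 0.0, 0.0
    for i, start in enumerate(ts):
        gap = (ts[(i + 1) % n] - start) % interval
        if gap == 0.0:
            gap = interval  # a lone neighbor leaves the whole circle free
        if gap > best_len:
            best_start, best_len = start, gap
    t_end = (best_start + best_len) % interval
    if best_len >= 2 * fair:
        # Wide gap: move to its mid-point.
        return (best_start + best_len / 2.0) % interval
    # Narrow gap: claim exactly the fair share ending at t_end.
    return (t_end - fair) % interval
```

Replaying the worked example: with neighbors at 0 and 16, AP1 lands at 58; with neighbors at 0, 22, 58, and 61, AP3 lands at 80.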

Over time, we expect all APs to converge to a reasonably uniform traffic map, thereby reducing the energy wastage from contention. In some cases, cyclical re-adjustment patterns may slow or break convergence, especially in large, dense network graphs where adjacent APs hold highly-divergent views of the local neighborhood. To recover, we detect such cases and trigger a randomization step: poorly converged nodes temporarily assume a random beacon assignment, breaking the non-converging cycle. Of course, this is a heuristic and may not converge to the optimal solution (the optimal beacon positioning corresponds to a TDMA schedule, and is thus NP-complete [152], by reduction from graph coloring). However, Monte Carlo simulations of 10,000 topologies and traffic patterns show that convergence is quick and reliable, and results in substantially better beacon placements than random assignment. We report these results in Section 4.4.

(3) Traffic Preemption

With 802.11, a client wakes up at the PSM beacon times and downloads packets until its AP turns off the MORE DATA flag, indicating no more traffic. Continuous downloads at different APs induce continuous contention, resulting in significant energy wastage. Spreading the PSM beacons apart, as performed by SleepWell, will evade contention for some time, but the bursts will soon “spill” into the next bursts, reintroducing contention. To avoid this, SleepWell employs a simple preemptive mechanism. When APi observes that its traffic is likely to “spill” into APj’s, it turns off the MORE DATA flag in the subsequent data packet, forcing its client to go to sleep until the next listen interval. This permits APj’s transmissions to progress without competition, reducing time to completion. When APi’s client wakes up at the next PSM beacon, APi transmits the pending packets. Now the other APs preempt their respective transmissions, allowing APi to use the channel without contention. This


Figure 4.7: SleepWell APs distributedly stagger their beacons to reduce contention. Each AP preempts its traffic to honor another AP’s schedule.

is indeed a loose form of TDMA, where clients “avoid the rush hours” and sleep instead. Figure 4.7 shows the steady state operation with an example.
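The preemption decision itself is small. The sketch below is our own simplification; a real driver would operate on queued-airtime estimates rather than a single packet time, and all names here are assumptions.

```python
def should_preempt(now_ms, next_pkt_airtime_ms, neighbor_beacon_ms,
                   beacon_interval_ms=100):
    """Clear the MORE DATA flag (putting our client to sleep until its
    next listen interval) if transmitting the next queued packet would
    spill past the neighboring AP's upcoming PSM beacon."""
    time_left = (neighbor_beacon_ms - now_ms) % beacon_interval_ms
    return next_pkt_airtime_ms > time_left

# With AP_j's beacon at t=80: at t=70 a 5ms packet still fits...
print(should_preempt(70, 5, 80))  # -> False
# ...but at t=78 it would spill into AP_j's burst, so AP_i preempts.
print(should_preempt(78, 5, 80))  # -> True
```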

Observe that APi need not always preempt its traffic for APj. It is possible that APj’s client is not awake in the same BEACON INTERVAL as APi’s (recall that PSM clients periodically wake up every LISTEN INTERVAL, e.g., once in 3 BEACON INTERVALs for Nexus One phones). In such a scenario, APi should detect the opportunity and use up APj’s slot. Of course, the detection mechanism needs to be robust to ensure that APi does not mistakenly encroach into APj’s slot.

SleepWell is designed to handle this situation. When APi approaches APj’s time slot, it looks for (1) any PS-Poll from any of APj’s PSM clients; (2) APj’s download packets with the MORE DATA flag enabled; or (3) an ACK from one of APj’s clients. (1) and (2) may not always be feasible, as high-bitrate transmissions may prevent overhearing at APi. (3) is more robust, as ACKs are transmitted at a lower bitrate, often at half the transmission rate of the preceding unicast packet. In case all of these techniques fail, SleepWell defaults to a simple inference scheme. APi looks into APj’s prior beacons to see if APj has pending traffic for any of its clients (recall that the beacon TIM embeds pending traffic information). When the TIM is not set, APi does not preempt its own traffic, and continues transmission through APj’s slot. This ensures channel utilization.

This concludes the description of Basic SleepWell under the 3 assumptions of saturated traffic, single client, and no legacy APs. We now relax these assumptions to fit real-world scenarios.

4.3.2 Coping with Traffic Dynamics

The traffic migration heuristic in Basic SleepWell has deficiencies. While, upon convergence, each AP certainly receives its fair share of the channel, the heuristic may not cope well with dynamic traffic offered by different APs. The issues arise in 2 main scenarios: (1) Consider the case where an AP has n neighbors, but one of its neighbors has m > n neighbors (e.g., in Figure 4.5, AP1 has 3 neighbors, but AP3 has 5). Here, AP1 could be satisfied by a 1/3 channel share, assuming that AP2 and AP3 will also consume 1/3 each. However, AP3 will only be able to consume 1/5, opening up some slack in channel time. Basic SleepWell may not be able to consume this slack. (2) In the same topology of Figure 4.5, if AP2 has little traffic (requiring less than 1/3 channel time), Basic SleepWell will again fail to exploit this slack. We modify SleepWell to better “absorb” the slack, and thereby cope with dynamic traffic patterns.

Algorithm 3 shows pseudocode for the modified SleepWell. Although the pseudocode seems involved, the key idea is simple. In the face of varying traffic demands, we require SleepWell APs to advertise the minimum of the needed channel share and the available channel share. In Figure 4.5 for instance, AP3 will advertise 1/5 if it has adequate traffic to fill up its own slot. Otherwise, if it has queued traffic only for, say, 1/7 channel time, it advertises 1/7. Knowing this information, the traffic map

can be updated to additionally reflect the burst following each PSM beacon. This facilitates efficient traffic migration. SleepWell now computes the maximum interval as the separation between the end of the ith burst and the (i+1)th beacon. Now, the actual migration rule also changes. If AP1 recognizes that AP3 is taking up less than its fair share, then AP1 computes the slack and attempts to redistribute it among APs that need more. In this case, the slack is 1/5 − 1/7 = 2/35, which, distributed between AP1 and AP2, becomes 1/35 each. AP1 updates its fair share as 1/3 + 1/35 ≈ 0.36. Using this fair share, and the largest interval computed from burst-to-next-beacon, SleepWell migrates its traffic according to the original rule. AP2 does the same, and the system is expected to still converge. If AP3 later changes its traffic advertisement, or an AP joins or leaves the network, AP1 and AP2 can adapt accordingly.

Algorithm 3 Traffic-Aware Beacon Adjustment
 1: Input: P: set of all peer APs
 2: if satCounter > CONVERGENCE_THRESHOLD then
 3:   satCounter ← 0
 4:   newBeaconTime ← Rand([0, 1)) · BEACON_INTERVAL
 5: else
 6:   satCounter ← satCounter + 1
 7:   fairShare ← BEACON_INTERVAL / (|P| + 1)
 8:   share ← fairShare
 9:   gapEnd ← BeaconTime(·) + TrafficAdvert(·)
10:   slack ← 0
11:   for all AP p1 ∈ P do
12:     gap ← BEACON_INTERVAL
13:     for all AP p2 ∈ P do
14:       s ← BeaconTime(p2) − BeaconTime(p1)
15:       gap ← min(s, gap)
16:     midpointGap ← gap / 2
17:     trafficGap ← gap − TrafficAdvert(p1)
18:     slack ← max(trafficGap, slack)
19:     available ← max(midpointGap, trafficGap)
20:     available ← min(available, gap − ε)
21:     if available > share then
22:       share ← available
23:       gapEnd ← BeaconTime(p1) + gap
24:   expectedShare ← fairShare + slack/|P|
25:   share ← max(expectedShare, share)
26:   newBeaconTime ← gapEnd − share
27: Return newBeaconTime
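A transliteration of Algorithm 3 into Python may make the control flow easier to follow. This is a sketch: the convergence threshold value, the modulo handling of circular beacon times, the skipping of p1 when computing its gap, and the peer-dictionary encoding are our assumptions where the pseudocode leaves them implicit.

```python
import random

BEACON_INTERVAL = 100.0      # ms, matching the running example
CONVERGENCE_THRESHOLD = 10   # assumed value
EPS = 1e-6                   # the pseudocode's epsilon

def adjust_beacon(peers, my_beacon, my_advert, sat_counter):
    """peers: {ap: (beacon_time, traffic_advert)} for every peer AP.
    Returns (new_beacon_time, new_sat_counter)."""
    if sat_counter > CONVERGENCE_THRESHOLD:
        # randomization step: break a non-converging cycle
        return random.random() * BEACON_INTERVAL, 0
    sat_counter += 1
    fair_share = BEACON_INTERVAL / (len(peers) + 1)
    share = fair_share
    gap_end = my_beacon + my_advert
    slack = 0.0
    for p1, (b1, adv1) in peers.items():
        # gap: time from p1's beacon to the next peer beacon (circular)
        gap = BEACON_INTERVAL
        for p2, (b2, _) in peers.items():
            if p2 == p1:
                continue
            gap = min((b2 - b1) % BEACON_INTERVAL, gap)
        midpoint_gap = gap / 2
        traffic_gap = gap - adv1        # gap left after p1's burst
        slack = max(traffic_gap, slack)
        available = max(midpoint_gap, traffic_gap)
        available = min(available, gap - EPS)
        if available > share:
            share = available
            gap_end = b1 + gap
    expected_share = fair_share + slack / len(peers)
    share = max(expected_share, share)
    return (gap_end - share) % BEACON_INTERVAL, sat_counter
```

For instance, with two peers beaconing at 0 and 50, each advertising 10ms of traffic, an AP computes a fair share of 100/3, absorbs half of the 40ms slack, and re-positions its beacon that expected share before the end of the largest burst-free gap.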

4.3.3 Seamless Beacon Re-adjustment

In describing Basic SleepWell, we assumed that APs can re-adjust beacons at will. This is non-trivial in actual systems because 802.11 PSM does not have provisions to inform the client about new beacon timings. Incorporating this capability would require client-side changes, and SleepWell intends to avoid it.

Existing schemes such as NAPman [159] employ virtualized APs [37], a method that makes a single AP advertise multiple beacons with different SSIDs. This effectively defeats passive scanning; clients must actively scan for APs by issuing a PROBE REQUEST for a known SSID, and then re-associate to the BSSID of the virtual AP. To re-position a client again, the existing association must be dropped, forcing the client to re-associate through active scanning again. This is a heavyweight process, with significant time spent in the idle/overhear and high power activity levels, exacerbating the energy consumption in clients. Prior work has shown that the cost of associations can dominate the energy consumption in WiFi [25].

SleepWell APs can re-position clients without client-side changes and re-associations. The key idea here is to manipulate the TSF timestamps in advertised beacons. As mandated by the 802.11 standard, clients treat these timestamps as authoritative, and correspondingly update their clocks to a new beacon schedule. By advertising different beacon schedules to different clients, SleepWell APs move clients between beacons until a desirable distribution is reached. Consider the example in Figures 4.5 and 4.6 where AP1 intends to move its client from 70 to 58. Given that AP1 and its client are clock-synchronized, AP1 advertises the time as 12ms ahead of its current time. The client updates its clock accordingly. At absolute time t = 58 the client believes that it has reached t = 70, and wakes up to receive packets. Now, the AP also transmits a beacon at 58, effectively re-positioning its client seamlessly.
We believe this technique is lightweight and scalable, and may

be useful to other protocols (including NAPman [159]) that require traffic isolation among clients.
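The arithmetic behind the TSF manipulation is simple. A sketch follows (our own formulation; the helper name and unit handling are assumptions, and real TSF values are 64-bit microsecond counters):

```python
def advertised_tsf(true_tsf_us, old_beacon_ms, new_beacon_ms,
                   beacon_interval_ms=100):
    """To move a client's wakeup from old_beacon_ms to new_beacon_ms
    (both positions within the beacon interval), advertise a TSF that
    runs ahead of true time by the difference: at the new absolute
    time, the client believes the old wakeup time has arrived."""
    ahead_ms = (old_beacon_ms - new_beacon_ms) % beacon_interval_ms
    return true_tsf_us + ahead_ms * 1000  # TSF ticks in microseconds

# AP1 moves its client from 70 to 58: advertise a clock 12ms ahead.
print(advertised_tsf(0, 70, 58))  # -> 12000
```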

4.3.4 Multiple Clients per AP

For ease of explanation, we assumed one client per AP in Basic SleepWell. In reality, SleepWell can operate seamlessly with multiple clients associated to the same AP.

Specifically, when APi has its transmission slot, it transmits to each of its clients in the order their packets are queued. At the cost of a little more complexity, the AP can create additional beacon schedules within its own time slot, ensuring that its own clients do not contend with each other. This can be accomplished by performing the same SleepWell beacon-adjustment operations. Of course, this is aligned with the core beacon-staggering ideas in NAPman [159], and hence, we do not claim this as SleepWell’s contribution. Nonetheless, SleepWell’s technique of lightweight beacon re-adjustment can make this inter-client scheduling more efficient than NAPman. We briefly outline the mechanism next.

Inter-client SleepWell

The goal here is to disperse clients across different beacons. SleepWell APs predict when their own clients are expected to wake up for a beacon (depending on the per-client LISTEN INTERVAL). Since not all clients will wake up within the same beacon interval, there can be opportunities to direct some clients to change their beacon wakeup schedules independently of their peers. However, when a pair of clients systematically share the same wakeup schedule, SleepWell unicasts an extra beacon to a particular client. This produces the desired schedule change.

Recall that a client is moved from one beacon schedule to another when it receives a manipulated (but authoritative) beacon TSF clock value. Upon the next wakeup on the new schedule, the client must receive a clock value in line with the new schedule

or else it will again be migrated. Thus the clock value advertised by beacons on the new schedule must reflect the time separation between the old and new schedules. If other clients are already listening to the “new” wakeup schedule (and are not to be migrated elsewhere), their clock value should not be changed. To prevent disruptions to any client, and yet allow migrations from one wakeup schedule to another, the initial clock values for each beacon schedule are selected to represent the time difference between a pair of successive beacons.

4.3.5 Compatibility with Adaptive-PSM Clients

SleepWell is unaffected by adaptive-PSM clients. Like those of traditional PSM clients, adaptive-PSM downloads occur in a regular burst pattern, immediately following AP beacons. Also, it is mandatory for all clients to wake up at least once per LISTEN INTERVAL to receive their AP’s beacon TIM advertisement. Thus, if data is available for the client, it will initiate a stream of download packets from the AP (either through a traditional PS-Poll request or by temporarily disabling PSM). As with regular PSM clients, this periodic, bursty behavior is often obvious and easily predictable, thereby enabling SleepWell to converge on traffic-aware beacon schedules.

4.4 Evaluation

In this section, we present the implementation of SleepWell and evaluate its performance.

4.4.1 Implementation

We implemented SleepWell as a set of modifications to the open-source ath9k driver for Atheros 802.11n PCI/PCI-Express interfaces. Driver-level modifications were required to (1) enable dynamic adjustment of beacon timing; (2) control TSF clock

values advertised in beacons; (3) enable driver interrupts to quickly receive overheard beacons and packets from adjacent BSSes; and (4) exert timely control on the MORE DATA flag for outbound traffic. Our implementation provides complete support for dynamic beacon adjustment, traffic migration and preemption, and multiple staggered beacons per AP.³

4.4.2 Methodology

Our experimental setup was consistent with that described for our earlier measurement setup in Section 4.2.2. We deployed SleepWell on a testbed of 8 Dell and Lenovo laptops serving as WiFi APs. Laptops were configured with Atheros-chipset D-Link DWA-643 ExpressCard 802.11n WLAN interfaces. Linux kernel 2.6.3 with the hostapd daemon provided 802.11-compliant AP association support. Unmodified Nexus One smartphones (Broadcom BCM4329 802.11n WiFi chipset [33]) served as clients. In most tests, we tasked up to 7 AP/client pairs using Iperf TCP to create background traffic. While testing different applications (e.g., YouTube), Wireshark recorded packet traces, and Tcpreplay replayed them for different experiments. To closely model real-world behavior, we kept the phone screen on during trace collection. When replaying this trace, we turned off the screen to precisely measure the power draw. All energy measurements reflect client usage of full-time PSM, not the adaptive-PSM technique (which disables PSM for some applications to reduce latency).

4.4.3 Performance Results

Our evaluation attempts to answer the following:

1. Beacon adjusting heuristic. Correctness (Fig. 4.8). Convergence via Monte Carlo simulation (Fig. 4.9).

3 Due to interrupt timing limitations of the Atheros hardware, we cannot reliably support staggered beacons spaced closer than 12ms apart (a driver interrupt must occur at least 2ms after a beacon and 10ms before the next beacon). Therefore, we can support a maximum of 8 beacons per beacon interval.

2. Overall energy gain for different traffic patterns (Fig. 4.10).

3. Impact of network contention (Fig. 4.11, 4.12). Gap from optimal case of zero-contention (Fig. 4.13).

4. Impact of link quality (bitrates) (Fig. 4.15).

5. Impact on throughput, latency (Fig. 4.17, 4.18). Impact on beacon spacing (Fig. 4.16).

6. Impact on fairness (Fig. 4.19).

7. Performance by deployment density (Fig. 4.20, 4.21).

(1) Beacon adjustment and convergence

Figure 4.8 shows a zoomed-in view of how 2 SleepWell clients (associated to distinct APs) adjust their beacons and preempt traffic to converge onto non-overlapping traffic bursts. The graph is a 4-second segment of a TCP download. Each client periodically wakes up and stays active in the high-power state, while the other remains in light-sleep during that interval. Figure 4.8(c) contrasts this behavior with 802.11. Under the same experiment settings, each 802.11 client stays awake continuously, sharing the channel with the other at fine time scales. Clearly, this is a source of energy wastage, and SleepWell mitigates it.

To evaluate convergence of these schedules, we performed Monte Carlo simulations with 1000 APs in a 1 km × 1 km area. Two APs were considered in range of each other when within 40m. We ran 10,000 trials with different topological configurations. We assumed a 50-50 mix of SleepWell and legacy APs. For results involving bounded traffic demand, we assumed uniform random demand between 0 and 50ms per beacon. Figure 4.9 shows the CDF of convergence time (including the cases where


Figure 4.8: (a, b) Two SleepWell clients converge to non-overlapping activity cycles, one sleeping when the other is active. (c) Under the same experiment settings, the 802.11 client stays awake for the entire TCP download.

randomization was necessary to break oscillations). Evidently, SleepWell achieves fast convergence, even with the traffic advertisement heuristic.

(2) Overall energy gain (varied traffic patterns)

Figure 4.10 presents overall energy consumption during (a) bulk data transfer, (b) YouTube, and (c) Pandora tests. The experiments are performed for 8 AP/client pairs within mutual contending range. As a zero-contention baseline, we show the results from a single AP/client network running 802.11. SleepWell demonstrates substantial energy savings, nearing the baseline for YouTube and Pandora. In contrast, 802.11


Figure 4.9: Adjustment rounds until a SleepWell AP reaches a converged beacon placement.

PSM wastes energy staying awake through much of the contending traffic, as observed earlier in Figure 4.8(c).

Figure 4.10: Overall energy performance of SleepWell.

(3) Impact of network contention

Figure 4.4(a) from Section 4.2.2 showed how network contention increases the fraction of time a PSM client stays in the idle/overhear state. Figure 4.11 demonstrates how SleepWell mitigates the problem for the same (TCP bulk transfer) experiments. As anticipated, SleepWell powers down clients to PSM light-sleep, allowing them to save


Figure 4.11: 8 MB Iperf TCP download. With higher contention, SleepWell spends a larger fraction of time in light-sleep, whereas 802.11 spends most of the time in the idle/overhear state (see Fig. 4.4a).

energy while contending APs communicate. Thus, the duration of time spent in the light-sleep state increases with increasing contention, ultimately offering substantial energy savings. As an aside, note that the client does not go back to the deep-sleep state because it needs to wake up soon for remaining traffic. Switching to and from deep-sleep would incur a high wake-up/shutdown cost, and the hardware is designed to avoid it whenever possible.

Figure 4.12 shows results for the same experiment, but with a YouTube trace measured using Tcpreplay. SleepWell gains are again substantial compared to 802.11 (in Figure 4.4(b)). Further, due to the bursty nature of the YouTube trace⁴, clients spend more than 50% of the time in the PSM light-sleep mode even without contention. Under SleepWell with rising contention, the individual traffic bursts become systematically desynchronized. The overall proportion of time spent in light-sleep (instead of deep-sleep) increases only marginally. As shown, SleepWell clients using

4 YouTube clients buffer videos in small bursts to optimize for users who will not play the entire video, or who will skip forward.


Figure 4.12: Proportion of time spent in each activity level with YouTube traffic. Compare to Figure 4.4.

YouTube have nearly complete energy-immunity to at least 7 saturated links’ worth of traffic.

Performance gap from the case of zero-contention

Instead of categorizing power draw into different energy-states, Figure 4.13 directly compares 802.11 and SleepWell’s instantaneous power draw under 8-AP contention. The zero-contention scenario is used as the lower bound. Graphs show (a) TCP bulk transfer, (b) YouTube and (c) Pandora. Note that the SleepWell CDF remains closer to that of 1 AP (i.e., zero contention), except in the proportion of time spent in light versus deep-sleep.

Every Client Running YouTube

Figure 4.14 presents results from a scenario where all links download YouTube traffic. To model realistic environments as closely as possible, we used an extended-length YouTube trace (≈ 20 min), consisting of a user watching a series of movie trailers. The trace includes time spent selecting a series of videos for playback, buffering, and watching the trailers, all in realistic proportions. As in all traces, we captured


Figure 4.13: (a) Iperf, (b) YouTube, (c) Pandora. CDF comparison of instantaneous power, showing that SleepWell better matches the zero-contention curve.

this trace using the YouTube application on the Nexus One – this captured smartphone-specific buffering behaviors. We ran the trace in a loop with Tcpreplay on all AP/client pairs, varying the start time independently for each client. For this test, we used 9 phone clients distributed among 6 APs. The measured clients were associated to distinct APs, and were the only client for their respective APs. Due to the relatively low server-side bitrate (optimized to ensure that there is never too much buffered video), 9 YouTube clients were not sufficient to saturate the (high


Figure 4.14: CDF of instantaneous power consumption, YouTube with contention from YouTube clients.

bitrate) wireless channel. Thus, we used an 18 Mbps fixed bitrate, very much reflective of residential/public environments. Evidently, SleepWell consistently outperforms 802.11 PSM, although the margin is smaller due to small bursts of contention. We envisage the gain increasing as the number of APs increases or the link qualities degrade.

(4) Impact of link quality (bitrates)

Figure 4.15 shows energy performance of an 8 MB bulk data transfer with 4 contending links at varying link bitrates. At low bitrates, all transmissions inflate in time, forcing 802.11 to spend considerably more time in the idle/overhear state. SleepWell does not incur this cost, as it sleeps through the long durations of neighboring traffic. Thus, relative gains grow as links degrade to 18Mbps and lower. At high bitrates, extra retries incur a net energy increase.

(5) Impact on throughput, latency

Figure 4.17 shows per-link TCP throughput in a 4-link topology. SleepWell’s performance is certainly comparable to 802.11’s. When clients are backlogged with traffic, recall that they continue downloading until the start of another PSM burst. As a result, the channel remains well utilized. When the traffic is unsaturated in some


Figure 4.15: Bulk data transfer on 4 AP/client testbed.

clients, the SleepWell traffic advertisements help in re-distributing the slack among backlogged clients. This minimizes wasteful gaps, allowing high channel utilization.

To characterize scalability to large networks, we looked at the results of the Monte Carlo simulations. The goal was to observe the beacon placements and compute the channel share that each client was receiving. The per-client channel share directly relates to the throughput expected at that client. Figure 4.16 shows that, after convergence, SleepWell (a) provides a near-universal improvement to beacon spacing; (b) provides spacing improvements irrespective of network density; and (c) enables APs to satisfy a greater proportion of their traffic load. In contrast, random beacon placements lead to an inequitable and inefficient distribution of the channel resources, leading to contention and wastage.

Figure 4.18 presents per-packet latency under heavy contention, as measured through ICMP pings from the AP. After a PSM wakeup, SleepWell clients experience little contention from other links, and are able to receive and reply to probes faster than 802.11 clients. Further, the inflection point at 307ms (one listen interval) reflects that ≈ 95% of SleepWell probes are received before the end of the timeslot following the probe. Even though 802.11 remains awake longer, the latency is still greater due to high network contention following beacons.


Figure 4.16: Performance of beacon adjustment: (a) CDF of beacon separation; (b) separation by network density; (c) CDF of proportion of an AP’s traffic that can be satisfied before the end of its beacon share.

(6) Impact on fairness

Figure 4.19(a) presents 4-link testbed results, showing that SleepWell is able to allocate throughput slightly more equitably than 802.11. Figure 4.19(b) presents Monte Carlo simulation results confirming that the beacon adjustment heuristic results in a more equitable spacing.
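For reference, Jain’s fairness index over per-link throughputs x₁..xₙ is (Σx)² / (n · Σx²); it equals 1 for a perfectly equal allocation and 1/n in the most unfair case. A minimal computation:

```python
def jains_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(allocations)
    total = sum(allocations)
    return total * total / (n * sum(x * x for x in allocations))

print(jains_index([1.0, 1.0, 1.0, 1.0]))  # -> 1.0  (perfectly fair)
print(jains_index([2.0, 1.0, 1.0, 1.0]))  # -> 25/28, about 0.89
```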

(7) Performance by deployment density

In Figures 4.20 and 4.21 we characterize how SleepWell performs at various deployment densities. Figure 4.20 plots SleepWell AP performance as a function of the number of APs in a 1 km × 1 km square (for a 50-50 mix of SleepWell and legacy APs).


Figure 4.17: TCP throughput on 4 AP/client testbed. Distribution reflects per-link goodput for all links.


Figure 4.18: Per-packet latency on 8 AP/client testbed. Latency measured as 10 ICMP pings per second on one link, 7 others contend with TCP.

Figure 4.21 compares performance for 1000 APs at a varying percentage of legacy APs. For each, we show (a) the number of beacon adjustment rounds until convergence is reached at the 90th percentile; (b) the median beacon separation, indicative of the typical channel allocation received by an AP; and (c) beacon separation at the 5th percentile, reflective of near worst-case performance. Figure 4.20 shows that the SleepWell beacon adjustment heuristics (a) converge quickly, irrespective of density; (b) improve average-case beacon separation; and


Figure 4.19: SleepWell fairness: (a) TCP Jain’s fairness on 4 AP/client testbed. Note X-intercept at 0.9; (b) Jain’s fairness for simulated beacon shares with unbounded traffic.

(c) ensures that even APs in the least-favorable conditions achieve a reasonable beacon separation. Figure 4.21 shows that (a) convergence is slowest with a 100% SleepWell deployment, but still within a small number of adjustment rounds; and (b,c) SleepWell APs receive a marginal increase in beacon separation with greater proportions of legacy APs (this is expected, as legacy APs will not make efforts to claim their fair share).

4.5 Limitations and Discussion

In this section, we discuss practical challenges for a SleepWell deployment.

4.5.1 Impact of Hidden Terminals

Hidden terminals complicate SleepWell as much as they do 802.11. They may cause collisions, forcing a client to stay awake longer, and thereby increasing the energy overhead. While hidden terminals are mostly mitigated by carefully tuning the carrier sense threshold and bitrates [34], SleepWell can adopt counter-measures to alleviate the problem. Specifically, since hidden APs will also impose bursty traffic, a SleepWell AP may observe that its download packets are failing despite


Figure 4.20: SleepWell performance by AP density: (a) rounds until convergence at 90th percentile; (b) median beacon separation; (c) beacon separation at 5th percentile.

a high SNR to its client. The download SNR can be inferred from the SNR of upload packets, coming back over a roughly symmetric link. At this point, the SleepWell AP can assume a “virtual beacon” on its traffic map and re-adjust its own beacon as per the protocol heuristic. In other words, the hidden terminal may be treated as another contending AP, only its beacon/traffic advertisements are indirectly inferred. If rare occasions present an excessive number of hidden terminals, SleepWell may not be able to cope, and will degenerate to 802.11.


Figure 4.21: SleepWell performance by proportion of legacy APs: (a) rounds until convergence at 90th percentile; (b) median beacon separation; (c) beacon separation at 5th percentile.

4.5.2 Incremental Deployability

Thus far, our discussion has assumed all APs in the wireless vicinity to be running SleepWell. For practicality, SleepWell must be and is incrementally deployable; it is also able to coexist with legacy access points with fixed beacon schedules and no traffic preemption. SleepWell APs treat legacy APs identically for the purpose of beacon placement. Although the latter will not re-adjust to obtain their fair share of the beacon interval, they can still be expected to have bursty PSM traffic starting with a PSM beacon. Thus, the time period immediately following their beacon is best avoided. SleepWell includes these APs in the traffic maps, and computes the

expected-share calculation assuming an advertised share of infinity. In our Monte Carlo simulations with 50% legacy APs, the system still converges.

4.5.3 Interactive Traffic

SleepWell is not intended for interactive, highly latency-sensitive traffic (e.g., VoIP). PSM explicitly forgoes support for low-latency operation in exchange for energy savings; SleepWell is subject to the same pitfalls.

4.5.4 TSF Adjustment

We believe our mechanism for adjusting the TSF clock (to migrate clients to a new beacon schedule) has no side effects. However, we cannot guarantee this to be universal across all devices.

4.6 Related Work

Substantial prior work has considered mechanisms to reduce the energy cost for mobile devices. In the interest of space, we sample a subset of them.

4.6.1 WiFi PSM sleep optimization

A number of solutions have considered augmenting PSM behaviors for improved efficiency. [99, 16] propose client-side techniques for adaptive PSM, enabling clients to switch between PSM and fully-awake CAM modes as a function of traffic load. Catnap [58] exploits the discrepancy between wired and wireless bandwidth. [19] considers proxies to reduce the cost of application polling. [180] employs traffic shaping on TCP to make flows bursty, and thus more suitable for efficient PSM delivery. PM [108] leverages prediction to enable a wireless interface to sleep opportunistically over short durations. Most closely aligned with SleepWell, NAPman [159] considers inter-client beacon staggering to improve the energy efficiency of mobile clients through reduced contention. SleepWell is complementary, extending the

120 core beacon staggering idea to the network. In conjunction with NAPman, the total energy gains can be higher.

4.6.2 WiFi Duty Cycling

A number of projects have considered duty cycling the WiFi radio into a deeper sleep state to avoid power drawn when idle. Wake-on-Wireless uses a secondary low-power radio interface for signaling traffic to reduce energy consumption while idle [169]. Cell2Notify uses cellular radios to forward notifications of incoming VoIP calls, waking up the WiFi radio to receive the call just in time [10]. Context-for-Wireless predicts WiFi availability from nearby cell towers [150]. Blue-Fi correlates the presence of nearby Bluetooth devices with WiFi availability [17]. Breadcrumbs predicts the availability of WiFi from personal mobility profiles [143]. Turducken [176], CoolSpots [147], and TailEnder [25] consider the use of heterogeneous radios for data transfer, only enabling the highest-powered WiFi radio when it is most appropriate for the traffic load. Each of these techniques enables a complete shutdown of the WiFi interface over long timescales. SleepWell is complementary, reducing energy consumption during those periods in which the WiFi interface is enabled and in active use.

4.6.3 Sensor network TDMA

Scheduled channel access has often been considered for energy savings in sensor networks. S-MAC enables nodes to synchronize sleep schedules with their peers, and accordingly sleep through the peers’ transmissions [189]. Z-MAC multiplexes CSMA and TDMA [156], partly achieving the best of both. SleepWell bears resemblance to these high-level ideas; however, the system is designed in response to a completely different set of challenges and constraints.

4.7 Conclusion

We summarize SleepWell with an analogy. Big cities in the US and elsewhere face heavy rush hours as masses of people commute to work. If work times were flexible, different companies could stagger their office hours to reduce this rush. Reduced rush would free up time for everyone, and yet total working hours could remain unaffected. This intuition underlies the design of SleepWell. Given that Internet traffic can tolerate a reasonable amount of latency, SleepWell APs adjust their activity cycles to minimally overlap with others. Each client frees up time to sleep, ultimately yielding promising energy gains with practically negligible loss in performance. Our testbed implementation and thorough evaluation give us confidence that SleepWell is viable, and hence worth considering as a revision to current 802.11 PSM.

5 A Matchmaking System for Multiplayer Mobile Games

Supporting interactive, multiplayer games on mobile phones over cellular networks is a difficult problem. It is particularly relevant now, given the explosion of mostly single-player or turn-based games on mobile phones. The challenges stem from the highly variable performance of cellular networks and the need for scalability (burdening neither the cellular infrastructure nor any server resources that a game developer deploys). We have built a service for matchmaking in mobile games: assigning players to games such that both game settings and the latency requirements for an enjoyable game are satisfied. This requires solving two problems. First, the service needs to know the cellular network latency between game players. Second, the service needs to quickly group players into viable game sessions. In this chapter, we present the design of our service, results from our experiments on predicting cellular latency, and results from efficiently grouping players into games.

5.1 Introduction

Games have become very popular on mobile phones. The iPhone app store has over 300,000 applications as of October 2010, roughly 20% of which are games [135], and yet 80% of application downloads are games [66]. On the Windows Phone 7 platform, the top applications are games, all the way down to number 29 (as of 10 December 2010). Despite this popularity, mobile gaming is still in its infancy. The vast majority of mobile games are either single-player, turn-based (latency-insensitive games such as card games or strategy games), or multiplayer only over Bluetooth or Wi-Fi.

We believe that interactive multiplayer games, such as FPS (first-person shooter) or racing games, that work over cellular data are just around the corner. While a few are currently available, for them to become numerous and successful, many mobile systems challenges must be overcome. In a recent interview [177], John Carmack, co-founder of id Software, said (in the context of FPS games and 3G): “multiplayer in some form is where the breakthrough, platform-defining things are going to be in the mobile space”.

One urgent challenge is managing the highly variable network performance that applications experience over 3G cellular [83]. This is already a difficult problem for multiplayer games on home broadband connections [9]; a player with poor network performance can destroy the experience of others in the same networked game. The key to solving this problem is effective matchmaking. Players should be grouped into games where each player’s network performance meets the needs of the game, and the size of the group is as large as possible within the limits of the game’s architecture.

For matchmaking to be effective, it must solve two problems. First, the network performance between (potentially many) game players needs to be estimated. This estimation should be done quickly, so that impatient gamers are not left waiting, and in a scalable way, so as not to overburden cellular network links

nor expensive server bandwidth. Second, players need to be grouped together into games based on their network performance and desired game characteristics (e.g., game topology or size). This can be difficult if there are many players.

A particularly challenging type of matchmaking is that for P2P games. In such games, the game developer is not burdened with the expensive task of maintaining high-powered and high-bandwidth servers in many locations across the planet. Instead, individual player devices are matched into different game sessions and exchange game state among themselves. This is a very popular architecture for multiplayer gaming; LIVE supports game sessions in this way that are measured in the 100s of millions each month [9].

In this work, we address the problem of matchmaking for P2P multiplayer games over cellular data networks. Today, a major US cellular carrier charges an additional $3 per month for a public IP address and unrestricted inbound traffic to a phone. With this option, we have been able to communicate directly between phones over 3G without going through a server on the Internet. We take the controversial stance that soon, most cellular data plans will include this feature by default, and there will be many such P2P applications on phones. Even though we address matchmaking for P2P games, our system and contributions are also applicable to the traditional server-based game matchmaking problem. As far as we know, this is the first work to address the matchmaking problem for multiplayer mobile games over cellular networks. Specifically, our contributions include:

• We show that not only is phone-to-phone traffic feasible over cellular networks, it also reduces latency compared to routing through an Internet server.

• Despite the difficulty that prior work [23, 83] implies, we show that it is actually possible to estimate or predict the latency that a phone will have. We do so based on the experience of other phones and information about the cellular connection that is available to the phone. Our goal is not to identify all such predictors; that is a moving target with rapidly evolving cellular networks. Rather, our goal is to show that such predictors do exist and can be easily determined automatically, without requiring detailed and proprietary information from cellular networks.

• We show how, using such latency estimation, we can significantly reduce the burden on individual phones and cellular networks for effective matchmaking.

• We design and implement Switchboard, a matchmaking system for mobile games that is scalable not only in its measurement overhead, but also in grouping players together quickly, even when there are tens of thousands of players.

5.2 Motivation and Prior Work

5.2.1 Latency in Multiplayer Games

Multiplayer gaming has moved beyond the confines of a single console, or a set of consoles on the same LAN, to working across the Internet. This has come at the cost of additional latency. Studies have established that user behavior and performance in games can change significantly with 50ms-2000ms of additional latency, depending on the type of game [49]. Some games use high-precision objects (e.g., rifles and machine guns) and first-person perspectives, which tighten the latency bounds that a player can tolerate. As a result, researchers and game developers have built several techniques for hiding latency in networked games. Not surprisingly, these techniques rely on manipulating how time is perceived or handled by different components of the game. Some games advance time in lockstep [28] or with event-locking. A player can advance to the next time quantum only when all other players (or all other players

that this player is directly interacting with) are also ready to advance. So if any player experiences occasional high network delay, the lockstep protocol will ensure that everyone proceeds at the (slowest) pace, so that there is no inconsistency in the distributed game state.

Some games use dead reckoning [31] to predict the future positions of players or objects (such as bullets). So if, due to network delay, I do not receive a new position update from a remote player or object, I can use the last few updates to plot a trajectory and speed, and guess the current position. If the packet arrives later and the position calculation does not match, the software will have to reconcile inconsistent game state [41], which often appears to the player as a “glitch in the matrix”: an object that suddenly jumps from one spot to another. Some games use simple linear trajectory calculations, while others calculate more complex angular velocities and use human movement models.

These techniques are effective and commonly used for hiding network jitter. That is, if the additional network delay is occasional, the player may not notice the side effects of these techniques. However, if the typical network latency of one (or more) player(s) is high, then the experience for all players suffers, because games in lockstep will progress very slowly, or there will be many consistency corrections with dead reckoning.
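The simple linear variant of dead reckoning described above can be sketched in a few lines. This is a hedged illustration; the function, 2D state, and timestamps are hypothetical, not drawn from any particular game engine.

```python
def dead_reckon(p_prev, t_prev, p_last, t_last, t_now):
    """Linearly extrapolate a 2D position at t_now from the two most
    recent position updates (p_prev at t_prev, p_last at t_last)."""
    dt = t_last - t_prev
    vx = (p_last[0] - p_prev[0]) / dt  # velocity estimated from last two updates
    vy = (p_last[1] - p_prev[1]) / dt
    elapsed = t_now - t_last
    return (p_last[0] + vx * elapsed, p_last[1] + vy * elapsed)

# Updates arrived at t=0s (0, 0) and t=1s (10, 5); guess the position at
# t=1.5s while the next update is delayed.
guess = dead_reckon((0.0, 0.0), 0.0, (10.0, 5.0), 1.0, 1.5)  # (15.0, 7.5)
```

When the late packet finally arrives, the game must reconcile the guess against the authoritative position, which is the "glitch" correction described above.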

5.2.2 Matchmaking in Online Games

To reduce the impact of players with persistently high latency, many online games use some form of matchmaking to set up each game session. When a player launches a game and selects online gameplay, the game will typically make the player wait in a virtual “matchmaking lobby”. While in this lobby, game clients connect to a matchmaking service that maintains a current list of servers with game sessions that are waiting for players to join [65]. At any point in time, there may be many game

servers available for hosting a game. Clients will estimate their latency to each of these servers, and join one to which they have low latency.

While there are many online games with servers on the Internet, there are major costs associated with maintaining these servers. Servers across the planet are needed to provide low-latency games to players in different geographic regions. Each such location may need many servers to host the large number of game sessions that popular games experience. The traffic consumed by such hosting can be enormous, especially considering that FPS games frequently exchange packets and can last for as long as 7-19 minutes [74].

A popular alternative is to leverage the compute power of large numbers of game consoles and PCs on the Internet. Some P2P games use a star-hub topology, where one host player serves as the central point of coordination and all the client players exchange game state updates through the host. Hosts can be selected based on their network connectivity and past success in hosting games. Such games have communication patterns similar to client-server games, except that a game player replaces an Internet server as the hub. Another commonly used topology is the clique, where any player can directly communicate with any other player in the group. It avoids some shortcomings of the star-hub topology, namely the single point of failure and performance bottleneck. However, it is more challenging for the game developer to maintain consistency among players. Microsoft Xbox LIVE is a very popular platform for P2P games and matchmakes games using the star-hub topology: it has over 23 million users [186], and the number of online P2P game sessions for individual popular game titles is measured in the 100s of millions per month [9], with roughly 16 players in each game session.

Figure 5.1: CDF of ping latency between two phones on 3G HSDPA connectivity in Redmond, WA, either direct, or via a nearby University of Washington server, or via the best server offered by geo-distributed Bing Search. Horizontal axis cropped at 600ms.

5.2.3 P2P Games Over Cellular Networks

Inspired by the popularity of P2P online games, we believe that low-latency, multiplayer games over cellular data networks are better enabled through P2P (peer-to-peer or phone-to-phone, take your pick). Mobile platforms have a large number of games today, written by a variety of developers. These platforms, such as Windows Phone 7, iPhone, and Android, are used in many regions of the world. Not all game developers can afford to host servers everywhere, pay for traffic, and manage them. In addition to the cost benefit, the latency benefit of P2P is significant. In Figures 5.1 and 5.2, we show the latency between two phones, either directly or by summing up the individual latencies from each phone to a mutual server. In this fashion, we can compare four strategies: P2P, using a single server, using two servers, and using many geo-distributed servers. A hosting strategy with modest cost would be to have a single server in one location, for example at the University of Washington. This strategy is 139ms to 148ms worse than P2P at the 50th percentile. A more expensive strategy would be to host servers at both locations in which we conducted experiments and use

Figure 5.2: CDF of ping latency between two phones on 3G HSDPA connectivity in Durham, NC, either direct, or via a nearby Duke University server, or via a distant University of Washington server, or via the best server offered by geo-distributed Bing Search. Horizontal axis cropped at 600ms.

the closer one, either the University of Washington or Duke University; the penalty is 47ms to 148ms. The most expensive strategy is to host servers in many datacenters with direct peering to many ISPs, and use third-party DNS redirection services that optimize for latency, such as a large search engine like Bing; this strategy is 27ms to 69ms worse than P2P. Depending on the type of game [49], these latency penalties can deteriorate the game experience. Hence, we believe P2P is an attractive model for mobile games as well.

5.2.4 Cellular Network Performance

As is apparent from Figures 5.1 and 5.2, phones in different parts of the same mobile network can experience very different latencies. One of the most important aspects of matchmaking is knowing the latency that each player will have to each potential peer. While this has been studied for consoles on the Internet [9], there are several open questions in the mobile context. Should each phone ping all the available peers to estimate this latency? For how long should it ping? How often do these latency estimates need to be updated? How will this scale to a popular game with many

players?

Predicting future latencies, based on past latencies or other information about the network, can reduce the overhead of such measurements. Recent work has characterized the performance of 3G [181], and the performance of TCP flows [55] and applications [110, 83] over 3G. This work has shed light on how different applications experience different network performance, and on improvements to TCP throughput. CDNs typically select low-latency servers by geo-locating the client (or LDNS server) IP address. However, recent work [23] on 3G phones shows this will not work in the cellular domain. The authors note that different cities have different latency distributions, but with the caveat that the measurements were to a single server, and time variation was not factored out. That work motivated us to explore this problem in more depth, and to understand whether it is even possible to predict cellular network latency.

For P2P communication to work over cellular data networks, phones in a game session have to be able to receive inbound connections. While on some mobile networks in the US this is not currently possible, AT&T Wireless provides a public IP address and unrestricted inbound traffic to a phone for an extra US$3 a month [20]. Sprint offers the same feature for free. We believe that once compelling applications such as fast, multiplayer games become popular, this will be the default behavior.

5.2.5 Grouping

Once the latencies between players are measured or predicted, the remaining challenge in matchmaking is to group them into viable game sessions. Each session should have only those players whose latencies to each other are within the tolerance of the game. This tolerance may be specified, for instance, as a 90th percentile latency that must be below a certain threshold. Even though higher latencies may be experienced 10% of the time, those might be corrected with software techniques such as dead reckoning. Each session should be packed with as many viable players as the game

allows (just a single player in each session is an easy solution, but rather boring for the player).1 Ideally, a single matchmaking system should accommodate different types of P2P topologies that game developers may use, such as clique and star-hub.

Creating such groups under latency constraints while maximizing group sizes is related to the minimum clique partition problem in graph theory. If we treat each player as a node, and connect two nodes with an edge if their latency is below the developer’s constraint, we can cast the grouping problem as partitioning the graph into cliques, with the objective of minimizing the number of cliques under the clique size constraint. Finding the minimum clique partition of a graph is NP-hard [64]. Polynomial-time approximation algorithms for this problem exist only for certain graph classes [93]. Gamers are rather impatient, and would prefer not to spend much time in the matchmaking lobby waiting for a match to happen. Grouping should run fast and scale to a large number of players, in case a game becomes popular.

This grouping problem for P2P games is markedly different from that for client-server games. In client-server games, each player typically picks a game server based on a combination of server load, client-server latency, and number of players on a server [65]. This selection does not take into account the latency of other players who have picked that server, and the player may still experience poor gameplay if other players have chosen poorly.
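A minimal sketch of the grouping step, assuming pairwise latency estimates are already available, is a greedy first-fit clique heuristic (the exact minimum clique partition being NP-hard, as noted above). The player names, the 150ms threshold, and the latencies below are hypothetical, not Switchboard's actual algorithm.

```python
def greedy_group(players, latency, threshold_ms, max_size):
    """Greedily pack players into cliques where every pairwise latency
    is at most the game's threshold and groups respect the size limit."""
    groups = []
    for p in players:
        placed = False
        for g in groups:
            # p may join g only if it is compatible with *every* member.
            if len(g) < max_size and all(
                latency[frozenset((p, q))] <= threshold_ms for q in g
            ):
                g.append(p)
                placed = True
                break
        if not placed:
            groups.append([p])  # start a new session
    return groups

# Hypothetical example: four players, 150ms threshold, cliques of up to 3.
lat = {
    frozenset(("a", "b")): 80, frozenset(("a", "c")): 300,
    frozenset(("a", "d")): 90, frozenset(("b", "c")): 310,
    frozenset(("b", "d")): 100, frozenset(("c", "d")): 70,
}
groups = greedy_group(["a", "b", "c", "d"], lat, 150, 3)  # [["a", "b", "d"], ["c"]]
```

The heuristic is fast (quadratic in the number of players for bounded group sizes) but offers no optimality guarantee, reflecting the speed-versus-quality trade-off the text describes.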

5.3 Estimating Cellular Latency

Each multiplayer game will have its own tolerance for latency between players. For instance, a fast-paced, action-packed shooter game may require that for 90% of traffic,

1 After a game session has been formed and gameplay has begun, some games still allow new users to join the session. This type of matchmaking is easier given that for each new player, the choice is among a (much smaller) number of ongoing game sessions. In this chapter, we focus on the original, harder problem of grouping for new game sessions.

the latency should be under 150ms, while a different game may tolerate 250ms at the 90th percentile, because it uses bows-and-arrows or very sophisticated dead reckoning. If the matchmaking service knows in advance what latency each player will experience to each of the potential peers for the duration of a future game session, it can appropriately assign players to each other. Lacking information from the future, we need to predict future latency based on information currently available. We now present our findings from several measurements we have conducted to shed light on how we can predict future latency on 3G networks.

Our measurements were taken over multiple days, in each of several locations: Princeville (Kauai, HI), Redmond (WA), Seattle (WA), Los Angeles (CA), and Durham (NC). In almost all cases, each graph that we present is visually similar to those from other locations and days. When they are not similar, we present the dissimilar graphs as well, for comparison. Our measurements were conducted primarily using a pair of HTC Fuze phones running Windows Mobile 6.5 on AT&T Wireless; however, we also have measurements from a pair of Google Nexus One phones running Android 2.2 on T-Mobile in Durham. Except when explicitly indicated, we restrict all our measurements in this chapter to the HSDPA version of 3G, and only consider measurements after the cellular radio has achieved the “full power” DCH mode. Except when explicitly indicated, in each experiment the phones were stationary.

We use the term FRH throughout the chapter. It is the “First Responding Hop”: when a traceroute is attempted from a phone to any destination on the Internet, it is the first hop beyond the phone to respond with ICMP TTL Expired packets (typically at a TTL of 2). Based on our measurement experience and textbook understanding of HSDPA 3G cellular networks, we believe this is the GGSN [82].
This device is deep in the mobile operator’s network, and all IP traffic to/from the phone traverses this device, as shown in Figure 5.3. When considering


Figure 5.4: RTT from a phone in Princeville, HI on AT&T Wireless to the FRH. Each point is the median latency over 15 seconds. The graph is zoomed into a portion of the data to show detail. Data from Redmond, Seattle, Durham, and Los Angeles are visually similar.

In Figure 5.4, we show the latency that a phone experiences over a short duration of time. As the figure shows, there is a significant amount of latency variation, and at first glance, it does not appear that a 15 second window of measurements is very predictive of future latencies.

Over what timescale is 3G latency predictable?

If we pick too short a time window over which to take latency measurements (e.g., 15 seconds), those measurements do not fully capture the variability of the connectivity, and hence are not predictive of future latency. If we pick too long a time window, it may capture longer-term drift in network characteristics, and require a larger measurement overhead. Thus, we now vary the time window over which we compute a latency distribution, and examine how similar that latency distribution is to that of the next time window. In Figure 5.5, we show the mean window-to-window change in latency to the FRH, which exhibits a dip around 15 minutes. Across different time durations, this is the duration at which one measurement window is most similar to the next window. Note that across our different measurements, this analysis for the T-Mobile network in Durham exhibited slightly different behavior, and hence we also present Figure 5.6. However, again, 15 minute time windows are most predictive of the next for this different network (using similar HSDPA 3G technology, but at different radio frequencies). For a more rigorous statistical analysis, we include Figure 5.8, which confirms the highest stability for the 15 minute duration.

Figure 5.5: RTT from a phone in Redmond, WA on AT&T Wireless to the FRH. On the horizontal axis, we vary the length of the time window over which we calculate the latency at the various percentiles indicated by the different lines. On the vertical axis, we show the difference in ms between two consecutive time windows at the different percentiles, averaged over the entire trace. Data from Princeville, Seattle, Durham, and Los Angeles for AT&T Wireless are visually similar.

Figure 5.6: RTT from a phone in Durham, NC on T-Mobile to the FRH. On the horizontal axis, we vary the length of the time window over which we calculate the latency at the various percentiles indicated by the different lines. On the vertical axis, we show the difference in ms between two consecutive time windows at the different percentiles, averaged over the entire trace.

Figure 5.7: For any given 15 minute time window, from how far back in time can we use latency measurements and still be accurate? The horizontal axis shows the difference in latency at the 95th percentile between a time window and a previous time window. The age of the previous time window is shown in the legend. The vertical axis shows the CDF across all the different 15 minute intervals in this trace. The horizontal axis is clipped on the right.
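The window-to-window stability metric of Figures 5.5 and 5.6 can be sketched as follows. The nearest-rank percentile, the window slicing, and the toy trace are hypothetical simplifications, not the actual analysis pipeline.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100.0 * len(s)))]

def mean_sequential_difference(trace, window_len, p):
    """Average absolute change in the p-th percentile between
    consecutive fixed-size windows of a latency trace."""
    windows = [trace[i:i + window_len] for i in range(0, len(trace), window_len)]
    windows = [w for w in windows if len(w) == window_len]  # drop partial tail
    diffs = [
        abs(percentile(windows[k + 1], p) - percentile(windows[k], p))
        for k in range(len(windows) - 1)
    ]
    return sum(diffs) / len(diffs)

# Hypothetical trace: two windows whose medians differ by 10ms.
trace = [100.0] * 10 + [110.0] * 10
result = mean_sequential_difference(trace, 10, 50)  # 10.0
```

Sweeping `window_len` and plotting `result` for several percentiles reproduces the shape of the analysis: the window size minimizing this statistic is the most self-predictive.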

For how many future time windows is one window predictive?

We have empirically established that latency measurements from a 15 minute time window are fairly predictive of the immediately subsequent time window. In Figure 5.7, we consider how rapidly this predictive power degrades over successive 15 minute time windows. If a game developer is concerned about the 95th percentile of latency that players experience, we see that using measurements that are 15 minutes stale gives an error of under 17ms for 90% of the time. If we reach back to measurements from 105 minutes ago (the “7 previous” line), this error increases to 29ms. For brevity we do not show graphs of other percentiles, but, for instance, at the 50th percentile the errors are 8ms and 12ms, respectively. For the remainder of this chapter and in the design of Switchboard, we use measurements that are stale by only one time window to minimize prediction error. However, the measurement overhead of our system can be improved by allowing older, less accurate predictions.

Figure 5.8: CDF of Kolmogorov-Smirnov (KS) test goodness-of-fit P-values for successive time windows, by window size (in minutes), for a phone in Redmond, WA on AT&T Wireless. Each data point represents a two-sample KS test using 100 points from each of two successive time windows. The percentage of null hypothesis rejections is shown as the intersection of a distribution with the chosen significance level. A lower percentage of rejected null hypotheses is an indication of greater stability across successive time windows. The horizontal axis is clipped on the left. For clarity, a limited set of window sizes is shown. Data from Princeville, Seattle, Durham, and Los Angeles are visually similar.
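The per-pair test behind Figure 5.8 can be sketched with a pure-Python two-sample KS statistic and the classical large-sample critical-value approximation; the synthetic Gaussian "windows" below stand in for real RTT samples and are not measurement data.

```python
import math
import random

def ks_2sample_stat(a, b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

# Synthetic RTT samples (ms) for two consecutive windows, 100 points each.
random.seed(7)
window_a = [random.gauss(200.0, 25.0) for _ in range(100)]
window_b = [random.gauss(205.0, 25.0) for _ in range(100)]

d = ks_2sample_stat(window_a, window_b)
# Large-sample critical value at 5% significance for n = m = 100; we fail
# to reject the null hypothesis (same distribution) when d < critical.
critical = 1.358 * math.sqrt((100 + 100) / (100 * 100))
stable = d < critical
```

Running this test over every pair of successive windows, for each candidate window size, and counting the rejection rate yields the kind of stability comparison the figure summarizes.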

How many measurements are needed in each time window?

The results we have presented so far were generated from latency measurements at the rate of once per second. For a 15 minute window, 900 measurements can be a significant overhead for a phone, both on the battery and on a usage-based pricing plan. In Figure 5.9, we consider by how much we can slow down this measurement rate while still obtaining a similar latency distribution. If the sampling rate is once per 15 seconds, there is relatively little degradation in the latency distribution. There is less than 11ms difference at the 50th percentile for all of the latency distributions for every 15 minute window in the trace, between sampling at once per

1 second and once per 15 seconds. For the 95th percentile, for more than half of the time windows, the difference in latency is only 11ms. We believe that sending and receiving 60 packets over the course of 15 minutes is a reasonable trade-off between expending limited resources on the phone and measurement accuracy.
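The subsampling comparison behind Figure 5.9 can be sketched as follows, with a synthetic Gaussian trace standing in for real measurements; the trace parameters are hypothetical.

```python
import random

def median(samples):
    """Median via the upper-middle element (adequate for a sketch)."""
    s = sorted(samples)
    return s[len(s) // 2]

# Synthetic 15-minute window sampled once per second (900 points).
random.seed(1)
full = [random.gauss(200.0, 30.0) for _ in range(900)]
sub = full[::15]  # subsample to once per 15 seconds -> 60 points

# Degradation from subsampling: difference in the median latency estimate.
diff_50 = abs(median(full) - median(sub))
```

Repeating this for each window in a trace, and at several percentiles and sampling rates, produces the CDFs of differences that the figure plots.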

5.3.2 Using One Phone to Predict the Future Latency of a Different Phone

So far, our experiments have found that a phone needs to measure its latency about 60 times in a 15 minute window to predict its latency in future 15 minute windows. While this significantly improves accuracy compared to naively using a few ping measurements, and significantly reduces overhead compared to naively pinging continuously for several minutes, there is still a large network overhead if every phone measures its own latency when multiplayer mobile gaming becomes popular. We now consider the extent to which one phone’s latency is representative of another phone’s, so that we can reduce the burden on the network by sharing recent measurements among phones. As we showed in Figures 5.1 and 5.2, a phone in Redmond has very different latency from one in Durham, and hence we need to determine which connectivity parameters, when shared by two phones, imply that the phones also share similar latency.

Figure 5.9: Impact of reducing the measurement sampling rate for a 15 minute window of latency from a phone in Durham, NC on AT&T Wireless to the FRH. The horizontal axis shows the difference in latency at the specified percentile between using a measurement rate of once per 1 second and using a measurement rate of once per 5 to 600 seconds, as indicated. The vertical axis shows the CDF across all the different 15 minute intervals in this trace. Note that at a sampling rate of once per 90 seconds for 15 minutes, we have only 10 samples, and hence cannot calculate the 95th percentile. The horizontal axis is clipped on the right. Data from Princeville, Redmond, Seattle, and Los Angeles are visually similar.

Connectivity Parameters with Little Influence

We have conducted many experiments in several locations to explore how much different connectivity parameters influence a cellphone’s latency. For conciseness, we briefly summarize our negative findings before presenting our positive findings in more detail.

We know from prior work [23] that the public IP address a phone is assigned can vary, and does not correlate with its location. Using data similar to that of the prior work, we reconfirmed this finding, and hence do not believe that the public IP address is representative of a phone’s latency. In fact, for Figures 5.1 and 5.2, each phone had the same IP address regardless of which location it was in.

Our experiments show no discernible correlation between 3G signal strength at the phone and its latency to the FRH. While initially counter-intuitive, this observation is borne out by our “textbook” understanding of modern cellular standards. Unlike earlier 3G standards [36], the HSDPA standard provides a reliable wireless link to the cellular phone by performing retransmissions at the L1 physical layer between the phone and the celltower [82]. Poor signal strength should result in higher BLER (block error rates). However, in general, power control and adaptive modulation keep the channel fast and efficient at a variety of signal strengths. BLER

appears to be a concave function of SNR [82], and hence the signal strength has to be extremely low before a bad BLER of above 10% is experienced. Furthermore, the use of dedicated channels, a short TTI (transmission time interval) of 2ms, and explicit ACK/NACK mechanisms mean that retransmitting corrupted blocks is extremely fast (on the order of a few ms). We have also conducted experiments at a variety of speeds while driving on city streets and highways. After accounting for celltower changes, we see little correlation between speed and latency, though unfortunately the highway patrol did not let us conduct experiments much beyond 60mph. Our experiments do show a long-term trend in latency variation that suggests a diurnal effect, which we suspect is due to human-induced load on the network. However, the same time window from one weekday is not very predictive of the same window for the next weekday or the same day the following week. All of the results we present in this chapter are specific to HSDPA connectivity. Earlier versions of 3G do exhibit different latency distributions, and the 2G technologies GPRS and EDGE are dramatically different. We do not explore these older technologies further in this chapter.

Phones Under the Same Celltower

When a phone is connected to a cellular network, certain parameters of that connectivity are exposed to the mobile OS – CID, LAC, MNC, MCC. The CID (Celltower ID) is a number that uniquely identifies the celltower. The LAC (Location Area Code) is a logical grouping of celltowers that share signaling for locating a phone (a phone that roams between celltowers with the same LAC does not need to re-update the HLR and VLR location registers). The MNC and MCC numbers uniquely identify the mobile operator (e.g., AT&T Wireless or T-Mobile).


Figure 5.11: Difference in latency between a stationary phone at “S-home” and a phone placed at a variety of locations in Seattle. Each line is a CDF of ((xth percentile latency over a 15-minute interval from the stationary phone at “S-home”) - (xth percentile latency over the same 15-minute interval for the other phone at the location in the legend)) computed for all possible 15-minute windows, in 1 minute increments. The xth percentile is 50th for the top graph, 90th for the middle, and 95th for the bottom. The horizontal axis is cropped on the right.

Durham. Figure 5.10 shows maps of each of these locations and Table 5.1 shows the CID and LAC numbers for the celltowers that the phones connected to. Figure 5.11 shows the results of the Seattle experiment. The “S-home” lines show the difference in latency when both phones were placed next to each other, which is about 30ms in most instances. The “Latona” lines show the difference when both phones were connected to the same celltower but one was further away. These lines are almost indistinguishable from the “S-home” lines. The other locations have different CIDs from “S-home”. Some of these locations have very different latency to the stationary phone at “S-home”, while some are similar. We see similar behavior with experiments in other locations. In Figure 5.12, the locations with the same CID as the stationary phone experience similar latency. Of the ones with different CIDs, one has very different latency (“REI”) while another has very similar latency

Figure 5.12: Difference in latency between a stationary phone at “M-home” and a phone placed at a variety of locations in Redmond. Each line is a CDF of ((xth percentile latency over a 15-minute interval from the stationary phone at “M-home”) - (xth percentile latency over the same 15-minute interval for the other phone at the location in the legend)) computed for all possible 15-minute windows, in 1 minute increments. For conciseness, we present only the 50th percentile graph. The horizontal axis is cropped on the right.


Figure 5.13: Difference in latency between a stationary phone at “R-home” and a phone placed at a variety of locations in Durham. Each line is a CDF of ((xth percentile latency over a 15-minute interval from stationary phone at “R-home”) - (xth percentile latency over the same 15-minute interval for the other phone at the location in the legend)) computed for all possible 15-minute windows, in 1 minute increments. For conciseness, we present only the 50th percentile graph. Horizontal axis is cropped on the right.

(“QFC”). With Durham in Figure 5.13, the two locations with the same CID (“R-home” and “Breakfast Rest.”) have similar latency, while most of the other CIDs are

different. We have seen similar behavior across different days (both on weekdays and weekends) and at several other locations in each of these areas, but we do not enumerate those experiments here for conciseness. From these experiments, we believe that phones under the same RNC (see Figure 5.3) experience similar latency. The RNC is a physical grouping of celltowers, where the RNC controls radio resource allocation for the celltowers under it. We believe that latency depends in large part on congestion and provisioned capacity, which varies from RNC to RNC; this theory is also suggested by prior work based on 3G measurements in Hong Kong [181]. Unfortunately, the identity of the RNC is not exposed to the OS on the phone, as far as we know. While the LAC identity is exposed, the LAC is a logical grouping having to do with signaling, which has little impact on latency once a phone has initiated a data connection. Not knowing which RNC a celltower is part of, we use the more conservative approach of sharing latency profiles only between phones connected to the same CID. There will be phones under other celltowers with similar latency, as our experiments show, but we are unable to reliably identify them.
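The conservative sharing rule above can be sketched in a few lines. The following Python sketch (our naming; the system itself is implemented in C#) keys recent RTT samples on the (MCC, MNC, CID) tuple, so that only phones under the same operator and celltower reuse one another’s measurements:

```python
from collections import defaultdict

def sharing_key(mcc, mnc, cid):
    # Phones share latency profiles only when the operator (MCC, MNC)
    # and the celltower (CID) all match.
    return (mcc, mnc, cid)

class ProfileStore:
    """Hypothetical store of recent RTT samples, keyed per celltower."""

    def __init__(self):
        self.samples = defaultdict(list)

    def report(self, mcc, mnc, cid, rtt_ms):
        self.samples[sharing_key(mcc, mnc, cid)].append(rtt_ms)

    def profile(self, mcc, mnc, cid):
        # Any phone under the same celltower may reuse these samples;
        # phones under other celltowers get nothing, even if nearby.
        return list(self.samples.get(sharing_key(mcc, mnc, cid), []))
```

Phones under a different CID may well have similar latency (those under the same RNC), but since the RNC is not observable, they conservatively receive no shared profile.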

5.3.3 Predicting the Latency Between Phones

So far, the results we have presented have been intentionally limited to predicting the latency between a phone and its FRH. We have found the ideal duration of time over which measurements need to be taken, how many measurements are needed, and among which phones these measurements can be shared to reduce overhead. However, for P2P multiplayer gaming, we need to predict the end-to-end latency between pairs (or more) of phones. From traceroutes we have done between phones in the same location and across many different locations in the US, the end-to-end latency appears to be the sum of the component latencies – phone1 to FRH1, FRH1


Figure 5.14: CDF of RTT between a phone in Durham, NC and a phone in San Antonio, TX. Component latencies involving the respective FRH are also included. The FRH to FRH latency is calculated by the difference of pings. The horizontal axis is clipped on the right. Note that the phone-to-phone CDF is not a perfect sum of the other three CDFs due to small variations in latency in between traceroute packets issued at the rate of once per second.

to FRH2, FRH2 to phone2. This is the expected behavior, and it is also shown in Figure 5.14. The remaining task is to predict the latency between a pair of FRHs. This is a very traditional problem of scalably predicting the latency between two points on the Internet. We can use prior techniques such as Vivaldi [53], Pyxida [100], or Htrae [9], to name just a few. These techniques work well if the latency does not vary tremendously over short time scales, which is true of many types of wired Internet connectivity. In Figure 5.14, we see that the left-most line, which is the FRH to FRH latency, is fairly straight in comparison. Hence, we rely on the demonstrated effectiveness of prior work to solve this problem and do not discuss it in more depth here. In the next section, we present the design of Switchboard and describe how we use our findings on 3G latency to improve the scalability of matchmaking.

5.4 Switchboard

The goal of a matchmaking service is to abstract away the problem of assigning players to game sessions so that each game developer does not have to solve it independently. A successful game session is one in which the latencies experienced by every player meet the requirements of the game and the number of players is as large as possible for the game. In using the matchmaking service, the game developer specifies the latency requirements of the game and the number of players it can support. The matchmaking service has to operate in a scalable manner. The measurement overhead for each player, and on the network in general, has to be as small as possible. This is especially true of games on phones, where the phone has limited energy and the network has relatively limited capacity. The amount of time that a player spends in the matchmaking lobby has to be minimal as well, and should not grow significantly as a game becomes popular and more users participate in matchmaking. We now briefly describe the design of Switchboard and, in particular, point out how it scales while trying to achieve low latency but large matches.

5.4.1 Architecture of Switchboard

As Figure 5.15 shows, there are two sets of components to Switchboard – components on the clients and components on the centralized, cloud-based service. The Switchboard client functionality is split into two parts. The developer’s game code interfaces with the Switchboard Lobby Browser – this component interacts with the Lobby Service running in the cloud. The API is described next in Section 5.4.2. The other part of the Switchboard client produces network measurements for the cloud service to consume, and this is described in Section 5.4.3. This part of the client does not directly interact with the developer’s game code.


Figure 5.16 summarizes the API that the game developer uses to interact with the Lobby Browser component of Switchboard. The game has to implement and instantiate a derived class of MatchmakingClient. It instantiates a derived class of LobbyBrowser by specifying the game level that the player has selected and the callback function that tells the game client that matchmaking has been done. The LobbyBrowser interfaces with the Lobby Service in the cloud, but this interaction or API is hidden from the game developer. The LobbyBrowser just needs to instantiate a CloudInterface, which specifies the hash for this lobby, the latency percentile of interest, the limit for this percentile, the maximum number of players, and the callback function. In Switchboard, we uniquely identify each matchmaking lobby with a hash of the combination of the game’s name and the map or level. Grouping is conducted independently for each unique hash. The latencyPercentile of

class BoomLobby : LobbyBrowser {
    private CloudInterface myInterface;
    private BoomClient myClient;
    private StartGameCallback myClientStart;
    private int latencyPercentile = 95;
    private int latencyLimit = 250;
    private int maxPlayers = 16;

    public BoomLobby(string gameLevel) {
        string Hash = "Boom" + gameLevel;
        myInterface = new CloudInterface(Hash, latencyPercentile, latencyLimit,
            maxPlayers, new MatchReadyCallback(this.MatchReady));
    }

    public void Join(BoomClient client, StartGameCallback sg) {
        myClient = client;
        myClientStart = sg;
        myInterface.AddClient(client);
    }

    public void MatchReady(BoomClient[] clients) {
        myClientStart(clients);
    }
}

class BoomClient : MatchmakingClient {
    public BoomClient(string gameLevel) {
        BoomLobby myLobby = new BoomLobby(gameLevel);
        myLobby.Join(this, new StartGameCallback(this.StartBoomGameNow));
    }

    public void StartBoomGameNow(BoomClient[] players) {
        // ... Do game stuff here
    }
}

Figure 5.16: C# client API of Switchboard as would be used in a hypothetical game called “Boom”. For brevity, base class definitions are not shown here.

the latency distribution between any two MatchmakingClients should be less than latencyLimit. We have provided a simple example in the figure, where the developer wants the 95th percentile to be under 250ms. On the cloud service side, the Lobby Service will interact with the Grouping Agent on the client’s behalf. When the Grouping Agent returns with a list of matches, the Lobby Service will hand every client the list of other clients it has been matched with. A client may wish to re-join the lobby if a null list is returned (because there are no other players at this time, or no others with low enough latency).

5.4.3 Latency Estimator

The Latency Estimator supports the Grouping Agent. The Grouping Agent may need the latency distribution between any arbitrary pair of clients. Specifically, it will request latency distributions only between those clients that have the same lobby hash. It will apply the distribution test that the game developer has provided. Each client is identified by a unique ID and details of its connectivity (CID, MNC, MCC, FRH, radio link technology) – this identification is created transparently for the game developer in the MatchmakingClient base class definition. The Latency Estimator relies on data stored in the Latency Data database. This database contains raw latency measurements, of the same form as the data in the experiments in Section 5.3. Each record has a timestamp, client unique ID, client connectivity (CID, MNC, MCC, FRH, radio link technology), RTT latency, and destination FRH. Each record identifies the RTT latency between that client and the destination FRH that it probed, which may be its own FRH or a remote FRH. The database keeps a sliding window of measurements from the past 15 minutes, as Section 5.3 shows that older data is of lower quality. The database is fed by the Measurement Controller. The Measurement Controller divides the global set of clients (that are currently in any matchmaking lobby) into

three queues: (1) the free pool; (2) the active pool; and (3) the cooloff pool. Clients in the free pool are grouped by their CID, and one client under each CID is chosen at random. The chosen clients are moved into the active pool and are requested to conduct a batch of measurements (the quantity and duration of measurements is configurable; we assume 10 probes per request in our experiments). Once they report back measurements, they enter the cooloff pool. Information from the Latency Estimator determines if clients are moved from the cooloff pool into the free pool. The Latency Estimator identifies the CIDs for which it does not have sufficient measurements – at least 60 measurements within the last 15 minutes from any clients under that CID to their common FRH. This list of CIDs is handed to the Measurement Controller (every 30 seconds), which moves all clients under any of these CIDs from the cooloff pool into the free pool (and any that are not under these CIDs into the cooloff pool). When the Measurement Controller asks a client to perform measurements, it hands over three parameters: (1) the measurement duration; (2) the measurement rate; and (3) a list of unique FRHs. The Measurement Client will interleave pings from the phone to its FRH with pings to a randomly-selected distant FRH. At the end of the measurement duration, the results are reported back. The measurement rate has to be high enough to keep the phone radio awake in DCH mode – in our experience, sending a packet every 100ms suffices. The Grouping Agent calls the Latency Estimator to get a latency distribution between a pair of clients. The Latency Estimator computes this as the sum of three components: (1) latency of the first client to its FRH; (2) latency of the second client to its FRH; and (3) FRH to FRH latency. For the latency between a client and its FRH, the Latency Estimator calculates a distribution among all latency measurements from

any client under the same CID to the same FRH, from the past 15 minutes.² For FRH to FRH latency, we rely on a system like Htrae [9]. It feeds on the Latency Data database, but subtracts client to FRH latency from client to remote FRH latency to feed the network coordinate system. Since we do not have a geo-location database that works with FRH IP addresses, this is practically similar to Pyxida [100].
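As a rough sketch of this three-component computation, the snippet below (hypothetical helper names; a simplification in that it sums per-component percentiles rather than composing the full distributions) combines the two client-to-FRH samples with the FRH-to-FRH estimate:

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile of a list of RTT samples (in ms).
    s = sorted(samples)
    idx = max(0, math.ceil(pct / 100.0 * len(s)) - 1)
    return s[idx]

def pairwise_latency(c1_to_frh, c2_to_frh, frh_to_frh, pct):
    # Sum of the three components: client1 -> FRH1, client2 -> FRH2,
    # and FRH1 -> FRH2 (the latter from a coordinate system like Htrae).
    return (percentile(c1_to_frh, pct) +
            percentile(c2_to_frh, pct) +
            percentile(frh_to_frh, pct))
```

The client-to-FRH sample lists here would be drawn from the 15-minute sliding window for the respective CIDs.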

5.4.4 Grouping Agent

For each unique lobby hash, the Lobby Service hands over to the Grouping Agent the list of clients, the maximum number of players, and the latency test parameters.³ The Grouping Agent treats each lobby hash completely separately (there are multiple instances of the Grouping Agent, each handling one hash). The Grouping Agent obtains the latency distributions between each pair of clients in this lobby from the Latency Estimator. It constructs a graph of these clients. The weight (or length) of the edge between two clients is the latency between them at the given percentile. This graph, along with the latency and size limits from the game developer, is handed to the grouping algorithm, described next in Section 5.4.5. Once the grouping algorithm successfully places clients into game sessions, it returns the list of sessions to the Lobby Service. A session is viable only if it has at least 2 players. The Lobby Service removes all clients in viable sessions from the Measurement Controller’s client pools and returns the list of session peers to the respective clients. Any clients that were not placed in a viable session remain in the lobby for another round of matchmaking or until they voluntarily leave (that part of the API is not described in Figure 5.16). Clients can remain in the lobby when there

² If there are insufficient measurements, the return value to the Grouping Agent identifies the client(s) with insufficient data and they are removed from the current grouping round (and remain in the measurement pools).
³ For any particular lobby hash, we expect all game clients to specify the same latencyPercentile, latencyLimit and maxPlayers. Alternatively, we can incorporate these three parameters into the hash itself to further segregate players.

is insufficient latency data for that client’s CID, or there are insufficient players with whom they can form a viable session.

5.4.5 Grouping Algorithm

The goal of the algorithm is to assign players to different groups in a way that: i) maximizes the number of players in each group; and ii) satisfies the latency and group size constraints specified by the game developer. An important factor that affects the grouping process is the topology formed by players within a group. While we focus here on grouping for the clique topology, our grouping algorithm can be easily adapted to accommodate other topologies. As mentioned in §5.2.5, the grouping problem can be cast as the minimum clique partition problem, which is NP-hard. Given that a popular mobile game may attract tens of thousands of players, we need an algorithm that is both effective and scalable. We find that cluster analysis [3] is particularly well suited to the grouping problem. Clustering refers to the assignment of a set of observations into clusters based on a distance measure between observations. If we treat each player as an observation and the latency between two players as their distance, we can leverage a wealth of well-established clustering methods to solve the grouping problem. While there exist many clustering methods, we pick hierarchical [77] and quality threshold (QT) [80] clustering because they have low computational complexity and can easily accommodate different group topologies (e.g., clique and star-hub). We do not consider K-means (another commonly-used clustering method) because it requires specifying the number of clusters a priori, making it difficult to enforce the latency constraint. Hierarchical clustering starts with each individual player as one cluster. It progressively merges pairs of closest clusters according to a distance measure, until the cluster diameter exceeds the latency constraint. In contrast, QT clustering first

builds a candidate cluster for each player by progressively including the player closest to the candidate cluster (according to a distance measure), until the candidate cluster diameter exceeds the latency constraint. It then outputs the largest candidate cluster, removes all its members, and repeats the previous step. Note that when the size of a cluster is too large, we need to further divide it into smaller clusters to meet the group size constraint. For the clique topology, the distance between two clusters is defined as the maximum latency between players of each cluster. The diameter of a cluster is defined as the maximum latency between players in the cluster. The time complexity of hierarchical and QT clustering is O(n³) and O(n⁵) respectively. We emphasize that both clustering methods can work with any type of topology in which distance and diameter are well defined. For instance, in the star-hub topology, the distance between two clusters can be defined as the latency between the hub players of each cluster. The diameter of a cluster can be defined as the maximum latency between the hub player and any star player in the cluster. The grouping algorithm is polynomial in the number of players, and hence its running time can be long when there are a large number of players. Waiting in the matchmaking lobby for a long time can degrade the experience of game players. To tackle this problem, we first divide all the players into smaller buckets with at most B players in a bucket, and then apply the grouping algorithm to each bucket. In this way, we can easily parallelize the grouping of all the buckets and control the grouping time by adjusting B. While a smaller B shortens grouping time, it can lead to less optimal groups. We evaluate how B impacts grouping time and group sizes in §5.5.2. In our current implementation, we randomly assign players to different buckets. In the future, we plan to explore other assignment strategies, such as assignment based on geographic location or LAC.
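A minimal sketch of the hierarchical (complete-linkage) variant for the clique topology might look as follows, assuming a symmetric pairwise latency matrix `dist` (our naming; the actual system evaluates latency distributions at the developer-chosen percentile):

```python
def diameter(cluster, dist):
    # Clique topology: diameter is the max pairwise latency in the cluster.
    return max((dist[a][b] for a in cluster for b in cluster if a != b),
               default=0)

def hierarchical_group(players, dist, latency_limit, max_size):
    """Agglomerative grouping under latency and group-size constraints."""
    clusters = [[p] for p in players]
    while True:
        best = None  # (merged diameter, i, j) of the closest viable pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                if len(merged) > max_size:
                    continue
                d = diameter(merged, dist)
                if d <= latency_limit and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:  # no merge satisfies the constraints
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
```

Swapping in a different `diameter` and inter-cluster distance (e.g., hub-to-hub latency) adapts the same skeleton to the star-hub topology.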

While players wait in a matchmaking lobby for grouping to finish, other players may join or leave the lobby. The graph that the grouping algorithm operates on could be modified in real time as the algorithm runs. However, for simplicity, the Lobby Service in Switchboard calls the Grouping Agent at fixed intervals for each lobby. This not only limits grouping overhead but also allows the accumulation of a sufficient number of players to feed into the grouping algorithm. The choice of the interval needs to balance player wait time with the popularity of a particular game.
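The random bucket assignment described above is straightforward; a sketch (the seed parameter is our addition, for reproducibility):

```python
import random

def split_into_buckets(players, B, seed=None):
    # Randomly partition players into buckets of at most B each, so the
    # grouping algorithm can run on every bucket in parallel.
    rng = random.Random(seed)
    shuffled = list(players)
    rng.shuffle(shuffled)
    return [shuffled[i:i + B] for i in range(0, len(shuffled), B)]
```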

5.5 Evaluation

5.5.1 Implementation

The service side of Switchboard is implemented on the Microsoft Azure cloud platform. In our current deployment, we use a single hosted service instance and a single storage service instance, both in the “North Central US” region. Matchmaking itself is not very sensitive to small latencies (hundreds of ms) because it only sets up the game session and is not used during gameplay itself. Hence we have deployed only a centralized instance of the service. The service is written entirely in C#, and heavily leverages the .NET 4.0 libraries. The Measurement Controller is written in 457 lines of code, with an additional 334 lines of message formats and API that are shared with the client. The Lobby Service is 495 lines of code. The Latency Estimator is 2,571 lines of code, but contains a very large amount of analysis code to support this chapter and can be significantly slimmed. The Grouping Agent is 363 lines. The client side of Switchboard is a mix of C# and native C code. The Measurement Client is 403 lines, plus the 334 lines shared with the Controller. The P2P Testing Service, which actually handles the probes, is written in C due to the lack of managed APIs for getting connectivity information and doing traceroutes. The client side is implemented for Windows Mobile 6.5. However, we have a port of just the P2P

Testing Service to Android, which we used to help gather data for this chapter. We also use a simple client emulator of 48 lines to stress test our service on Azure.

5.5.2 Evaluation of Grouping

We now evaluate the importance of pairwise latency estimation for effective grouping and the impact of bucket size on grouping time and group sizes. To evaluate grouping at scale, we need a large number of players and their latency data. Unfortunately, we are not aware of any large corpus of detailed latency measurements from a wide set of phones. Therefore, we attempt to generate a synthetic model of phones, their locations, the locations of towers, the locations of FRHs, and the latencies associated with each. We then evaluate how grouping performs on such a topology. To generate a realistic distribution of players, we use population data by county from the US census [2]. To cluster users by tower, we use cell tower locations from a public US FCC database [4], which contains detailed information about the cell towers registered with the FCC. Combining these two data sources, we break down the towers by county and compute the fraction of total population served by a tower

T in county C as FracPop(T) = pop(C) / (ntower(C) × TotalPop). Here pop(C) and ntower(C) are the population and number of towers in county C, and TotalPop is the total population. Next, we need to connect the towers to FRHs. Today, US operators have many FRHs, but they are typically co-located in a few datacenters across the US (based on private conversations with operators). Not knowing where these datacenters are, we simply divide the US into four Census Bureau-designated regions (Northeast, Midwest, South, and West), and pick a metropolitan area from each region (Washington DC, Chicago, San Antonio, and San Francisco) as the FRH datacenter location. Finally, we generate a set of n players using this model. For each tower T, we generate n × FracPop(T) players. Essentially, we proportionally assign n players to each tower according to the population density of the county in which the tower

sits. For each player p under T, we randomly pick its geographic coordinate (lat(p) and lon(p)) within a predefined radius of T. The maximum range of a tower varies from 3 to 45 miles, depending on terrain and other circumstances [1]. We picked a radius of 20 miles. We also assign p to the geographically closest FRH datacenter (FRH(p)). The RTT between p and FRH(p) (rttFRH(p)) is randomly drawn from the latency distribution to the first pingable hop collected by prior work [83] from 15,000 mobile users across the US. As per our findings in §5.3.2, all players under the same tower are assigned the same RTT to their corresponding FRH datacenter.
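The proportional player assignment reduces to a pair of small formulas; a sketch with illustrative function names:

```python
def frac_pop(pop_c, ntower_c, total_pop):
    # Fraction of the total population served by one tower in county C:
    # FracPop(T) = pop(C) / (ntower(C) * TotalPop)
    return pop_c / (ntower_c * total_pop)

def players_per_tower(n, pop_c, ntower_c, total_pop):
    # Number of synthetic players generated under a tower in county C.
    return round(n * frac_pop(pop_c, ntower_c, total_pop))
```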

We can now compute the latency between any pair of players (p1, p2) as:

rttFRH(p1) + rttFRH(p2) + rtt(FRH(p1), FRH(p2))

Here, rtt(FRH(p1), FRH(p2)) represents the RTT between FRH(p1) and FRH(p2), which we derive from a geographic distance-based latency model from prior work [9]. We compute the geographic distance using the great-circle distance between a pair of coordinates.
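A sketch of this computation, using the haversine formula for great-circle distance (the `ms_per_km` factor below is an illustrative stand-in for the distance-based latency model of [9], not its actual parameters):

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    # Haversine formula for the great-circle distance between two points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def pairwise_rtt(rtt_frh_1, rtt_frh_2, frh1_coord, frh2_coord,
                 ms_per_km=0.02):
    # rttFRH(p1) + rttFRH(p2) + rtt(FRH(p1), FRH(p2)), where the last
    # term is modeled as proportional to geographic distance.
    d_km = great_circle_km(*frh1_coord, *frh2_coord)
    return rtt_frh_1 + rtt_frh_2 + d_km * ms_per_km
```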

Latency- vs. Geography-based Grouping

To contrast with our algorithm, we also try a naive algorithm that groups players by their geographic proximity (instead of by latency proximity, as in Switchboard). In this experiment, we evaluate the effectiveness of geography- vs. latency-based grouping. We first generate a game topology of 50,000 players and divide the players into buckets of 1,000 players each. We then run the hierarchical clustering algorithm (described in §5.4.5) on each bucket, using pairwise player latency or geographic distance. Note that each GeoGroup produced by the geography-based grouping is guaranteed to meet the specified distance constraint. However, unlike in the latency-based grouping, a GeoGroup may include players that violate the latency constraint specified by the game developer. We further prune outliers from a GeoGroup to obtain the corresponding viable group, which fully satisfies the latency constraint.
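One simple way to implement this pruning (the exact rule is not specified here, so this greedy outlier removal is an assumption) is to repeatedly drop the player involved in the most latency-constraint violations:

```python
def prune_to_viable(group, dist, latency_limit):
    # Greedily drop the worst offender until every pairwise latency in
    # the group satisfies the game's latency constraint.
    group = list(group)
    while True:
        violations = {p: 0 for p in group}
        ok = True
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if dist[a][b] > latency_limit:
                    violations[a] += 1
                    violations[b] += 1
                    ok = False
        if ok:
            return group
        group.remove(max(violations, key=violations.get))
```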


Figure 5.17: CDF of number of players in each group after grouping 50,000 players split into buckets of 1,000 players each, with a latency limit of 250ms. The top four lines show results from grouping players based on geographic proximity, while the bottom line uses latency proximity.


Figure 5.18: CDF of number of players in each group after grouping 50,000 players split into buckets of 1,000 players each, with a latency limit of 400ms. The top four lines show results from grouping players based on geographic proximity, while the bottom line uses latency proximity.

Figure 5.17 shows the CDF of group sizes using the two algorithms. We set the game latency constraint to 250 ms and vary the distance constraint from 100 to 800 miles for geography-based grouping. The maximum group size is limited to 16.

Clearly, latency-based grouping produces much bigger groups than geography-based grouping, with a median group size of 15 vs. 2. Although not shown in the figure, both grouping schemes assign roughly the same number of players to viable groups. Geography-based grouping does not work well because 3G latency between players is poorly correlated with their geographic proximity. This is unsurprising because our latency experiments show that end-to-end latency is dominated by the latency to the FRH and not by the latency between FRHs. Figure 5.17 further shows that geography-based grouping produces larger viable groups when the distance constraint increases. However, this effect diminishes as the constraint surpasses 400 miles. Since there is little correlation between geographic distance and latency in mobile networks, the (viable) group size increase is mainly because a larger distance constraint produces bigger GeoGroups. Irrespective of the choice of distance constraint, latency-based grouping dominates geography-based grouping. We see similar results in Figure 5.18 with a latency constraint of 400 ms.

Effect of Bucket Size

Having established that latency-based grouping is the better approach, we now consider the impact of bucket size and the particular form of clustering – QT or hierarchical. In Figures 5.19 and 5.20 we vary both and examine the impact on group size distribution and running time. While the total number of players who can participate in viable groups is roughly the same in each experiment (not shown in the figure), group sizes steadily grow with bucket size, as expected. With a bucket size of 1,000 players, 63% of the resulting groups have 16 players; this ratio improves to 75% with a bucket size of 1,500. While group sizes are roughly similar between hierarchical and QT clustering, the running time is not. Although the running time of either algorithm grows with a larger bucket size, that of QT clustering grows much faster due to its higher computational complexity (§5.4.5). Game players can be tolerant of a small delay (a minute or two) during matchmaking, as they can be appeased with game storyboard animation, but anything larger is less tolerable.

Figure 5.19: CDF of number of players in each group after grouping 50,000 players split into buckets of varying sizes, with a latency limit of 250 ms. The "QT 500" line shows results with QT clustering on a bucket size of 500 players; the "Hier 1500" line shows results with hierarchical clustering on a bucket size of 1,500 players.

Figure 5.20: Runtime of grouping algorithms for grouping 50,000 players split into buckets of varying sizes, with a latency limit of 250 ms. The "QT" bars on the left show results with QT clustering, while the "Hier" bars on the right show results with hierarchical clustering.
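To make the grouping step concrete, the following is a simplified sketch of latency-constrained clustering over one bucket of players. This is an illustration of the general idea, not Switchboard's implementation: it performs greedy complete-linkage agglomerative clustering, merging groups only while the worst pairwise latency stays within the bound, and assumes the pairwise latency estimates are already available.

```python
from itertools import combinations

def group_players(latency, max_latency, max_group=16):
    """Greedy complete-linkage clustering under a pairwise latency bound.

    latency: dict mapping frozenset({a, b}) -> estimated RTT (ms) between
             players a and b.
    Returns a list of groups (sets of player ids).
    """
    players = set()
    for pair in latency:
        players |= pair
    groups = [{p} for p in players]

    def diameter(g1, g2):
        # Worst-case latency between any member of g1 and any member of g2.
        return max(latency[frozenset({a, b})] for a in g1 for b in g2)

    merged = True
    while merged:
        merged = False
        best = None
        for g1, g2 in combinations(groups, 2):
            if len(g1) + len(g2) > max_group:
                continue
            d = diameter(g1, g2)
            if d <= max_latency and (best is None or d < best[0]):
                best = (d, g1, g2)
        if best:
            _, g1, g2 = best
            groups.remove(g1)
            groups.remove(g2)
            groups.append(g1 | g2)
            merged = True
    return groups
```

A player with uniformly high latency to everyone else ends up in a singleton group, mirroring the "viable group" distinction in the evaluation.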

Table 5.2: Experimental parameters for end-to-end experiments.

  Parameter                                         Value
  Maximum end-to-end latency bound                  250 ms
  Client arrival distribution                       Poisson
  Client arrival rate                               Varies
  ICMP measurement expiration period                15 min
  ICMP measurement probes per server request        10
  Per-tower measurements required for matchmaking   60
  Client "cooloff" period between probe requests    30 - 90 s (random)
  Bucket size for grouping                          500

5.5.3 End-to-end Evaluation

We evaluate the performance of the complete end-to-end Switchboard system, including measurement and grouping. Our fully-functional Switchboard implementation is deployed as a Microsoft Azure cloud service. To consider a large client base, we emulate users by spawning new instances of the client emulator on a high-powered desktop (with new clients connecting at varied Poisson arrival rates). When requested by the Switchboard service to conduct measurement tasks, emulated clients use the same model we used to test grouping, based on the US Census and FCC databases. Each client waits for placement in a matchmaking group until (1) such a group is formed or (2) Switchboard determines that the client's first-hop latency is too high, and thus group placement is impossible. We assume that each client only seeks matchmaking for a single game; in reality, clients may amortize the matchmaking costs over multiple games. We summarize experimental parameters in Table 5.2.

For each of these experiments we compare Switchboard performance across a variety of synthetic, Poisson-distributed client arrival patterns. Of course, the validity of our results is tied to how well these align with the true arrival pattern of real clients; thus, they should only be viewed in relative terms. However, a number of key properties emerge regarding performance at scale. As the number of clients using Switchboard increases, (1) server bandwidth requirements scale sub-linearly, (2) per-client probing (and thus bandwidth) overheads decrease, (3) larger groups can be formed, and (4) client delays for measurement and grouping decrease.
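The synthetic Poisson arrival pattern used to drive the emulated clients can be generated straightforwardly; the sketch below (function name is ours, not from the Switchboard code) draws exponentially distributed inter-arrival gaps for a given rate and returns the arrival timestamps for one experiment.

```python
import random

def poisson_arrivals(rate_per_s, duration_s, seed=None):
    """Return client arrival times (seconds) for a Poisson process.

    Inter-arrival gaps of a Poisson process with rate lambda are
    exponentially distributed with mean 1/lambda.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return arrivals
        arrivals.append(t)

# e.g., one hour-long experiment at 10 clients/second
arrivals = poisson_arrivals(rate_per_s=10, duration_s=3600, seed=42)
```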

Bandwidth and Probing Requirements

We now quantify the client-to-server bandwidth requirements of phones reporting latency measurements to the Switchboard server, and how probing tasks are distributed among devices. As explained in §5.4.3, probing is conducted in bursts of at least one packet per 100 ms, ensuring that the sending rate triggers the phone to enter the DCH mode. To amortize the energy cost of this traffic burst, phones report measurements in 10-probe batches. The frequency at which clients conduct these probe bursts depends on complex interactions, such as the rate at which clients arrive and when their measurements expire. The bandwidth consumed by the matchmaking service to collect measurement data from phones is primarily determined by the total number of towers with active game players, as the total number of measurements required for each tower is fixed, irrespective of the number of clients.

Figure 5.21 shows client-to-server bandwidth over time, aggregated across all clients connected to Switchboard running on Azure. We require at least 60 measurements within the last 15-minute interval for each tower. Clients are not considered for groups until this minimum number of measurements has been conducted for their associated tower. The bandwidth consumed stabilizes after an initial warming period during which the Switchboard Measurement Controller builds an initial 15-minute history for many towers. Furthermore, as we increase the client arrival rate, the bandwidth consumed scales sub-linearly – at 10 clients/second, the bandwidth consumed is not 10 times that at 1 client/second.

Figure 5.21: Aggregate client-to-server bandwidth by client Poisson arrival rate for Switchboard running on Azure. The first 15 minutes reflects a warming period with elevated measurement activity as the server builds an initial history.

Figure 5.22 shows the distribution of ICMP measurements performed by each client. As the client arrival rate increases, greater measurement reuse is possible because there are more clients under each tower that benefit from each other's observations. Further, the distribution of measurement tasks becomes more equitable (the CDF lines shift to the left), reflecting that at greater load, Switchboard overhead for each client becomes lower and more predictable.

Figure 5.22: CDF of ICMP probes per client at different client Poisson arrival rates, as conducted by the Measurement Controller in Switchboard running on Azure. Data reflects hour-long experiments and excludes the warming period.
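The per-tower measurement reuse can be sketched as a small cache keyed by cell tower: a client only needs to probe when its tower's window holds fewer than the required number of fresh measurements. The class below is an illustrative sketch under that assumption (names and structure are ours, not Switchboard's actual code).

```python
import collections

class TowerMeasurementCache:
    """Tracks recent latency measurements per cell tower.

    A tower is 'characterized' once it holds at least `required`
    measurements newer than `expiry_s`; clients under a characterized
    tower can reuse the shared estimate instead of probing themselves.
    """
    def __init__(self, required=60, expiry_s=15 * 60):
        self.required = required
        self.expiry_s = expiry_s
        self.samples = collections.defaultdict(list)  # tower -> [(time, rtt)]

    def _prune(self, tower, now):
        # Discard measurements older than the expiration window.
        self.samples[tower] = [(t, rtt) for (t, rtt) in self.samples[tower]
                               if now - t < self.expiry_s]

    def report(self, tower, now, rtt_ms):
        self.samples[tower].append((now, rtt_ms))

    def needs_probing(self, tower, now):
        self._prune(tower, now)
        return len(self.samples[tower]) < self.required

    def estimate(self, tower, now):
        self._prune(tower, now)
        rtts = sorted(rtt for (_, rtt) in self.samples[tower])
        return rtts[len(rtts) // 2] if rtts else None  # median RTT
```

Because the requirement is per tower rather than per client, the number of probes each client must perform shrinks as more clients share a tower, which is the source of the sub-linear bandwidth scaling observed above.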

Figure 5.23: CDF of resulting group sizes at different client Poisson arrival rates. Grouping uses 500-client buckets. Data reflects hour-long experiments and excludes the warming period.

Client Matchmaking Experience

We now evaluate the size of viable groups that Switchboard creates and how long clients wait for those results. In Figure 5.23, we show that at higher client arrival rates, it is possible to form larger matchmaking groups. This is expected, since at higher arrival rates, the steady-state number of waiting clients is also higher – providing a larger pool of clients on which to cluster. Note that the analysis in §5.5.2 reflects absolute grouping performance, with all clients available for clustering simultaneously and immediately. In this section, we additionally consider the effects of client arrival rate and re-grouping through multiple rounds, more closely reflecting real-world performance. Here, bucket size (chosen as 500) reflects the maximum number of clients that may be simultaneously clustered. At insufficient arrival rates, there will be fewer than 500 waiting clients. Further, since clients are placed into groups as soon as one is available, those clients that wait through multiple clustering attempts are likely to be the hardest to place (with relatively higher latency). These factors will typically lead to the creation of smaller groups. If larger groups are desired, Switchboard can be configured to reject groups of insufficient size.

Figure 5.24 shows the total amount of time a client spends in matchmaking, broken down by the measurement delay and the grouping delay. Clients are grouped using random buckets of 500 clients; each bucket is processed in parallel by separate threads. A client may have to wait through multiple grouping attempts before one or more peer clients are ready with which it can be grouped (again, these results are not directly comparable to those in §5.5.2, as client arrival rate and re-grouping contribute to grouping performance). Note that since higher arrival rates enable both greater measurement reuse and a larger pool of clients for clustering, these delays substantially decrease with more users.

Figure 5.24: Client time spent in measurement and grouping. Measurement reflects the time from when a client joins a lobby until there is sufficient data for the client's tower. Time required for grouping reflects the total time from when measurement data is sufficient until the client is placed into a viable group (one or more clustering attempts). Grouping performed with randomized buckets of up to 500 clients.

5.5.4 Summary of Evaluation Results

Our implementation and evaluation confirm our intuitions for Switchboard performance, especially as a function of scale. Switchboard's mechanisms to cluster groups by latency proximity are substantially more effective than geography-based techniques. Comparing QT and hierarchical clustering for group formation, we find that hierarchical clustering is more effective, creating similarly-sized groups to QT at a smaller computational delay. Finally, with increasing utilization, server bandwidth requirements scale sub-linearly, per-client probing overheads decrease, larger groups are formed, and client delays for measurement and grouping decrease.

5.6 Conclusion

Turn-based multiplayer games are available on multiple phone platforms and are popular. We want to enable fast-paced multiplayer games over cellular data networks. While 3G latencies can often be within the tolerance of some fast games [49], such games are not common because it is difficult for the game developer to deal with the highly variable nature of 3G latencies. First, we demonstrate that P2P over 3G is a viable way to reduce both the latency of such games and the cost to the developer of maintaining game servers. Depending on the number and location of servers, P2P can save as much as 148 ms of median latency. Second, we have built Switchboard to reduce the burden on the game developer of managing this highly variable latency. It solves the matchmaking problem: specifically, assigning players to game sessions based on latency. Switchboard achieves scalability in both measurement overhead and computation overhead. Based on experiments, we show that a small number of measurements in a 15-minute window sufficiently characterizes not only the latency of that phone, but also of other phones under the same celltower. This latency distribution is also highly predictive of the next 15 minutes. Using this information, Switchboard is able to significantly reduce the measurement overhead by coordinating across many phones. Switchboard then exploits the nature of this latency in a heuristic that quickly assigns players to game sessions.

This is a ripe new research area with many other open problems. Specifically in matchmaking, our work does not consider phones that are moving (e.g., on a bus) – perhaps one can predict future celltowers (and hence future latency) by looking at the phone's trajectory. We do not attempt to estimate and predict bandwidth over 3G. We do not consider remaining energy in assigning measurement tasks to phones, or any other explicit form of fairness. There are also interesting challenges in energy conservation during game play, and in improving touch-based UIs for fast-action gaming.

6 An Object Positioning System using Smartphones

This chapter attempts to solve the following problem: can a distant object be localized by looking at it through a smartphone? As an example use-case, while driving on a highway entering New York, we want to look at one of the skyscrapers through the smartphone camera and compute its GPS location. While the problem would have been far more difficult five years ago, the growing number of sensors on smartphones, combined with advances in computer vision, has opened up important opportunities. We harness these opportunities through a system called Object Positioning System (OPS) that achieves reasonable localization accuracy. Our core technique uses computer vision to create an approximate 3D structure of the object and camera, and applies mobile phone sensors to scale and rotate the structure to its absolute configuration. Then, by solving (nonlinear) optimizations on the residual (scaling and rotation) error, we ultimately estimate the object's GPS position.

We have developed OPS on Android NexusS phones and experimented with localizing 50 objects on the Duke University campus. We believe that OPS shows promising results, enabling a variety of applications. Our ongoing work is focused on coping with large GPS errors, which prove to be the prime limitation of the current prototype.

6.1 Introduction

Imagine the following scenario in the future. While leaving for the office, Alice needs to ensure that the repairman comes to her home later in the day and fixes the leak in the roof. Of course, the leak is small and Alice must point out its location. To this end, she walks across the road in front of her house, points her camera towards the leak, takes a few photos, and types in "leaking from here". Later, when the repairman comes to Alice's house, he points his camera towards the roof and scans – when the leak is inside the camera's view-finder, Alice's message pops up. The repairman repairs the leak and leaves. Alice comes back home in the evening, points her camera towards the leak, and sees the repairman's tag: "repaired, received payment, thanks!". Before returning to her house, she cursorily scans the neighborhood with her phone to see if there is anything new. She finds a "pool party Saturday evening" tag at the community swimming pool, and another on a tall crane at a nearby construction site, reading "too noisy: 13 votes". Alice remembers how she has been frustrated as well, so she points her camera at the crane and votes. She looks at the tag again to confirm; it now reads "too noisy: 14 votes".

While this may be an intriguing vision of the future, the core idea of tagging objects in the environment, and viewing them through a smartphone's viewfinder, is old. A variety of augmented reality applications have already built such frameworks – Wikitude and Enkin even offer them on the app store [165]. However, these applications implicitly assume that objects in the environment have been annotated out-of-band – that someone visited Google Earth and entered a tag for the swimming pool. Later, when an Enkin user looks at the same pool through her camera viewfinder, tags of all the objects in her viewfinder pop up. We believe that out-of-band tagging is one of the impediments to augmented reality (AR) becoming mainstream. The ability to tag the environment spontaneously will be vital if users are to embrace AR applications in their daily lives.

This project – Object Positioning System (OPS) – is tasked with addressing this "missing piece" in today's AR applications. Our ultimate goal is to offer a service that allows a lay user to point her smartphone at any object in the environment and annotate it with comments. While this is the front-end functionality of our system, the key challenge in the back-end pertains to object localization. Our system essentially needs to compute the GPS location of the desired object, and then trivially associate the user-generated tag with that location. Another user standing at a different location should be able to look at the same object, run our system to compute its location, and retrieve all tags associated with it. Ideally, the system should operate in real time, so the user can immediately view the tag she has created.

While translating this vision to reality warrants a long-term research effort, as a first step, we narrow down its scope as follows. We sidestep indoor environments due to their stringent requirements on object positioning accuracy – a tag for a chair cannot get attached to the table. Therefore, we focus on outdoor objects and assume desktop-class CPU capability (which, if unavailable on today's phones, may be available through the cloud). Even under this narrowed scope, the challenges are multiple: (1) The state of the art in computer vision is capable of localizing objects from hundreds of pictures of the same object [190]. When only a few pictures are available – such as those taken by Alice of her rooftop – computer vision becomes inapplicable.
Our intuition suggests that sensor information from mobile devices should offer opportunities to compensate for the deficiencies in vision, but the techniques for such information fusion are non-trivial. (2) The smartphone sensors, such as GPS, accelerometer, compass, and gyroscope, are themselves noisy, precluding the ability to pivot the system on some ground truth. Hence, aligning sensor information with vision becomes even more difficult, requiring us to formulate and solve a "mismatch minimization" problem. (3) OPS needs to identify the user's intention – different objects within the viewfinder may be at different depths/locations, and only the intended object's location is of interest. (4) Finally, the system needs to be reasonably lightweight, in view of the eventual goal of on-phone, real-time operation.

The design of OPS converged after many rounds of testing and modification. Our current prototype on Android NexusS phones has been used to localize 50 objects on the Duke University campus (e.g., buildings, towers, parking lots, cranes, trees). Performance evaluation shows that the system exhibits promising behavior. In some cases, however, our errors can be large, mainly stemming from excessively high GPS errors. Nonetheless, OPS is able to identify and communicate such cases to the user – like a confidence metric – allowing them to re-attempt the operation. While not ready for real-world deployment, we believe OPS demonstrates an important first step towards a difficult problem with wide-ranging applications. The key contributions of OPS are summarized as follows.

1. Localization for distant objects within view: We show opportunities in multimodal sensing to localize visible objects in outdoor environments, with core techniques rooted in mismatch optimization.

2. System design and implementation on the Android NexusS platform: Reasonably lightweight algorithms achieve promising location accuracy, with marked improvements over an optimized triangulation-based approach using GPS and compass.

The rest of the chapter expands on these contributions, beginning with motivation and overview in Section 6.2 and primitives of OPS localization in Section 6.3. Next, in Section 6.4, we present the design of OPS. In Section 6.5, we address additional practical challenges in translating the core design into a complete system. We provide results from our testing experiences in Section 6.6 and describe our ongoing work to improve OPS in Section 6.7. We compare OPS with the state of the art in Section 6.8. Section 6.9 concludes with a brief summary.

6.2 Motivation and Overview

This section revisits the motivation for the chapter, generalizes OPS to other applications, and then presents a functional overview of the system. The subsequent sections elaborate on the core technical challenges and solutions.

6.2.1 Applications beyond Tagging

An Object Positioning System (OPS) has natural applications in tagging the environment. While this was our initial motivation, we observed that the core capability to localize a distant object is probably a more general primitive. In contemplating the possibilities, we envisioned a number of other applications that can be overlaid on OPS:

(1) Location-based queries have generally been interpreted as queries on the user's current location (e.g., "restaurants around me," "driving directions from here to the airport"). However, queries based on a distant object can be entirely natural, such as "how expensive are rooms in that nice hotel far away," or "is that cell tower I can see from my house too close for radiation effects?" While walking or driving up to the object's location is one way to resolve the query, the ability to immediately look up the hotel price based on the hotel's location is naturally easier. OPS could enable such "object-oriented queries."

(2) OPS could potentially be used to improve GPS, particularly where the GPS errors are large or erratic. This is true even though OPS actually depends on GPS. The intuition is that the combination of multi-modal information – vision and GPS in this case – can together improve each of the individual dimensions. Thus, knowing the location of the object can help improve the location of the camera.

(3) High-end cars entering the market are embedded with a variety of safety features [134], such as adaptive cruise control, lane-change detection, and blind-spot alerts. Existing cars remain deprived of these capabilities, since upgrades may be expensive, even if feasible. High-accuracy OPS technologies on mobile smartphones may enable services that approximate these capabilities: smartphones mounted near the car's windshield could estimate the locations of other objects in the surroundings and trigger appropriate reactions.

To summarize, one may view OPS as somewhat analogous to GPS – GPS satellites help receivers estimate self-location, while OPS phones estimate others'-locations. It is this analogy that motivates our metaphor – satellites in our pockets.

Figure 6.1: An architectural overview of the OPS system – inputs from computer vision combined with multi-modal sensor readings from the smartphone yield the object location.

6.2.2 System Overview

We present an overview of OPS with the goal of introducing the functional components of the system and their interactions. We expect this to ease the transition to the technical details.

When a user activates OPS on her smartphone, the camera is automatically turned on, along with the GPS, accelerometer, compass, and gyroscope. The user is expected to bring the object of interest near the center of her viewfinder and take a few pictures from different positions. These positions can be separated by a few steps from each other in any direction – the goal is to get multiple views/angles of the same object. As few as four photos are adequate, though more is better. Once completed, OPS displays the object's GPS coordinate.

While this is a simple front-end, Figure 6.1 shows the flow of operations at the back-end. The pictures taken by the user are accepted as inputs to the computer vision module, which implements a technique called structure from motion (SfM). Briefly, SfM is the process of extracting the 3D structure of an object from diverse views of a moving camera. As a part of this process, SfM first identifies keypoints in each picture – keypoints may be viewed as a set of points that together capture the defining aspects of the picture. The keypoints are matched across all the other pictures, and those that match offer insights into how the camera moved (or its angle changed) while the user took the different pictures. The final output of this process is a 3D structure, composed of the object and the camera locations. Importantly, this 3D structure – also called the point cloud – is not in absolute scale. Rather, the point cloud offers information about the relative camera positions, as well as the relative distances between the cameras and the object.

To obtain the GPS location of the object, the point cloud needs to be "grounded" in the physical coordinate system. In an ideal scenario, where the point cloud and the GPS locations are both precise, it would be easy to scale the relative camera locations to match the GPS points. This would scale the object-distance as well, eventually yielding the absolute object location. Unfortunately, errors in the point cloud, and particularly in the GPS readings, introduce a mismatch. Therefore, OPS uses the configuration of the camera locations in the point cloud to first adjust the GPS positions. To this end, OPS formulates and solves an optimization problem to minimize the total adjustments.

The next goal is to use the compass readings from these (corrected) locations to triangulate the object of interest. Again, if all compass readings were accurate, any pairwise triangulation from the GPS points would yield the same object location. Unsurprisingly, compasses are noisy as well – therefore OPS executes another optimization that minimizes the total adjustments to all compasses, under the constraint that all triangulations result in the same object location. This corrects the compass readings, and also offers a rough estimate of the object's distance from the GPS locations. By applying the compass readings back to the 3D point cloud, and again solving an optimization problem (detailed later), OPS finally converges on the object location.

OPS also extracts the height of the object by incorporating the angular pitch of the phone while taking the picture. Thus, the final output is a location in 3D space, represented as a GPS coordinate and a height above the ground. The following sections zoom into the details of each of these components, beginning with the primitives of object localization.
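The flavor of the compass-correction step can be sketched as a small least-squares problem: find the object position whose predicted bearings require the smallest total (squared) adjustment to the measured compass readings. The solver below is our own illustrative coarse-to-fine grid search on a local plane, not OPS's actual optimizer.

```python
import math

def triangulate_bearings(points, bearings_deg, span=500.0, iters=60):
    """Least-squares bearing triangulation (a sketch of the idea):
    find the object position minimizing the total squared adjustment
    to the measured compass bearings.

    points: list of (x, y) camera positions (meters, local east/north plane).
    bearings_deg: compass bearing from each point to the object,
                  in degrees clockwise from north.
    """
    def cost(a, b):
        total = 0.0
        for (x, y), brg in zip(points, bearings_deg):
            # Compass bearing of (a, b) as seen from (x, y).
            predicted = math.degrees(math.atan2(a - x, b - y)) % 360.0
            diff = (predicted - brg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
            total += diff * diff
        return total

    # Coarse-to-fine grid search starting from the centroid of the cameras.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    best, step = (cx, cy), span
    for _ in range(iters):
        a0, b0 = best
        candidates = [(a0 + da * step, b0 + db * step)
                      for da in (-1, 0, 1) for db in (-1, 0, 1)]
        best = min(candidates, key=lambda p: cost(*p))
        if best == (a0, b0):
            step /= 2.0  # refine once the local grid stops improving
    return best
```

With noise-free bearings the minimum cost is zero at the true object position; with noisy bearings, the residual cost at the minimum serves as a natural confidence signal, echoing the confidence metric mentioned earlier.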

6.3 Primitives for Object Localization

Inferring a distant location from a known point of origin is an old problem. Historically, the principles of triangulation date to Greek philosophers of the 6th century BC. Land surveying applies the same basic techniques today at great precision. The related technique of trilateration (location determination through known distances, rather than angles) is the technical basis of GPS. As a starting point, we investigate the applicability of these techniques to object localization.

Figure 6.2: Compass-based triangulation from GPS locations (x1, y1) and (x2, y2) to object position (a, b).

Why not use GPS/compass to triangulate?

Smartphones have embedded GPS and compass (magnetometer) sensors. The precise locations and compass bearings from any two points determine a pair of lines¹. The object-of-interest should fall at their unique intersection. We illustrate compass-based triangulation in Figure 6.2. In principle, if a user points her phone at an object-of-interest from two distinct locations, we should be able to easily infer the object's location. Of course, to obtain these distinct locations, we cannot ask the user to walk too far, or using the system would be impractical. Instead, we can imagine the user walking just a few steps to infer the location of the object, say 40 meters away. In such a scenario (i.e., when the distance between camera views is much smaller than the distance from the camera to the object), compass precision becomes crucial. A few degrees of compass error can dramatically reduce the accuracy of triangulation. Similarly, if the distance between the camera views is slightly erroneous, the result can also be error-prone. Smartphone sensors are simply not designed to support such a level of precision. GPS can be impacted by weather (due to atmospheric delay), clock errors, errors in estimated satellite ephemeris, multipath, and internal noise sources in the receiver hardware. Compass magnetometer readings are imprecise and subject to bias, due to variation in the Earth's magnetic field and nearby ferromagnetic material. Triangulation, at least under these conditions, does not apply immediately.

¹ The two points must not be collinear with the remote location.
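The sensitivity argument above is easy to demonstrate numerically. The sketch below (our own illustration) intersects two bearing lines and shows that, with a 10-meter baseline and an object roughly 40 meters away, a 2-degree compass error shifts the triangulated position by several meters.

```python
import math

def intersect_bearings(p1, brg1_deg, p2, brg2_deg):
    """Intersect two bearing lines (bearings in degrees clockwise from north).

    Returns the (x, y) intersection point, or None if the lines are parallel.
    """
    # A compass bearing t maps to direction (sin t, cos t) in (east, north).
    d1 = (math.sin(math.radians(brg1_deg)), math.cos(math.radians(brg1_deg)))
    d2 = (math.sin(math.radians(brg2_deg)), math.cos(math.radians(brg2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None  # parallel (or collinear) bearing lines
    # Solve p1 + s*d1 = p2 + t*d2 for the scalar s.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])
```

The baseline-to-range ratio acts as an error amplifier: the shallower the angle between the two bearing lines, the farther the intersection point slides for a given bearing error.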

Can smartphones apply trilateration?

Trilateration requires estimating the distance to the object-of-interest (the range) from two vantage points. GPS is a popular implementation of trilateration – the distances from multiple satellites are computed from the propagation delays of the corresponding signals. Unfortunately, a GPS-like scheme is inapplicable for object positioning, since the objects are not collocated with a wireless radio. The phone camera, however, may partially emulate this functionality without requiring any infrastructure at the object, which makes it an appealing alternative. So long as the object-of-interest remains clearly in the camera view, the size of the object in the picture is a function of the camera's distance to that object. The size can be estimated from the visual angle subtended by the object (Figure 6.3), which can be computed as

v = 2 arctan(s / 2d),

where v is the visual angle, s is the size (or height) of the object, and d is the distance to the object. Since we do not know the object size s, we cannot compute d. However, knowing two different visual angles v and v1 from two distinct locations, it is possible to eliminate s and obtain the ratio of the distances to the object from those locations. Let σ denote this ratio; then σ can be computed as

σ := d1 / d = tan(v / 2) / tan(v1 / 2).

Thus, although visual trilateration cannot precisely localize the object, the value of σ can certainly offer hints about the object's position. If one plots all points in space whose distances from the two camera locations are in the ratio σ, one gets a curve as shown in Figure 6.4. The object will naturally lie at some location on this curve.

Figure 6.3: The visual angle v relates the apparent size s of an object to the distance d from the observer.

Figure 6.4: Visual trilateration: the unknown distances from GPS locations (x1, y1) and (x2, y2) to object position (a, b) are in a fixed ratio d2/d1.
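The size-elimination step can be checked numerically. Under the visual-angle relation v = 2 arctan(s/2d), the ratio of distances follows from the two measured angles alone; the object size s cancels. The helper functions below are our own sketch of this algebra.

```python
import math

def visual_angle(size, distance):
    """Visual angle v (radians) subtended by an object of size s at distance d:
    v = 2 * arctan(s / 2d)."""
    return 2.0 * math.atan(size / (2.0 * distance))

def distance_ratio(v1, v2):
    """Ratio d1/d2 of distances to the same object from two vantage points,
    computed from the two visual angles it subtends. Since
    tan(v/2) = s / 2d, the unknown size s cancels:
    d1/d2 = tan(v2/2) / tan(v1/2)."""
    return math.tan(v2 / 2.0) / math.tan(v1 / 2.0)
```

For instance, a 30 m building viewed from 40 m and then from 60 m subtends two different angles, yet the recovered ratio is exactly 40/60 without ever knowing the building's height.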

Can phone cameras also triangulate?

Land surveying systems typically use optical sensing for precise triangulation. Possibly, the camera could be exploited to improve the accuracy of compass-based triangulation as well. Multiple views of an object from different angles, even if only slightly different, produce visual distortions due to the phenomenon of parallax: points in the foreground appear to change in relative position to points in the background. The properties of parallax, and visual perception in general, are well understood. For example, stereo vision leverages parallax effects to invoke a three-dimensional perception from two-dimensional images. Thus, with a careful analysis of images taken from multiple nearby locations, it should be possible to invert these effects. In particular, it would be possible to infer the interior angle between a pair of GPS locations and the object position. However, knowing the interior angle is again not adequate to pinpoint the object location – instead, it offers a curve, and the object can be at any location on this curve. Figure 6.5 illustrates this form of visual triangulation.

Figure 6.5: Visual triangulation: a fixed interior angle from known GPS location (x1, y1) to unknown object position (a, b) to known GPS position (x2, y2).

Combining Triangulation and Trilateration

While neither triangulation nor trilateration alone can pinpoint the object location, observe that computing the intersection of the two curves (in Figure 6.4 and Figure 6.5) yields a small number of intersection points. Moreover, if compass triangulation is added, there is more than adequate information to uniquely identify the object position (a, b). Figure 6.6 shows the superimposition of all four curves – observe that this is an over-constrained system, meaning that there is more than sufficient information to compute a unique solution.

Figure 6.6: Intersection of the four triangulation curves for known points (0, 0) and (10, −4), localized point (4, 8), distance ratio σ = 6√5 / 4√5 = 1.5, and internal angle γ = 2 · arctan(1/2) ≈ 53°.

This excess of information will later form the basis for noise correction on imperfect sensors. This is necessary because, with errors from GPS, compass, and inaccurate parameter estimation from the visual dimensions, we do not obtain a single point of intersection across all curves. While increasing the number of camera views will help, it will also increase the number of curves (each with some error). Thus, ultimately, we are left with many points of intersection, many of which can be far away from the true object position. To find a single point of convergence, we will rely on optimization techniques, finding the most-likely true object point by minimizing estimates of sensor error.

Next, we describe the OPS system design, focusing mainly on how advanced computer vision techniques can be applied to implement visual trilateration and triangulation. In particular, vision will quantify relative distance and invert the effects of parallax to find the interior angle between a pair of photographs, which will guide the other sensors to ultimately yield object location.
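As a concrete check of the geometry in Figure 6.6, the short sketch below (treating the caption's values as given) verifies that the object point (4, 8) lies on both curves: the trilateration locus with distance ratio σ = 1.5 from (0, 0) and (10, −4), and the triangulation locus with interior angle γ = 2 · arctan(1/2):

```python
import math

# Known camera locations and the localized object, from Figure 6.6.
p1, p2 = (0.0, 0.0), (10.0, -4.0)
obj = (4.0, 8.0)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Trilateration curve: locus of points at a fixed distance ratio to p2 and p1.
sigma = dist(obj, p2) / dist(obj, p1)   # 6*sqrt(5) / 4*sqrt(5) = 1.5

# Triangulation curve: interior angle p1 -> obj -> p2.
v1 = (p1[0] - obj[0], p1[1] - obj[1])
v2 = (p2[0] - obj[0], p2[1] - obj[1])
cos_g = (v1[0]*v2[0] + v1[1]*v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
gamma = math.acos(cos_g)

print(sigma)                    # 1.5
print(math.degrees(gamma))      # ~53.13 degrees, i.e., 2*arctan(1/2)
```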

Figure 6.7: OPS builds on triangulation and trilateration, each underpinned by computer vision techniques and multi-modal sensor information. The noise from sensors affects the different techniques, and makes merging difficult.

6.4 OPS: System Design

In an ideal world, visual information should not be necessary – noise-free sensors should be able to triangulate the object position. Since real-world sensors are noisy, OPS uses visual information to combat their impact. However, visual information relies partly on sensors, and thus the overall system needs to be optimized jointly, to marginalize the noise. For ease of explanation, we first describe the individual techniques in isolation (i.e., without considering the effect of noise). Then, we explain how noise forced many of our designs to fail, motivating our ultimate methods of “mismatch optimization.” Figure 6.7 captures this flow of operations.

6.4.1 Extracting a Visual Model

We begin with a discussion of the kind of information deducible from multiple photographs of the same object. Figure 6.8 shows how two observers each experience a different perspective transformation of the same object – a building. Note that the apparent size of the same building is different in each, as is the shape. The differences in scale² can be used to determine relative distances to the building. The

2 After accounting for camera focal length and lens distortions.


and is prone to inconsistencies across multiple photos. However, with a large number of keypoints per image, there is likely to be a substantial number of keypoints that derive from the same physical point in all photos. The keypoints from each photo are uploaded to a server, which then executes a keypoint matching algorithm. The matching process entails comparison of feature descriptors associated with each keypoint. A feature descriptor can be thought of as a unique “fingerprint” of a photograph, taken from the pixels around the keypoint. If a pair of feature descriptors are a strong match (numerically), the corresponding pair of keypoints can be assumed to likely capture the same physical point in the real world.

Once keypoints are linked across multiple photos, SfM prepares to analyze the spatial relationship between the locations at which the photographs were taken. For spatial reasoning, SfM applies algorithms that bear similarity to stereo vision. Perspective differences from multiple views of the same object (arising from parallax) can be used to reconstruct depth information. However, unlike stereo vision, SfM does not require a (known) fixed distance and relative orientation between a pair of views. Instead, SfM takes multiple sets of matched keypoints and attempts to reconstruct (1) a sparse 3D point cloud of the geometry captured by those keypoints, and (2) the relative positions and orientation of the camera when the original photographs were taken, known as pose. Figure 6.9 shows an example point cloud for a building – observe that the points in the cloud are located on the surface of the building and other visible objects in the environment, as well as at the location of the camera.

SfM relies on Bundle Adjustment to perform a simultaneous refinement on the estimated 3D point cloud and the parameters for each camera view (including camera pose and lens distortions). Popular implementations of Bundle Adjustment use the Levenberg-Marquardt algorithm to perform a numerical nonlinear optimization on reprojection error between what is seen in each image (as described by the keypoints) and what is predicted by different parameterizations of camera pose and 3D geometry. Thus, in summary, the final output from SfM is a considerably-accurate 3D point cloud.

Figure 6.9: Example of a 3D point cloud overlaid on one of the images from which it was created.
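The descriptor-matching step just described can be illustrated with a toy nearest-neighbor matcher using Lowe's ratio test (a common heuristic; the matcher actually used inside the SfM pipeline may differ). The descriptors here are random stand-ins for real SURF/SIFT vectors:

```python
import numpy as np

def match_keypoints(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with a ratio test.

    A match (i, j) is kept only when the best distance from desc_a[i]
    to desc_b is sufficiently smaller than the second-best distance,
    suggesting the two descriptors "fingerprint" the same physical point.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, j2 = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[j2]:
            matches.append((i, int(j)))
    return matches

rng = np.random.default_rng(0)
base = rng.normal(size=(10, 64))                      # 10 keypoints, 64-dim descriptors
photo_a = base + 0.01 * rng.normal(size=base.shape)   # same points, slight noise
photo_b = base + 0.01 * rng.normal(size=base.shape)

matches = match_keypoints(photo_a, photo_b)
print(matches)   # each keypoint should match its counterpart: (0,0), (1,1), ...
```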

From 3D Point-Cloud to Physical Location

For OPS, we utilize SfM as a “black box” utility. As input, SfM takes the matched keypoints of the user’s images. As output, OPS receives a 3D point cloud of estimated <X, Y, Z> coordinates for each keypoint that was successfully matched across a sufficient number of images. We also have estimated <X, Y, Z> camera pose coordinates from where each photo was originally taken. This coordinate system, however, even if completely precise, exists at an unknown relative scaling, translation, roll, pitch, and tilt from the corresponding locations in the real world. To compute the physical location of the object, the camera locations and orientations in the 3D model need to be “aligned” with the GPS and compass readings from the smartphone. However, since the GPS/compass readings themselves will be noisy, this alignment will be non-trivial – the GPS/compass values will need to be adjusted to minimize the mismatch. Moving forward, OPS will focus on addressing these challenges.

6.4.2 Questions

Before we continue further into the challenges of mismatch minimization, we briefly discuss a few natural issues related to the system build-up.

(1) Capturing User Intent

The use of computer vision entails a second practical challenge – many objects may appear in the camera view. OPS must be able to infer, automatically, which object in view the user is most likely interested in localizing. For example, a building may be partially occluded by trees. Thus, the point cloud may contain many keypoints that are not reflective of the specific object-of-interest. In general, we assume that the user positions the object-of-interest roughly at the center of the camera’s viewfinder. Across multiple photographs, the intended object will become a “visual pivot.” Near-foreground and distant-background points appear to shift away from this central point, due to parallax. More sophisticated approaches based on computer vision segmentation techniques are also relevant here. For example, in Section 6.5, we will consider an alternative approach for cases where we can assume the user is focused on a building. In our evaluation, however, we will avoid such assumptions.

(2) Privacy

We note that while OPS may offload computational tasks to a central server/cloud, users need not ever upload actual photographs of objects-of-interest (in fact, they can be discarded from the smartphone as well). Instead, users can upload only the keypoints and feature descriptors, which contain all the information needed by SfM. This serves to address any privacy concerns that a user may have with OPS.

6.4.3 Point Cloud to Location: Failed Attempts

In our original design, we expected that once a Structure-from-Motion point cloud is extracted, estimation of the real-world coordinates of the object would be reasonably straightforward. This would only require mapping vision <X, Y, Z> coordinates to real-world <latitude, longitude, altitude>. In practice, substantial sensor noise makes this mapping more difficult and error-prone than we had anticipated. The point cloud reflects the structure of the object relative to the user’s locations when the original photographs were taken, but at an unknown relative scaling, translation, roll, pitch, and tilt from the real world. Importantly, the locations at which the photographs have been taken are known in both real-world coordinates (through GPS) as well as in the SfM-derived coordinate system. In principle, some affine transformation should exist to convert from one coordinate system to the other. We sought to apply state-of-the-art computer vision optimization techniques to estimate this affine transformation. We describe three of our failed attempts, followed by the actual proposal.

Attempts using Point Cloud Registration

We applied the computer vision technique of Iterative Closest Point (ICP) to find the mapping. ICP is commonly used in the registration of one point cloud to another. Before applying ICP, we first eliminated what is typically a key challenge; we pre-defined the association of points from one point cloud to the other. We also eliminated as many degrees-of-freedom in the transformation as possible: we normalized the <X, Y, Z> coordinates to eliminate translation from the search space and constrained scaling to be uniform in all dimensions. We attempted three mechanisms for estimation of the affine transformation: first, an approach based on the Singular Value Decomposition (SVD); next, a nonlinear minimization of transformation error based on the Levenberg-Marquardt algorithm; third, we exploited knowledge of the surface normals of camera pose (the 3D direction at which the camera points) in the SfM point cloud and attempted to match them with a 3D phone rotation matrix from compass and accelerometer (to find orientation relative to the vector of gravity). In all cases, sensor noise (in GPS, compass, and accelerometer) resulted in a nonsensical transformation. With only a few camera locations to connect one coordinate system to the other, and a still-large number of remaining degrees-of-freedom in the transformation, there is simply insufficient data to overcome sensor noise.
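The SVD-based mechanism can be sketched as a standard Kabsch/Umeyama-style similarity alignment (the dissertation's exact variant may differ). On clean synthetic data it recovers the transformation exactly; the point of the discussion above is that with only a few noisy camera locations, it does not:

```python
import numpy as np

def align_svd(src, dst):
    """Estimate uniform scale s, rotation R, translation t with dst ≈ s*R*src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A)
    d = np.sign(np.linalg.det(U @ Vt))              # guard against reflections
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (A ** 2).sum()   # optimal uniform scale
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic "vision" camera positions and their clean "GPS" counterparts.
rng = np.random.default_rng(1)
vision = rng.normal(size=(4, 2))                    # only four camera locations
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
gps_clean = 3.0 * vision @ R_true.T + np.array([10.0, -5.0])

s, R, t = align_svd(vision, gps_clean)
print(round(s, 3))                                  # recovers scale 3.0 on clean data
```

Adding realistic GPS jitter to `gps_clean` quickly degrades the recovered rotation and scale, matching the failure reported above.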

Attempts Intersecting Triangulation/Trilateration

After many unsuccessful attempts at applying computer vision techniques to estimate the affine transformation between coordinate systems, we attempted to simplify the localization problem. We tried to directly apply our intuitions for (1) compass triangulation; (2) visual trilateration; and (3) visual triangulation. The parameters of relative distance and interior angles can be trivially estimated from an SfM point cloud. If all sensors and vision techniques were fully precise, the true (a, b) object location should fall at the intersection of those equations. In practice, after numerically solving for the roots of all equation pairs, sensor noise and bias create many intersection points. We applied a 2D hierarchical clustering on these intersection points, hoping that one of the clusters would be distinctly dense, and that the centroid of that cluster would be the estimated object location. In many cases, this proved correct. However, in more cases, sensor error (especially GPS) was large. Intersection points became diffused, and no longer indicative of the true location. To be practical, OPS would need to apply a more robust approach.
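A minimal stand-in for the clustering step (the dissertation's exact hierarchical clustering is not specified beyond "2D hierarchical") groups intersection points whose pairwise distance falls under a threshold, then takes the centroid of the densest group:

```python
import math

def densest_cluster_centroid(points, radius=2.0):
    """Greedy single-linkage grouping, then centroid of the largest group."""
    clusters = []
    for p in points:
        for c in clusters:
            if any(math.dist(p, q) <= radius for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    best = max(clusters, key=len)
    n = len(best)
    return (sum(x for x, _ in best) / n, sum(y for _, y in best) / n)

# Intersection points: a dense knot near the true object, plus noise-driven strays.
pts = [(4.1, 8.0), (3.9, 7.8), (4.2, 8.3), (3.8, 8.1),   # near true (4, 8)
       (40.0, -3.0), (-25.0, 16.0)]                       # far-off intersections
c = densest_cluster_centroid(pts)
print(c)   # centroid of the dense knot, near (4, 8)
```

When GPS error grows, the "knot" diffuses and no cluster is distinctly dense, which is exactly the failure mode described above.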

Attempts Optimizing Across Error Sources

We were encouraged by the partial success of our second approach, directly applying our equations of triangulation and trilateration. Next, we attempted to push this technique further, this time integrating an optimization approach. We formulated a minimization problem on the error terms for each sensor value and vision-derived parameter, with the optimization constrained to find a single object location (a, b). This led to a complex nonlinear optimization with many error terms and constraints. While this would occasionally converge to the correct object location, more often it found a trivial and nonsensical solution. For example, GPS error terms would “correct” all GPS locations to the same point. Further, the complexity of the optimization led to an impractically-long running time.

6.4.4 The Converged Design of OPS

From our early attempts to build a real-world object location model, we learned two important lessons that influenced the final design of OPS. First, any optimization would need to be constrained to limit the number of degrees-of-freedom and avoid degenerate cases. Second, we would need a mechanism to reduce the impact of GPS error. The final design of OPS consists of two optimization steps, each designed to limit the potential for degenerate solutions.

Before continuing, it is helpful to simplify our notion of location. We do not consider latitude and longitude directly, as angular values are inconvenient for measuring distances. Instead, we apply a precise Mercator projection to the square UTM coordinate system. Therefore, we can now refer to a latitude/longitude position as a simple 2D (x, y) coordinate. Recovery of the final <latitude, longitude> coordinate of the object is a straightforward inversion of this projection.
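The projection step might look like the following spherical-Mercator sketch. This is an illustrative toy, not the precise UTM-based projection used by OPS; it only shows the lat/lon → (x, y) → lat/lon round trip:

```python
import math

R_EARTH = 6378137.0  # spherical-Mercator Earth radius, in meters

def to_xy(lat_deg, lon_deg):
    """Project latitude/longitude to planar (x, y) meters."""
    x = R_EARTH * math.radians(lon_deg)
    y = R_EARTH * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def to_latlon(x, y):
    """Invert the projection back to latitude/longitude."""
    lon = math.degrees(x / R_EARTH)
    lat = math.degrees(2 * math.atan(math.exp(y / R_EARTH)) - math.pi / 2)
    return lat, lon

x, y = to_xy(36.0011, -78.9392)        # coordinates near the Duke campus
lat, lon = to_latlon(x, y)
print(round(lat, 6), round(lon, 6))    # round-trips to 36.0011 -78.9392
```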

“Triangulation” via Minimization on Compass Error

Before explaining the optimizations underlying OPS, it is instructive to consider a reasonable baseline comparison. Since we assume that the user will take more than two photographs when using OPS, it would be unfair to compare OPS to a triangulation with only two GPS readings (G_1^x, G_1^y), (G_2^x, G_2^y) and two compass bearings θ_1, θ_2. Instead, we must generalize the notion of triangulation to support as many measurements as will be available for OPS.

In Table 6.1, we present a nonlinear optimization that represents a triangulation-like approach to object localization. Unlike standard triangulation, this scales to support an arbitrary number of GPS (G_i^x, G_i^y) and compass heading θ_i pairs. In noise-free conditions, all lines of compass bearing originating at the corresponding GPS points would converge to a single point (a, b). Of course, due to sensor error, we can expect that all (n choose 2) pairs of compass lines will result in (n choose 2) different intersection points. The optimization that follows seeks to find the most-likely single point of intersection by rotating each compass bearing as little as possible until all converge at the same fixed point (a, b). We experimented with a number of other approaches for “generalized triangulation.” For one, we considered joint optimizations on GPS and compass. For another, we considered the 2D median of all (n choose 2) intersection points. After experimentation, we believe this is the most-effective technique, and thus the fairest baseline comparison method to OPS.

Now, we turn our attention to the two-step process employed in OPS. First, we apply the output of computer vision to correct for noise in GPS measurements. Second, we extend this baseline sensor-only optimization for triangulation to (1) use our corrected GPS points; and (2) exploit our intuitions for visual trilateration and triangulation.
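The generalized-triangulation baseline can be sketched as a small least-squares problem: find the (a, b) that minimizes the (squared) rotation applied to each compass bearing. This sketch uses SciPy's Nelder-Mead in place of the Mathematica-based solver actually used, and is not the exact formulation of Table 6.1:

```python
import numpy as np
from scipy.optimize import minimize

def bearing(p, q):
    """Compass-style bearing from p to q (radians, clockwise from north)."""
    return np.arctan2(q[0] - p[0], q[1] - p[1])

def ang_diff(a, b):
    """Smallest signed angular difference."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def generalized_triangulation(gps, headings):
    """Find (a, b) whose bearings from all GPS points best match the headings,
    i.e., rotate each compass line as little as possible toward convergence."""
    def cost(ab):
        return sum(ang_diff(h, bearing(g, ab)) ** 2
                   for g, h in zip(gps, headings))
    x0 = np.mean(gps, axis=0) + np.array([0.0, 1.0])  # start just off the centroid
    res = minimize(cost, x0, method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 2000})
    return res.x

# Noise-free check: headings measured toward a known object at (4, 8).
gps = np.array([[0.0, 0.0], [10.0, -4.0], [6.0, -6.0], [-3.0, 2.0]])
true = np.array([4.0, 8.0])
headings = [bearing(g, true) for g in gps]
est = generalized_triangulation(gps, headings)
print(np.round(est, 3))   # ≈ [4. 8.]
```

With noisy headings, the recovered point drifts, which is what the OPS extensions below are designed to resist.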

Minimization of GPS Noise, Relative to Vision

From our earlier attempts, we realized that a direct application of our intuitions for visual trilateration and triangulation would be insufficient. Instead, we need a mechanism to reduce sensor noise before an object-positioning step can be effectively applied. Here, we rely on the output of structure from motion to correct for random GPS noise. Bias across all GPS measurements will remain. However, a consistent bias will be less damaging to the final localization result than imprecision in the relative GPS positions across multiple photographs. Structure from motion can help eliminate this noise between relative positions, as it tends to capture this relative structure with far greater precision than GPS. We design a nonlinear programming optimization that seeks to move the GPS points as little as possible, such that they match the corresponding relative structure known from vision.

Specifically, the optimization maps the original GPS points where photographs were taken, {∀i : (G_i^x, G_i^y)}, to a set of fixed GPS points {∀i : (F_i^x, F_i^y) = (G_i^x + E_i^x, G_i^y + E_i^y)}. The optimization will also solve for a scaling factor λ that proportionally shrinks or expands the point-to-point distances in the structure-from-motion point cloud to match the equivalent real-world distances measured in meters. The constraints simply enforce that the distance between any pair of GPS points, √((G_i^x − G_j^x)² + (G_i^y − G_j^y)²), is equal to the distance between those same points in vision coordinates, √((V_i^h − V_j^h)² + (V_i^d − V_j^d)²), after multiplying the vision distance by a constant factor λ.³ Since we expect the GPS points to have some noise relative to the same points in vision, we introduce error terms for the GPS distance, √((E_i^x − E_j^x)² + (E_i^y − E_j^y)²). With these error terms, the optimization is simply a minimization on the sum of squared error. Table 6.2 presents the complete optimization.
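A soft-constraint sketch of this GPS-correction step is shown below. It folds the hard distance constraints of Table 6.2 into least-squares residuals (with an assumed movement-penalty weight of 0.1) and uses SciPy rather than the Mathematica solver of the actual system:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import least_squares

def correct_gps(gps, vis):
    """Move each GPS point as little as possible so pairwise GPS distances
    match the lambda-scaled pairwise distances from the vision point cloud."""
    n = len(gps)
    pairs = list(combinations(range(n), 2))

    def residuals(params):
        lam = params[0]
        E = params[1:].reshape(n, 2)          # per-point GPS corrections
        F = gps + E                           # "fixed" GPS points
        dist_terms = [np.linalg.norm(F[i] - F[j])
                      - lam * np.linalg.norm(vis[i] - vis[j])
                      for i, j in pairs]
        return np.concatenate([dist_terms, 0.1 * E.ravel()])  # small-movement penalty

    x0 = np.concatenate([[1.0], np.zeros(2 * n)])
    sol = least_squares(residuals, x0).x
    return gps + sol[1:].reshape(n, 2), sol[0]

# Vision knows the true relative geometry; GPS is a jittered, scaled copy.
rng = np.random.default_rng(2)
vis = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
gps = 5.0 * vis + rng.normal(scale=0.5, size=vis.shape)
fixed, lam = correct_gps(gps, vis)
print(round(lam, 1))   # recovered vision-to-GPS scale, near 5
```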

OPS Optimization on Object Location

From the GPS-correction optimization (Table 6.2), we are left with a set of fixed GPS points, {∀i : (F_i^x, F_i^y)}, and a scaling factor λ from vision to GPS coordinates.

3 To avoid confusion with GPS x and y dimensions, we use h and d to represent the relevant two dimensions of the vision coordinate system. The h, or horizontal, dimension runs left/right of the object from the perspective of the user. The d, or depth, dimension runs towards/away from the object.

Now, we take these parameters to extend the baseline sensor-only optimization for triangulation (Table 6.1), along with additional context from visual trilateration and triangulation. We present this final optimization as Table 6.3.

The additional context for visual trilateration and triangulation is encoded as parameters {∀i, j : γ_ij}, (C_x, C_y), and D. Each value γ_ij represents the angle from (F_i^x, F_i^y) to (a, b) to (F_j^x, F_j^y), estimated from vision. As represented in the notation, this is the positive acute angle between vectors V_i and V_j. Thus, {∀i, j : γ_ij} directly encodes our original interpretation of visual triangulation. To avoid redundancies in the constraints (since triangulation and trilateration parameters are directly measured from the same vision point cloud), we only need to partially encode visual trilateration. Instead of encoding the relative distance from each camera point, we can simply enforce that the distance from the user’s position to the object is a known, fixed value. We compute (C_x, C_y) as the 2D median across {∀i : (F_i^x, F_i^y)}, by applying convex hull peeling. Next, we enforce that the distance from (C_x, C_y) to the object at (a, b), √((a − C_x)² + (b − C_y)²), is equal to the distance D. We can compute D from the vision point cloud along with the vision-to-GPS scaling factor λ.

The minimization function must change to accommodate context from triangulation (visual trilateration is fully incorporated as hard constraints). The additional γ_ij error terms {∀i, j : E_ij^γ} allow angular error to be traded between magnetometer-derived compass headings and vision-derived angles. The compass error scaling factor, (n − 1)/2, balances for the lesser quantity of compass error terms relative to pairwise vision angles.
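The shape of this final optimization can be sketched as follows. This sketch collapses Table 6.3's explicit error variables into residuals, replaces the hard distance constraint with a stiff penalty (weight 10 is an assumption), and uses an ordinary mean in place of convex-hull-peeling for the 2D median, so it is an approximation of the real formulation:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import least_squares

def interior_angle(ab, p, q):
    """Positive angle p -> ab -> q."""
    vi, vj = p - ab, q - ab
    cosg = vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj))
    return np.arccos(np.clip(cosg, -1.0, 1.0))

def ops_localize(F, headings, gammas, C, D):
    """Weighted compass residuals + vision interior-angle residuals
    + a stiff penalty standing in for the distance constraint from C."""
    n = len(F)
    weight = (n - 1) / 2.0                    # compass error scaling factor
    pairs = list(combinations(range(n), 2))

    def residuals(ab):
        comp = [weight * ((headings[i]
                           - np.arctan2(ab[0] - F[i][0], ab[1] - F[i][1])
                           + np.pi) % (2 * np.pi) - np.pi) for i in range(n)]
        vang = [interior_angle(ab, F[i], F[j]) - gammas[(i, j)] for i, j in pairs]
        dterm = [10.0 * (np.linalg.norm(ab - C) - D)]
        return np.array(comp + vang + dterm)

    return least_squares(residuals, C + np.array([0.5, 0.5])).x

# Noise-free synthetic instance around a true object at (4, 8).
F = np.array([[0.0, 0.0], [10.0, -4.0], [6.0, -6.0], [-3.0, 2.0]])
true = np.array([4.0, 8.0])
headings = [np.arctan2(true[0] - f[0], true[1] - f[1]) for f in F]
gammas = {(i, j): interior_angle(true, F[i], F[j])
          for i, j in combinations(range(4), 2)}
C = F.mean(axis=0)
D = np.linalg.norm(true - C)
print(np.round(ops_localize(F, headings, gammas, C, D), 3))   # ≈ [4. 8.]
```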

6.5 Discussion

6.5.1 Extending the Location Model to 3D

In Section 6.4.4, we described how OPS estimates the object-of-interest location in two dimensions, namely as the point (a, b). In some contexts, the 3D location of the object can also be useful. For example, we might want to localize a particular window of a multistory building. Ultimately, OPS should provide a <latitude, longitude, altitude> location tuple. However, the height dimension adds challenges not faced on the ground, which follows a plane tangential to the Earth’s surface. First, GPS-estimated altitude is prone to greater inaccuracy than latitude and longitude. Second, while it is natural for a user to take multiple photos by walking a few steps in-between, a requirement to take photographs at multiple heights would be awkward. Thus, while vision provides three-dimensional geometry, GPS locations for photographs are roughly planar. Further, since the object localization task is already challenging in two dimensions, it is desirable to avoid integrating 3D into our location optimizations.

Instead, OPS finds the two-dimensional object location first. Next, it uses the now-known distance to the object, along with accelerometer and vision inputs, to estimate height. For each photograph, OPS records the raw three-axis accelerometer output. Since we can expect the phone to be roughly still while the photograph is taken, the accelerometer is anticipated to measure only gravitational force. This gravitational vector defines a unique orientation in terms of phone roll (rotational movement on the plane of the phone screen, relative to the ground) and pitch (rotational movement orthogonal to the plane of the phone screen, relative to the horizon). Pitch provides a rough estimate of how much higher (or lower) the user is focusing, relative to a plane parallel to the ground and intersecting the user at eye-level. To improve the accuracy of this measurement, we can “average” pitch measurements

from every photograph. Importantly, the user might not align the object in every photo at exactly the same pitch. For example, the window might appear higher or lower on the screen. We can correct for this by leveraging vision once again. From our 3D point cloud, there is a unique mapping of every 3D point back to each original 2D image. We can now compute an adjustment value, measured in pixels, from the horizontal center line of the screen. We can convert this pixel value to an angle, given the known camera field-of-view. Next, the angular sum of pitch and adjustment, averaged across all photographs, can be used to estimate height when combined with the known two-dimensional distance.
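This height computation reduces to a few lines. The numbers below (eye level, distance, per-photo angles) are hypothetical, chosen only to exercise the calculation:

```python
import math

def object_height(h_observer, dist_2d, pitches, adjustments):
    """Estimate object height from eye-level height, the known 2D distance,
    and per-photo pitch plus the vision-derived screen-center adjustment
    (all angles in radians, averaged across photographs)."""
    avg = sum(p + a for p, a in zip(pitches, adjustments)) / len(pitches)
    return h_observer + dist_2d * math.tan(avg)

# Hypothetical values: eye level 1.6 m, object 87 m away,
# looking up at roughly 10 degrees across four photos.
pitches = [math.radians(d) for d in (9.5, 10.2, 10.0, 9.9)]
adjust = [math.radians(d) for d in (0.4, -0.3, 0.1, 0.0)]
print(round(object_height(1.6, 87.0, pitches, adjust), 1))
```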

h_object = h_observer + (2D distance) · tan( avg_i (pitch_i + adjustment_i) )

6.5.2 Alternatives for Capturing User Intent

If we can make assumptions regarding the structure of the object-of-interest, computer vision segmentation techniques can assist in isolating the object from the structure-from-motion point cloud. For example, consider a typical multistory building with large flat sides. Images of an exterior wall will tend to yield many keypoints along a flat plane, roughly perpendicular to the ground plane. These points on the wall plane are often clearly distinct from points on the ground, smaller clusters of points in the nearest-foreground from occlusions, or sparse points in the distant background. To determine where the object-of-interest lies within a point cloud, we attempt to segment the point cloud and find such a predominant plane. We apply Random Sample Consensus (RANSAC) [63], an iterative method to estimate a model for a plane, under an angular constraint that it must be roughly perpendicular to the ground and parallel to the field-of-view. All points in the point cloud are then classified as either inliers or outliers to the plane. Next, we find the spatial centroid among inliers to the plane. This point is considered to be the object-of-interest.
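A minimal RANSAC plane fit (without the angular constraint described above, which the real system adds) can be sketched as follows, on a synthetic "wall" of points plus clutter:

```python
import numpy as np

def ransac_plane(points, iters=200, threshold=0.1, rng=None):
    """Fit a plane to 3D points via RANSAC: repeatedly fit a plane through
    three random points and keep the model with the most inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                      # degenerate (collinear) sample
        normal /= norm
        dists = np.abs((points - p0) @ normal)
        inliers = dists < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, p0)
    return best_model, best_inliers

# A wall-like vertical plane (x = 5) plus scattered occlusion points.
rng = np.random.default_rng(3)
wall = np.column_stack([np.full(80, 5.0) + 0.02 * rng.normal(size=80),
                        rng.uniform(0, 10, 80), rng.uniform(0, 8, 80)])
clutter = rng.uniform(-10, 10, size=(20, 3))
(normal, _), inliers = ransac_plane(np.vstack([wall, clutter]))
print(int(inliers[:80].sum()), np.round(np.abs(normal), 2))  # wall inliers, ~x-normal
```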

6.6 Evaluation

We take a systems-oriented approach in evaluating OPS, so as to capture real-world performance. Phone sensors are subject to noise and biases. GPS can be impacted by weather (due to atmospheric delay), clock errors, errors in estimated satellite ephemeris, multipath, and internal noise sources from receiver hardware. Compass magnetometers are affected by variations in the Earth’s magnetic field and nearby ferromagnetic material. Computer vision techniques, such as structure from motion, can break down in a variety of scenarios. For example, keypoint extraction may fail if photographs have insufficient overlap, are blurred, are under- or over-exposed, or are taken in too-dark or too-bright conditions (such as when the sun is in the user’s eyes). The primary goal of our evaluation is to consider how well OPS overcomes this naturally-challenging operational context.

6.6.1 Implementation

OPS is implemented in two parts, an OPS smartphone client and a back-end server application. We built and tested the OPS client on the Google Nexus S phone, as a Java extension to the standard Android 2.4 camera program. Photographs may be pre-processed locally on the phone to extract keypoints and feature descriptors (reducing the required data transfer), or simply uploaded to our server for processing (faster with a high-performance WiFi connection). Along with photographs (or keypoints and descriptors), the phone uploads all available sensor data from when each photograph was taken, including GPS, compass, and accelerometer readings. Our server is a Lenovo desktop running Ubuntu Linux 11.04.

Computer Vision

Both the client and server applications support the basic computer vision tasks of keypoint detection and extraction of feature descriptors. We use the SURF (Speeded Up Robust Features) algorithm [29]. We choose SURF over the related SIFT (Scale Invariant Feature Transform) [112] as it is known to be considerably faster to compute while providing greater robustness against image transformations. SURF detection and extraction are performed using OpenCV. For both the client and server-side applications, JavaCV provides Java wrappers of native C++ calls into OpenCV.

For server-side structure from motion, we use Bundler, an open-source project written by Noah Snavely and the basis for the Microsoft Photosynth project [172]. Bundler operates on an unordered set of images to incrementally build a 3D reconstruction of camera pose and scene geometry. As input, Bundler expects keypoints matched across multiple photos (on the basis of the corresponding feature descriptors). Bundler can operate on any keypoint type, expecting SIFT by default. We adapt the output of the OpenCV SURF detector to match the SIFT-based input expected by Bundler, substantially decreasing processing time per photo on the phone. As output, Bundler provides a sparse point cloud representation of the scene in view. OPS builds its real-world localization model on top of this point cloud, which exists at an unknown relative scaling, translation, roll, pitch, and tilt from the corresponding locations in the real world.

Server-side Nonlinear Optimization

The OPS server is implemented primarily in Java. Mathematica is used through a Java API for solving nonlinear optimizations. We use differential evolution as the optimization metaheuristic for its strong robustness to local minima (simulated annealing proved to be similarly effective, with both outperforming the default Nelder-Mead) [178].

6.6.2 Accuracy of Object Localization

We tested OPS at more than 50 locations on or near the Duke University campus. We attempted to use OPS in the most natural way possible, focusing on localization tests that mirror how we would expect a real user to use the system. Primarily, we considered objects at distances between 30m and 150m away, for two reasons. First, the object should be far enough away that it makes sense to use the system: though it only takes about a minute to take the required photographs, the user should not find it easier to simply walk over to the object to get a GPS lock. Second, distances are limited by the user’s ability to clearly see the object and focus a photograph. Building densities, building heights, and the presence of significant occlusions (such as trees) constrain the distances at which photographs can be easily taken.

We compare OPS accuracy to “Optimization (Opt.) Triangulation.” To provide the fairest comparison, Opt. Triangulation reflects the triangulation-like optimization described in Section 6.4.4, designed to scale to an arbitrary number of GPS, compass-heading pairs. For some graphs, we also show the distance from the position at which photographs were taken (centroid across all photographs) to the true object location. This is an important consideration; as sensor values are projected into the distance, noise and bias can be magnified.

Example Usage Scenarios

In Figure 6.10, we show three example photos, taken while using OPS. Below each photo, we show a screenshot taken from Google Earth with four annotations: (1) the location at which the user photographed the object-of-interest; (2) the true location of the intended object (positioned at the center of the screen in each photo); (3) the object position inferred by Opt. Triangulation; and (4) the object position inferred by OPS. We show these particular examples to highlight that OPS is a general approach,


[CDF plot: absolute error in meters for OPS, Opt. Triangulation, and Object Distance]

Figure 6.11: CDF of error across all locations. Graph reflects four photos taken per location. 50 locations.

[Per-location plot: error in meters for OPS, Opt. Triangulation, and Object Distance]

Figure 6.12: OPS and triangulation error at 50 locations. Graph reflects four photos taken per location.

Sensitivity to GPS and Compass Error

To better understand OPS’s robustness to sensor noise, we wanted to carefully evaluate the impact of GPS and compass noise, in isolation. We took a set of four photographs of a Duke University Science Center building, from a mean distance of 87m. We ensured that this particular set of photographs provided a robust model from vision, through a manual inspection of the point cloud. For each photograph, we used Google Earth to mark the exact location at which it was taken, with less than one meter of error. From these carefully-marked points, we mathematically com-


6.7.1 Live Feedback to Improve Photograph Quality

OPS is impacted by the quality of user photographs. Poor angular separation between two photographs (too small or too large) can reduce the efficacy of structure from motion. We imagine a system of continuous feedback to the user, aiding in shot selection and framing. By combining an especially-lightweight keypoint detection heuristic, such as FAST [158], on the camera video with information from the phone gyroscope, it would be feasible to suggest when a “good” photograph can be taken. Otherwise, we would inform the user to take a corrective step (to the left or right).

6.7.2 Improving GPS Precision with Dead Reckoning

From Figures 6.13 and 6.14, it is clear that OPS is already relatively insensitive to GPS and magnetometer noise. Although OPS explicitly uses the output of structure from motion to cancel GPS noise, large GPS errors can still become disruptive – this is quite often the cause of poor performance. One possibility is to apply additional sensors to improve GPS precision. In particular, a constant bias is less damaging than imprecise relative positions across camera locations. We are considering mechanisms to leverage the gyroscope and accelerometer for dead reckoning between camera locations.

6.7.3 Continual Estimation of Relative Positions with Video

Continuous video could potentially be used to substantially augment or even replace the structure-from-motion point cloud. A frame-by-frame analysis of video keypoints, in fine-grained combination with accelerometer, gyroscope, compass, and GPS readings, could provide a highly-detailed trace of how the smartphone (1) moved in 3D space and (2) viewed the object-of-interest in terms of relative angles/distances.

6.8 Related Work

To the best of our knowledge, OPS is a first-of-its-kind system to find the precise position of objects a significant distance away, despite noisy sensors and without requiring a database of pre-existing photographs or maps. Nonetheless, there is a substantial body of related work, especially leveraging computer vision for recognition of well-known landmarks.

6.8.1 Localization through Large-Scale Visual Clustering

OPS is related to the problem of worldwide localization on the basis of visual classification [190, 105, 79, 52]. Most closely related to OPS, [190] is the back-end system underlying Google Goggles: (1) online travel guides are mined for possible tourist landmarks; (2) photo-sharing databases of millions of GPS-tagged photos are searched using keywords from these travel guides; (3) unsupervised visual clustering on these search results provides a visual model of each landmark. From 20 million GPS-tagged photos, 5312 landmarks can be recognized by visual comparison of a photograph with these landmark models. OPS is designed to be more generic, able to localize objects which cannot be considered landmarks (without a pre-existing photo library of these objects).

6.8.2 Aligning Structure from Motion to the Real World

Substantial computer vision literature has considered improvements and applications of structure from motion (SfM) [172]. However, our notion of “object positioning” should not be confused with the computer vision concept of object localization, which seeks to find the relative location of objects within an image or point cloud. For mobile systems, closely related to OPS, [81] uses SfM for a “landmark-based” directional navigation system. SfM, along with a preexisting database containing many photographs of the area in view, enables an augmented reality view with highlighted annotations for known landmarks. More directly related to the goals of OPS, [92] seeks to align the output of structure from motion to real-world coordinates, using GPS coordinates to improve the scalability of structure from motion when using hundreds of photos. However, the techniques require the availability of overhead maps and are not suitable to the extremely small number of photos (typically four) expected for OPS.

6.8.3 Building Object Inventories

Our use of multimodal sensing inputs for estimating object position is related to the techniques in [42] for building inventories of books in a library. The authors project a rough indoor location for a book through WiFi and compass, then apply vision techniques to visually detect book spines.

6.8.4 Applying Computer Vision to Infer Context

CrowdSearch [187] combines human validation with location-specific image search (for objects such as buildings) through computer vision. By leveraging Amazon Mechanical Turk for human-in-the-loop operation, CrowdSearch enables precision in cases of poor image quality. TagSense [148] infers “tags” of human context for photographs, by combining computer vision with multimodal sensing. OPS is complementary to CrowdSearch and TagSense, by enabling awareness of precise location for objects in photographs.

6.9 Conclusion

Today’s augmented reality applications are less useful than their potential. We believe this is due, at least in part, to an unfortunate usage asymmetry. Users can retrieve available localized content in a natural way, viewing pop-up annotations for the world through their smartphone, but there is no means to equivalently introduce new annotations of objects in the vicinity. By providing localization for objects in a user’s view, this chapter seeks to enable convenient content creation for augmented reality. Beyond augmented reality, we believe that a precise Object Positioning System can be widely enabling for a variety of applications, and is worthy of a significant research endeavor. In this light, we believe that our approach, OPS, takes a substantial first step.

6.10 Reference Equations

For completeness, we provide equations for the curves of visual trilateration and triangulation. We assume at least two GPS points are known, $(x_1, y_1)$ and $(x_2, y_2)$. The object position is inferred at $(a, b)$.

6.10.1 Equation of Visual Trilateration

Let $\sigma = d_2 / d_1$ be the ratio of the distance from $(x_2, y_2)$ to $(a, b)$ divided by the distance from $(x_1, y_1)$ to $(a, b)$.

$$(a - x_2)^2 + (b - y_2)^2 = \sigma^2 \left[ (a - x_1)^2 + (b - y_1)^2 \right]$$

To derive, construct two right triangles with hypotenuses $(x_1, y_1)$ to $(a, b)$ and $(x_2, y_2)$ to $(a, b)$. Apply the Pythagorean theorem on each, substituting the first into the second.
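The trilateration curve can be checked numerically. The coordinates below are made up for illustration; the sketch simply verifies that a candidate object position satisfies the distance-ratio equation built from its own $\sigma$.

```python
import math

# Hypothetical camera GPS fixes (x1, y1), (x2, y2) and a candidate
# object position (a, b); all values are illustrative, not from OPS.
x1, y1 = 0.0, 0.0
x2, y2 = 40.0, 10.0
a, b = 25.0, 60.0

d1 = math.hypot(a - x1, b - y1)  # distance from (x1, y1) to (a, b)
d2 = math.hypot(a - x2, b - y2)  # distance from (x2, y2) to (a, b)
sigma = d2 / d1                  # the distance ratio of trilateration

# The trilateration curve: points (a, b) preserving this ratio
lhs = (a - x2) ** 2 + (b - y2) ** 2
rhs = sigma ** 2 * ((a - x1) ** 2 + (b - y1) ** 2)
```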

6.10.2 Equation of Visual Triangulation

Let $\gamma$ represent a measure of the interior angle between a vector $V_1$ from $(x_1, y_1)$ to $(a, b)$ and $V_2$ from $(x_2, y_2)$ to $(a, b)$.

$$V_1 = (a - x_1)\hat{i} + (b - y_1)\hat{j} \qquad V_2 = (a - x_2)\hat{i} + (b - y_2)\hat{j}$$

$$\gamma = \arccos\left( \frac{V_1 \cdot V_2}{|V_1||V_2|} \right)$$

Then, the equation of visual triangulation is constructed as:

$$(x_2 - x_1)^2 + (y_2 - y_1)^2 = \left[ (a - x_1)^2 + (b - y_1)^2 \right] + \left[ (a - x_2)^2 + (b - y_2)^2 \right] - 2 \sqrt{(a - x_1)^2 + (b - y_1)^2} \sqrt{(a - x_2)^2 + (b - y_2)^2} \cos\gamma$$

To derive, apply the law of cosines ($C^2 = A^2 + B^2 - 2AB\cos\gamma$) with $C$ taken as the side $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.
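The law-of-cosines identity above can likewise be verified numerically. The coordinates are made up for illustration; $\gamma$ is computed from the vectors exactly as defined, and both sides of the triangulation equation are compared.

```python
import math

# Hypothetical camera positions and object position (illustrative values).
x1, y1 = 0.0, 0.0
x2, y2 = 40.0, 10.0
a, b = 25.0, 60.0

v1 = (a - x1, b - y1)
v2 = (a - x2, b - y2)
dot = v1[0] * v2[0] + v1[1] * v2[1]
n1 = math.hypot(*v1)
n2 = math.hypot(*v2)
gamma = math.acos(dot / (n1 * n2))  # interior angle at the object

# Law of cosines with the camera baseline as side C
lhs = (x2 - x1) ** 2 + (y2 - y1) ** 2
rhs = n1 ** 2 + n2 ** 2 - 2 * n1 * n2 * math.cos(gamma)
```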

Minimize

$$\sum_{\forall i} |E_i^\theta|$$

Subject to

$$\forall i: \; b - G_i^y = (a - G_i^x) \cdot \cot(\theta_i + E_i^\theta)$$

Solving for $a, b$ and $\forall i: E_i^\theta$, with parameters $\forall i: G_i^x, G_i^y, \theta_i$.

Table 6.1: Optimization for Triangulation.
Parameters: $G_i^x, G_i^y$, the GPS position (of the user at each photograph); $\theta_i$, the compass bearing (at each photograph).
Solved variables: $a, b$, the estimated object location at $(a, b)$; $E_i^\theta$, the estimated error for compass bearing $\theta_i$.
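The triangulation formulation of Table 6.1 can be prototyped with a brute-force search in place of an actual solver. This Python sketch uses synthetic camera fixes and noiseless bearings; the grid bounds, step size, and bearing convention (chosen to match the cotangent constraint) are illustrative assumptions.

```python
import math

def bearing(gx, gy, a, b):
    # Compass-style angle satisfying b - gy = (a - gx) * cot(theta)
    return math.atan2(a - gx, b - gy)

def locate(cams, thetas, grid, step):
    """Brute-force search for (a, b) minimizing the sum of absolute
    compass errors |E_i| = |bearing_i(a, b) - theta_i|."""
    best, best_cost = None, float("inf")
    xs = [grid[0] + i * step for i in range(int((grid[1] - grid[0]) / step) + 1)]
    ys = [grid[2] + i * step for i in range(int((grid[3] - grid[2]) / step) + 1)]
    for a in xs:
        for b in ys:
            cost = sum(abs(bearing(gx, gy, a, b) - t)
                       for (gx, gy), t in zip(cams, thetas))
            if cost < best_cost:
                best, best_cost = (a, b), cost
    return best

# Synthetic scenario: three camera GPS fixes and a true object at (50, 80).
cams = [(0.0, 0.0), (30.0, -5.0), (60.0, 0.0)]
truth = (50.0, 80.0)
thetas = [bearing(gx, gy, *truth) for gx, gy in cams]

est = locate(cams, thetas, grid=(40, 60, 70, 90), step=1.0)
```

With noisy bearings, the same objective recovers the point of least total compass error, which is the role this subproblem plays inside OPS.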

Minimize

$$\sum_{\forall i} (E_i^x)^2 + (E_i^y)^2$$

Subject to

$$\forall i, j: \; \left[ (G_i^x + E_i^x) - (G_j^x + E_j^x) \right]^2 + \left[ (G_i^y + E_i^y) - (G_j^y + E_j^y) \right]^2 = \lambda^2 \cdot \left[ (V_i^h - V_j^h)^2 + (V_i^d - V_j^d)^2 \right]$$

Solving for $\lambda$ and $\forall i: E_i^x, E_i^y$, with parameters $\forall i: G_i^x, G_i^y, V_i^h, V_i^d$.

Table 6.2: OPS Optimization on GPS Error.
Parameters: $G_i^x, G_i^y$, the GPS position (of the user at each photograph); $V_i^h, V_i^d$, the vision position (arbitrary units).
Solved variables: $\lambda$, the scaling factor from vision coordinates to GPS; $E_i^x, E_i^y$, the estimated camera GPS error.

Minimize

$$\frac{n-1}{2} \cdot \sum_{\forall i} |E_i^\theta| + \sum_{\forall i,j} |E_{ij}^\gamma|$$

Subject to

$$(a - C_x)^2 + (b - C_y)^2 = D^2$$
$$\forall i: \; b - F_i^y = (a - F_i^x) \cdot \cot(\theta_i + E_i^\theta)$$
$$\forall i: \; V_i = (a - F_i^x)\hat{i} + (b - F_i^y)\hat{j}$$
$$\forall i, j: \; \gamma_{ij} + E_{ij}^\gamma = \arccos\left( \frac{V_i \cdot V_j}{|V_i||V_j|} \right)$$

Solving for $a, b$; $\forall i: E_i^\theta$; and $\forall i, j: E_{ij}^\gamma$, with parameters $C_x, C_y, D$; $\forall i: F_i^x, F_i^y, \theta_i$; and $\forall i, j: \gamma_{ij}$.

Table 6.3: OPS Final Object Localization.
Parameters: $F_i^x, F_i^y$, the fixed GPS position (at each photograph); $\theta_i$, the compass bearing (at each photograph); $\gamma_{ij}$, the vision estimate for the vector angle $\angle V_i V_j$; $C_x, C_y$, the 2D median of $\{\forall i: (F_i^x, F_i^y)\}$; $D$, the estimated distance from $(C_x, C_y)$ to $(a, b)$.
Intermediate results: $V_i$, the vector from $(F_i^x, F_i^y)$ to $(a, b)$.
Solved variables: $a, b$, the estimated object location at $(a, b)$; $E_i^\theta$, the estimated error for compass bearing $\theta_i$; $E_{ij}^\gamma$, the estimated error in the vector angle $\angle V_i V_j$.

7 Predicting Client Dwell Time in WiFi Hotspots

Modern smartphones provide a rich set of sensors, enabling high-resolution measurement of the user’s behavior. Such behavioral insights are beginning to influence the design of (personal) networking systems. While the space of behavior-aware networking is broad, this chapter focuses on the problem of predicting how long a user will remain at a WiFi hotspot, called dwell time. We find that suitably mining sensor data from smartphones can enable dwell time prediction in real time. We believe that dwell time can be a useful primitive for a number of applications, including traffic prioritization, mobile gaming, and targeted advertising. Towards this goal, we propose a general framework for dwell time prediction, and present its effectiveness through a 3G offloading application. Promising results from live and trace-based experiments serve as a motivation for longer-term engagement in this new research direction.

7.1 Introduction

The synergy of sensing, computing, and communication on modern smartphones is enabling high-resolution insights into human behavior. Recent research has attempted to leverage these insights for improved personal activity recognition, ranging from simple activities such as walking, running, and laughing [128] to more sophisticated ones, like whether the user is a driver or a passenger in a car [188], or whether the user is in a social gathering [27]. In this work, we intend to add dwell time to this library of detectable activities, where “dwell time” is defined as the duration for which a user will stay within a WiFi hotspot. Unlike some of the existing activities, dwell time is predictive in nature, perhaps making the problem more challenging. However, if feasible, the knowledge of how long a user will stay at a hotspot can enable new technologies and applications. For instance, (1) knowing how soon a client will leave a WiFi hotspot may help the AP prioritize traffic to it – perhaps a movie can get fully downloaded before the user gets disconnected; (2) the gaming industry is concerned about the impact of mobile users in a multiplayer game – if some user leaves WiFi and enters a high-latency 3G network, the gaming experience of the entire group can degrade sharply. Predicting a user’s dwell time may offer timely hints for the game to adapt; (3) advertisements in shops and malls can be adapted to the user’s dwell time – a user who shows signs of leaving early can perhaps be incentivized to stay with a coupon. Thoroughly exploring the space of applications is outside the scope of this work. This chapter focuses mainly on the viability of the underlying primitive, i.e., real-time, configuration-free dwell time prediction. However, towards completeness, we build one example application that exploits dwell time prediction to offload 3G networks. We call the proposed dwell time prediction engine ToGo. With ToGo, mobile devices periodically report their sensor readings to the AP.
While data-processing techniques of varied complexity are possible, in our implementation the AP runs a machine learning algorithm that accepts the sensor readings as features of user behavior, and predicts the user’s dwell time. The key intuition is that correlation of sensor readings among similarly-behaving individuals can offer rich prediction

opportunities. As a trivial example, a WiFi AP running ToGo may gather sensor readings (e.g., compass, accelerometer, WiFi signal strength) from a person coming down an escalator in an airport. By looking into the past dwell times of all people with matching sensor signatures (i.e., all people who came down the same escalator), it might be feasible to predict that this new person will soon exit the terminal. ToGo learns such signatures using the initial set of users as the training set; there is no need for any hotspot-specific configuration. We evaluate ToGo through live experiments at a university cafe – real users are requested to carry the ToGo-installed Android phones when they enter the cafe, and the phones predict their dwell times. For larger-scale experiments, we carefully record real user behavior at the university library, cafe, and McDonald’s, and later enact them at the corresponding locations. Although our machine learning-based approach is reflective of a first attempt in this nascent space, evaluation results show reasonable success in predicting client dwell duration for real human activities. Of course, the prediction is not accurate to the absolute duration – ToGo predicts the range in which the user’s dwell time will fall. In the specific application we develop, we picked 5 ranges based on actual user behavior. We fully concede that ToGo is a prototype, and not yet ready for deployment. It requires further large-scale testing and tuning, particularly across a wide range of hotspots, traffic patterns, and human users. We also need to address questions pertaining to energy, user misbehavior, and device heterogeneity. Nevertheless, we believe that the results in this chapter are adequately promising to justify the longer-term research engagement. The promise is particularly pronounced because ToGo performed seamlessly in a completely uncontrolled experiment with live users.
With improved machine learning and activity recognition algorithms, ToGo may become an important step towards micro-behavior aware service delivery.

Our main contributions can be summarized as follows. (1) We design ToGo, a framework for predicting length-of-stay at WiFi hotspots. ToGo leverages machine learning and automatic self-training for live dwell predictions without any hotspot-specific configuration. (2) We implement ToGo on Google NexusOne phones and on a laptop-based WiFi AP. Results with real patrons at public locations encourage our approach. (3) We present BytesToGo, a case study application of ToGo. BytesToGo considers the opportunity of offloading 3G traffic through predictive prioritization. We show that mobile device sensors may reveal user intentions, facilitating informed network decisions.

7.2 Natural Questions

Are dwell times truly diverse? Or do most users in a given location (e.g., a coffee shop) dwell for similar durations? To verify the diversity in dwell times, we visited a university cafe and set up a WiFi-enabled laptop as a traffic sniffer. We selected the three strongest APs in the cafe (all on channel 6), and used tcpdump to monitor the distinct devices connected to these APs. Dwell time for each device was estimated as the time difference between the first and the last time the device was visible to the sniffer. Over a weekday afternoon (11am to 3pm), we detected 340 distinct devices. Figure 7.1 shows the CDF of their dwell times. More than one third of the devices dwelled for less than 10 minutes (e.g., the user had a quick lunch/snack); and even among them, half of the devices stayed for 2 minutes or less (coffee/food to-go). More than one fifth stayed at least two hours. Clearly, user dwell times exhibit diversity.

[Figure 7.1 plot: empirical CDF of dwell duration, 0–300 minutes.]

Figure 7.1: Clients at a university cafe exhibit varied dwell times, reflecting multiple patterns of user behavior. Some long-dwell clients study for hours while more mobile users take a meal to-go.

Predicting length-of-stay seems difficult and highly dependent on the user. How can ToGo operate effectively across a wide variety of users and contexts? We observe that the user’s decision to stay or leave is often manifested in some of her early micro-activities. A user who intends to eat at a McDonald’s may stop at the condiments section; a user in Wal-Mart may pick up a cart for substantial grocery shopping; a user intent on buying clothes may pause longer at shelves than someone window-shopping. We hypothesize that these micro-activities have a footprint on the mobile phone sensors, which in turn reveals the user’s intentions. ToGo uses the sum of all such micro-activity signatures to self-tune itself. Fortunately, the ground truth about the user’s exact length-of-stay is available upon the user’s departure – the AP knows the duration from client association to disassociation. By exploiting the knowledge of ground truth (i.e., automatically labeling sampled data), ToGo retrains itself and naturally adjusts to the current trends in user behavior. Our measurements suggest

that observable trends indeed exist in a variety of typical deployment settings; and these trends can be recognized by straightforward machine learning mechanisms.

Will ToGo require modifications on mobile devices to obtain their sensor readings? While explicit sensor reports do require a client-side component, these modifications can be avoided at the expense of performance. We show that the ToGo AP can utilize only the uplink RSSI measurements (which it learns automatically) for dwell time prediction. Of course, the prediction accuracy will be lower.

7.3 ToGo Prediction Engine

This section describes the core prediction functionalities in ToGo, built off standard machine learning tools. In the section that follows, a case study will serve as an illustrative example to make ToGo’s advantages more concrete.

7.3.1 Design Overview

ToGo clients on mobile devices export sensor readings to the AP. These sensor readings could be implicit, such as RSSI, or explicit, as with acceleration, compass direction, etc. The ToGo AP uses a support vector machine (SVM) to process these (multi-sensory) features and categorize the user’s dwell time. Over time, the features from the same class of behavior begin to exhibit similarity (e.g., people about to leave an airport terminal may all walk southwards and then climb a downward escalator). The SVM recognizes such similarities and employs them for prediction. Correct predictions reinforce the similarity; incorrect predictions imply that the feature set may not be sufficiently discriminating. The SVM learns from the failures and refines the prediction over time. The overall framework is composed of two main modules: (1) The Sensor Measurement Module at the client systematically probes the phone sensors, extracts basic statistics, and periodically sends a summary to the

ToGo server running on the AP. These may be viewed as a timeslice of features, capturing an instant of a user’s behavior/micro-mobility. The summary is limited to a few bytes and sent over WiFi, incurring a small control overhead. This information can be piggybacked on other upload traffic. (2) The Dwell Time Prediction Module operates on the summary report to predict the dwell time for clients as they arrive. Machine learning techniques are employed to classify each user into one of a few groups, corresponding to a coarse notion of expected dwell time. Our implementation uses five dwell time classes on a discretized logarithmic scale (1-5), with lower values indicating a shorter expected dwell. In examining user behavior at a campus McDonald’s, we found that broad types of user behavior cluster along this scale. For example, the dwell classes and corresponding behavior were often as follows: (1-2) walking past the restaurant, (2-3) taking food to-go, (4) buying food and eating in the restaurant, (4-5) studying in the dining area. When a user walks in, her short-term mobility pattern may resemble the walk-past-the-cafe category, but as she stands in the queue near the counter, she may be moved to the take-out category. If she goes to pick up condiments, her pattern will resemble the 4th (sit-down) category. When the user finally leaves, ToGo can learn the truth about her dwell time, and use this data point to refine future predictions. With many users visiting a hotspot, we anticipate reasonably quick convergence to the true behavioral categories in that location. Thereafter, the predictions can become accurate. The entire operation can be automatic, requiring no manual configuration or tuning.
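The five-class discretization can be sketched as a simple mapping from dwell minutes to a class on a logarithmic scale. The bin edges below are illustrative assumptions; the text states only that five classes on a discretized logarithmic scale are used.

```python
# Hypothetical class boundaries in minutes, roughly logarithmic.
EDGES_MIN = [2, 10, 30, 120]

def dwell_class(minutes):
    """Map a dwell duration (minutes) to one of five classes (1-5),
    with lower values indicating a shorter expected dwell."""
    for cls, edge in enumerate(EDGES_MIN, start=1):
        if minutes < edge:
            return cls
    return 5
```

For example, a to-go coffee (a couple of minutes) would land in class 1, while an afternoon of studying would land in class 5.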

7.3.2 Components

Next, we present the key functional components of the ToGo prediction engine. The flow of interactions between these components is illustrated in Figure 7.2.

Feature Extraction

We targeted the Google NexusOne as our client device for its variety of sensing capabilities, including a three-axis accelerometer, light sensor, electromagnetic compass, and WiFi/GSM radios. For each sensor, we constructed a variety of simple summary metrics on raw values (e.g., mean, standard deviation, histogram, etc.) and allowed the SVM to select the most-discriminating subsets automatically.
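The per-sensor summary metrics can be sketched as follows. The window contents, histogram bin count, and dictionary layout are illustrative assumptions, not the exact report format ToGo sends.

```python
import statistics

def summarize(samples, bins=4):
    """Summary features for one sensor over a short window: mean,
    standard deviation, and a small histogram of raw values."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for s in samples:
        idx = min(int((s - lo) / width), bins - 1)  # clamp top edge
        hist[idx] += 1
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "hist": hist,
    }

# A hypothetical window of accelerometer magnitudes (m/s^2).
window = [9.7, 9.9, 10.4, 9.8, 11.2, 9.6, 10.0, 9.9]
report = summarize(window)
```

In ToGo, a report like this (a few bytes per sensor) would be sent periodically to the AP, which selects the most-discriminating features automatically.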

Short-term Predictions

The mobile device periodically generates the sensor-feature matrix (i.e., the set of features per sensor) and sends it to the AP. Each feature is computed over a moving time window to capture the short-term user behavior. An SVM classifier accepts the matrix, and based on training data from the past, predicts the user’s likely dwell-time class. Fig. 7.2 shows that for a user $x$ at time $t$, the SVM sub-predictor yields a predicted dwell time class of $p_t^x$. Of course, this prediction does not capture long-term behavior – a person going to the restroom in a cafe may be mispredicted as leaving the cafe. However, this short-term predictor is useful to obtain quick predictions. Note that observing a user over the long term can yield high prediction accuracy; however, waiting too long may defeat the purpose of certain applications. Instead, ToGo starts making quick predictions as soon as the user enters the hotspot, and continues to refine its guess over time.

Sequence Prediction

The series of time-indexed short-term predictions forms an increasing sequence over time, i.e., $\phi(t) = \langle p_1^x, p_2^x, p_3^x, \ldots, p_t^x \rangle$, where $t$ is the current time. This may be viewed as a growing signature that incrementally reveals the nature of the user’s behavior. Of course, once a user leaves, the complete signature can be recorded, and her true dwell time learnt. During bootstrap, ToGo records these sequences and dwell times, and trains itself with them – we call this the Sequence Predictor (Fig. 7.2). Clusters of sequences represent distinct classes of long-term behavior, characteristic of that hotspot. Now, as a user begins to dwell inside the hotspot, her partial sequence, $\phi(t)$, is matched against the recorded sequences (perhaps reminiscent of genetic string matching). The resulting prediction begins to better reflect the user’s long-term behavior.

Figure 7.2: Periodic sensor-feature matrices feed the SVM sub-predictors to generate short-term predictions. These time-indexed predictions form a growing sequence that is then used to predict the user’s long-term dwell time behavior. Sequences from other users are used as the training set.
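The sequence matching can be sketched as a nearest-neighbor lookup over recorded sequences. This is an illustrative stand-in for ToGo’s actual matcher: the prefix-mismatch distance, the example sequences, and their labels are all assumptions.

```python
def prefix_distance(partial, recorded):
    # Count mismatches between a growing partial prediction sequence
    # and the prefix of a recorded (complete) sequence.
    return sum(1 for a, b in zip(partial, recorded) if a != b)

def predict_dwell(partial, history):
    # history: list of (complete prediction sequence, true dwell class),
    # with the true class learned automatically at the user's departure.
    best_seq, best_class = min(history,
                               key=lambda h: prefix_distance(partial, h[0]))
    return best_class

# Hypothetical recorded sequences of short-term class predictions:
history = [
    ([1, 1, 2, 2, 2], 2),     # e.g., took food to-go
    ([1, 3, 4, 4, 4, 4], 4),  # e.g., bought food and ate in
]
predicted = predict_dwell([1, 3, 4], history)
```

As the partial sequence grows, the match (and hence the long-term prediction) is refined, mirroring how ToGo revises its guess over time.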

Coping with Time-varying Behavior

Human mobility patterns may be dependent on time of day (customer behavior may differ between breakfast, lunch, and dinner times). However, this does not pose a problem for ToGo as long as most customers exhibit similar behavior during a given time span. This is because “time” is also a feature in our system, and the SVM identifies that distinct clusters can be created using time as the dominant discriminator. In summary, ToGo can automatically adapt to hotspot-specific behavior (Starbucks versus McDonald’s), as well as to variations in time (afternoons versus evenings).

7.4 BytesToGo: An Application of ToGo

ToGo provides an API through which a mobile app can query for the predicted dwell time. In this section, we use this API to build a 3G offloading application via traffic prioritization. The basic idea is to prioritize traffic to short dwell time users, and thereby lessen the leftover demand that gets carried over to 3G. We motivate this application first, and then present its design and implementation.

7.4.1 Motivation

Mobile broadband traffic continues to increase at an overwhelming pace. New classes of content-browsing devices, such as iPads and Slates, will further heighten this strain [30]. These devices will not only download more video and large-sized pictures/eBooks, they will also do so on-the-fly. A student may begin downloading a Netflix movie while walking from her lab to the campus bus stop. Projections suggest that by 2013, 3G networks will become completely saturated when 40% of their subscribers consume video for just 8 minutes a day [40]. 4G, LTE, and White Space Networks combined are not expected to absorb this load. Responding to this concern, AT&T has announced an additional $2 billion investment to make sure it meets the growing demand for content consumption [154]. Part of this money will be invested in adding more WiFi access points (APs), and effectively using them to assist 3G networks [154]. Motivated by this call for research, we identify a possibility to actively offload 3G traffic. While offloading is not a new idea, it has mostly been investigated in the context of vehicular network access [24]. This application targets a complementary scenario, applicable to pedestrian users connecting to public WiFi APs. Our intuition is simple. Among clients connected to WiFi hotspots, those likely to crossover sooner to 3G may be treated with proportionally higher priority. Prioritized traffic allows for greater data download to the user with short dwell time, reducing the burden that gets carried over to 3G. Thus, airport travelers arriving at the baggage claim area could be allocated a larger bandwidth share over those walking towards check-in counters. Similarly, an iPad user beginning to walk away from the Starbucks AP can be prioritized over a seated laptop user. The AP could recognize these patterns from the phones’ sensor readings, classify users into discrete dwell time categories, and prioritize them accordingly. Since per-user WiFi throughput is substantially higher than 3G (e.g., 3 Mbps vs. 450 kbps), a minute of WiFi prioritization can be valuable. We call our system BytesToGo (BTG), based on the observation that highly-mobile users download more bytes over WiFi before “going” into the 3G network.

7.4.2 Natural Questions

Just deploying a WiFi AP will substantially offload 3G networks – is BTG still necessary? We argue that the improvement from BTG should not be compared against the gains from WiFi deployments. BTG may be viewed as a software upgrade to make better use of WiFi APs, since they may anyway be installed in large numbers.

Earlier works have studied predictive offloading and handoff in cellular contexts [96, 14] and more recently in 3G/WiFi domains [143, 24]. Is BTG different? To the best of our knowledge, prior research has broadly focused on macro-level behavior/mobility patterns, profiling how users transition between cells, encounter WiFi APs, or habitually dwell in them. BTG may be viewed as an attempt to exploit micro-behavior, particularly via emerging opportunities in personal sensing and data mining. We believe that micro-behavior guided networking is relatively unexplored.

How much bandwidth can be offloaded from WiFi to 3G? The benefits from WiFi prioritization are proportional to the throughput difference between WiFi and 3G. When 3G throughput is considerably less than WiFi, a small increase in WiFi utilization can save considerable channel time on 3G. To characterize this difference, we measured WiFi bandwidth inside 8 different stores and 3 homes, and the corresponding 3G throughput just outside their coverage areas. Tests were conducted on different phones using a speedtest application from dslr.net. WiFi measurements were performed by walking through the hotspot, ensuring that transmission bitrates were not over-estimated. For 3G measurements, the user ran the speedtest app when located at the edge of the WiFi range, and walked away from the hotspot for 30s. Figure 7.3 reports an average of 6.64× higher TCP throughput over WiFi than with 3G, encouraging the prospects of predictive prioritization.

Figure 7.3: Difference between WiFi and 3G TCP throughput at different hotspots. WiFi offers almost 6.5× the throughput of 3G.

BTG aims to benefit both mobile users and service providers, while extending best-effort service to long-term users. This is achieved by prioritizing traffic to mobile users with shorter dwell times. While this reduces the load that carries over to 3G, the natural question is how it impacts long dwell-time (laptop) users. Figure 7.4 illustrates the intuition. Without traffic prioritization, both the mobile and the laptop get equal share during the mobile’s stay within the hotspot. Consequently, a fraction of the mobile’s download gets deferred to 3G. In contrast, with prioritization, most of the mobile’s need is met by the hotspot, reducing the 3G burden. Of course, the laptop’s download may get delayed. However, since the wired backend can be expected to have slack periods interspersed between user downloads, the laptop can be compensated soon after the mobile’s exit. The compensation may not always be adequate to make up for the deprioritization, and hence we call this a best-effort service. If necessary, BTG could even turn off prioritization when the slack periods are too small or infrequent. Further, BTG prioritizes non-interactive traffic only, allowing latency-sensitive applications (VoIP, HTTP, Messaging) to run without interruption. These tradeoffs could be exported to the network operator through a BTG policy API. The operator can choose parameters based on revenue, 3G load, or other utility functions.

Figure 7.4: BTG prioritizes traffic of short-dwell mobiles. However, it compensates long-dwell laptops by exploiting slack periods.

7.4.3 Extending from ToGo

BTG acts as a wrapper on top of ToGo, as shown in Figure 7.5. Predicted dwell times from ToGo form the input to a Traffic Shaping Module that regulates the inflow of download TCP traffic from the Internet. First, interactive traffic (e.g., VoIP) is identified (by port number) and allowed to flow unimpeded at the highest priority. Non-interactive traffic is isolated by destination into per-client queues. Each queue is allotted a maximum drain rate as a function of (1) the hotspot bottleneck bandwidth, (2) the total number of clients per priority class, and (3) the amount of spare capacity. Naturally, low-priority TCP sessions get rate limited, allowing high-priority traffic to take larger shares of the backhaul bandwidth.

Figure 7.5: ToGo synthesizes client sensor feedback to estimate dwell duration for associated clients. Applications such as BTG can leverage these predictions as necessary, for example, ensuring that multiplayer games will complete before one party leaves or providing prioritized access to cloudlets [163].

Traffic Shaping

The Linux kernel provides support for sophisticated traffic classification and rate-limiting. Specifically, we use the Hierarchical Token Bucket (HTB) queuing discipline. HTB distributes bandwidth according to the specified ratios up to a maximum cumulative rate, matching the bottleneck bandwidth (after accounting for unshaped interactive flows). For example, a priority level 2 user is expected to stay 3 times longer than a priority 1 user; thus, the priority 1 user receives bandwidth in a 3:1 ratio to priority 2 clients. If there were two priority 2 users and one priority 1 user, each priority 2 user gets 1/5th of the capacity, while the priority 1 user gets 3/5th. To prevent undue service degradation for low-priority clients, a minimum bandwidth is assured before HTB ratios may be applied. Thus, during periods of especially high WiFi contention, BTG may revert to equal bandwidth shares.
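The weighted-share arithmetic can be sketched as follows. The weight table (priority 1 weighted 3× priority 2, mirroring the 3:1 example) and the minimum-rate floor are illustrative assumptions; a deployment would derive weights from the predicted dwell classes and enforce them via HTB.

```python
def htb_shares(priorities, capacity, min_rate=0.0):
    """Compute per-client drain rates under HTB-style weighted sharing.
    priorities: one priority level (1 = shortest expected dwell) per
    client; capacity: bottleneck bandwidth after unshaped flows."""
    weight = {1: 3.0, 2: 1.0}  # assumed weights mirroring the 3:1 example
    total = sum(weight[p] for p in priorities)
    shares = [capacity * weight[p] / total for p in priorities]
    # Simplistic floor; a real shaper would re-normalize after flooring.
    return [max(s, min_rate) for s in shares]

# One priority-1 client and two priority-2 clients on a 10 Mbps backhaul:
shares = htb_shares([1, 2, 2], capacity=10.0)
```

This reproduces the example in the text: the priority 1 user gets 3/5 of the capacity (6 Mbps), and each priority 2 user gets 1/5 (2 Mbps).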

7.5 Implementation and Evaluation

Our evaluation follows in three parts: (1) the implementation of our BTG prototype, integrating ToGo; (2) the accuracy of the ToGo dwell prediction engine in realistic deployment scenarios; and (3) the effectiveness of BTG leveraging the ToGo framework for offloading 3G traffic onto WiFi.

The main findings from our evaluation are:

• Relative to a baseline approach, called Naive, all ToGo schemes converge faster and more accurately to a user’s true dwell time (Figs. 7.6, 7.8).

• Schemes that utilize multiple sensors perform better than the scheme that uses only WiFi RSSI/bitrate (Figs. 7.6, 7.8).

• BTG live traffic shaping, with accurate ToGo dwell prediction, improves hotspot efficiency (Fig. 7.11).

• Trace-based analysis suggests that BTG can save one half of a 3G channel per AP (Fig. 7.12).

7.5.1 Prototype Implementation

At the Mobile Client (Google Nexus One phones), a lightweight Java background process periodically probes its sensors to create a summary report representing the last few seconds of user behavior. The client forwards this report to the AP as a single datagram packet. The BTG Hotspot AP runs on an Ubuntu 9.10 laptop (Linux kernel 2.6.31) with an Intel Core 2 Duo CPU, 3 GB RAM, and an Atheros-chipset D-Link DWA-643 ExpressCard WLAN interface using the ath9k driver. The hostapd userspace daemon provides a fully-compliant 802.11b/g/n AP. We built our AP prediction and prioritization module on top of the userspace Click Modular Router [136]. Click provides a convenient mechanism to intercept client traffic to record RSSI and bitrate from upload traffic. Additionally, our Click module observes bidirectional traffic patterns to feed into our traffic prioritization and shaping engine. We use the libsvm C++ SVM library for client dwell prediction. Traffic shaping is conducted using the standard Linux Traffic Control subsystem. The Linux tc utility provides userspace hooks for live reconfiguration of the high-performance, in-kernel packet processing.

7.5.2 ToGo Performance: Dwell Prediction Accuracy

Comparative Schemes

We evaluate four variants of ToGo dwell prediction: NoFeedback; Basic; Basic+Compass; and Basic+Compass+Light. In NoFeedback, client feedback summary reports are disabled. The WiFi AP infers user behavior strictly from time and (upload) RSSI/bitrate. NoFeedback requires no client-side changes, and hence is compatible with all legacy devices. Basic generates client reports composed of accelerometer readings, GSM signal strengths, and (download) WiFi RSSI/bitrate. Basic+Compass adds the electromagnetic compass found in newer smartphones. It is also representative of the most feature-rich devices when placed in a pocket or purse. Basic+Compass+Light adds a light sensor to account for the case where the phone is exposed to the ambience. Finally, we also include a trivial-but-reasonable scheme, called Naive. This scheme predicts dwell time classes based only on the duration that the device has already stayed in the hotspot. Specifically, let [t_1, t_2) correspond to class i and [t_2, t_3) correspond to class i+1. Naive predicts class i until (t_1 + t_2)/2 and class i+1 until (t_2 + t_3)/2, and so on.

We use a metric, Mean Priority Misprediction, defined as follows. Assume that user u’s true dwell time, δ^u_true, maps to a priority level P^u_true. ToGo’s goal is to converge to this priority level as soon as possible, and maintain it until the user leaves. At a given time t, ToGo predicts dwell time as δ^u_predict(t), which maps to P^u_predict(t). The instantaneous prediction error at t can be expressed as D^u(t) = P^u_true − P^u_predict(t). We evaluate the prediction error across all N users at time t, as:

MeanPriorityMisprediction(t) = (1/N) Σ_{u=1}^{N} |D^u(t)|        (7.1)
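The Naive baseline and the misprediction metric can be sketched in a few lines. The class boundaries below are illustrative values, not the study’s actual intervals.

```python
def naive_class(elapsed_s, edges):
    """Naive baseline: predict class i until the midpoint of [t_i, t_{i+1}),
    then class i+1, and so on. edges = [t1, t2, ...] are illustrative
    class boundaries in seconds."""
    for i in range(len(edges) - 1):
        if elapsed_s < (edges[i] + edges[i + 1]) / 2:
            return i + 1          # classes are numbered from 1
    return len(edges) - 1         # past the last midpoint: highest class

def mean_priority_misprediction(true_priorities, predicted_priorities):
    """Eq. (7.1): mean |P_true - P_predict(t)| across all N users."""
    n = len(true_priorities)
    return sum(abs(t - p)
               for t, p in zip(true_priorities, predicted_priorities)) / n

edges = [0, 60, 180, 360]       # e.g., class 1: <1 min; class 2: 1-3 min; ...
print(naive_class(100, edges))                              # -> 2
print(mean_priority_misprediction([1, 2, 3], [1, 3, 3]))    # (0+1+0)/3
```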

All our accuracy results are obtained using cross-validation. The ToGo AP records micro-mobility signatures for each client, as a function of time. The AP predicts the dwell time for a particular user’s signature after training on all other users.

Prediction with Real, Uncontrolled Users

We tested ToGo prediction accuracy at a campus coffee shop (hereafter referred to as the Cafe) with 15 real customers. As they entered the Cafe, each customer was asked to carry a phone running the ToGo mobile client. We gave no instructions to the customers regarding how to carry or handle the phone. A ToGo AP was centrally deployed to collect data from the phones. The actual behaviors were as follows: two users of Class 1 (bought something and walked out in less than 1 min), seven users of Class 2 (waited in a queue and walked out in 1-3 mins), two users of Class 3 (waited for grilled food and walked out in 3-6 mins), and four users of Class 4 (bought food and ate at one of the tables). Figure 7.6 shows mean priority misprediction for all users across their stay. All ToGo schemes converge to the correct priority class (0 mean error) in ≈ 2.5 minutes. Of course, this is the average across all classes; short dwell users (such as class 2) are assigned high priority within the first 30 seconds of arrival at the Cafe.

Figure 7.6: Cross-validation on 15 real-user traces at the Cafe. Despite only 14 SVM training points, ToGo correctly classified users within 2.5 minutes. Additional sensors reduce prediction error during convergence.

Capturing User Behavior

Running uncontrolled experiments with real users is difficult at unfamiliar locations. To test ToGo at scale, we adopted an alternative methodology. We conducted a visual survey of people’s movements at 3 different hotspots: a Cafe, a campus Library, and a McDonald’s frequented by students (McD). We tested our system at each of these locations, but for simplicity, we focus our discussion on the McD hotspot. The floor plan of McD is shown in Figure 7.7. We surveyed McD from 11am to 4pm, the busiest 5 hours on a weekday. We randomly picked visitors and drew their movement traces on photocopies of the floor plan, along with timestamps for pauses. Afterward, we used the recorded traces to mimic the observed real behaviors. Figure 7.7 illustrates one behavior along a representative path. Activities, such as taking condiments, indicate a user’s intentions (e.g., to sit at a table) and thus help discriminate different classes of users. We observed that the McD customers often fell into 5 broad classes of behavior: (1) walk past without stopping (< 2 mins); (2) order and take out food (2-4 mins); (3) pick up food and sit at one of the side tables (10-15 mins); (4) pick up food and sit at a high table (10-30 mins); and (5) sit in one of the booths to work (> 30 mins).

Figure 7.7: Diagram shows user behavior along a representative path. The user (i) walks up to the McDonald’s counter to examine the wall-mounted menu and wait in the queue line (10-60 seconds); (ii) places an order and waits for food (1-2 minutes); (iii) takes condiments (2-15 seconds); (iv) sits and eats food (5-15 minutes); (v) discards trash (1-10 seconds); and (vi) exits to the lobby.

Emulating User Behavior

Having manually characterized the behavioral patterns at the McD, Library, and Cafe hotspots, we mimic random selections from the recorded behaviors while holding a mobile device running ToGo. While reenacting, the dwell times of observed customers are proportionally shortened to reduce experimenter burden. We believe our reenactments are reasonably reflective of the original customers.


Figure 7.8: Mean priority misprediction at 3 hotspots: (a) McDonald’s; (b) Library; (c) Cafe. All ToGo variants perform better than Naive. NoFeedback performs reasonably well when there is enough RSSI diversity, as in a large hotspot such as the Library.

Prediction Accuracy and Sensor Contribution

We emulated 60, 72, and 48 user behaviors for the McD, Library, and Cafe hotspots, respectively. Figure 7.8 presents mean priority misprediction over the client dwell duration for each location. Again, all ToGo schemes substantially outperform the Naive (time-only) scheme. Variants with additional sensors and client feedback predict the correct priority class sooner than the NoFeedback approach (RSSI/bitrate only). In the Cafe test, convergence time for the NoFeedback scheme is especially slow relative to the schemes with client sensors.


Figure 7.9: Prediction accuracy by priority class. Dwell duration (X-axis) is different for each class (increasing by class number). Naive requires substantially longer before convergence to the correct classification.

Prediction by Priority Class

Figure 7.9 shows the average prediction error over time for three different priority classes. The purpose of these plots is to understand the effect of priority class on prediction. The predictions for all classes have non-negligible errors during the first 100 seconds. This is because of the similarity in the paths of different classes, as users move towards the counter in the McD trace. All ToGo schemes converge to the right priority at around 100 seconds independent of class; this is when the users leave the counter and a diversity in their paths emerges.

7.5.3 BytesToGo Performance: Offloading 3G to WiFi

We evaluate the extent to which BTG can reduce 3G load. A live experiment highlights the effectiveness of our complete design and implementation. Then, a trace-based evaluation characterizes 3G savings at scale.

Traffic and AP

We assume that non-interactive traffic exerts the majority of the strain on 3G networks [30, 76]. Clients will download/upload full-length movies, videos, picture albums, eBooks, etc. Such traffic needs to be (and can be) offloaded to WiFi. We evaluate BTG with TCP download traffic in a single-AP system. We believe our approach is applicable to upload traffic as well as to multi-AP environments (more in Section 7.6). We also assume that the BTG AP owner is willing to provide selective treatment to the mobile clients.

User Demand

We assume that a client’s “appetite” for data download is limited by the data consumption time, i.e., if a video takes 1 minute to download and 5 minutes to watch, the client does not initiate the next download until the end of 5 minutes. We believe this assumption models common data usage. Of course, the user might browse the web or send instant messages while buffering the video. To avoid delaying these applications, BTG does not subject interactive traffic to dwell-based prioritization.
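Under stated assumptions (illustrative rates and durations), the appetite model can be sketched as a simple pacing loop:

```python
def wifi_mb_within_dwell(dwell_s, wifi_mbps, video_mb, view_s):
    """Data fetched over WiFi during a dwell under the appetite model:
    the next download begins only once the previous video's viewing
    time has elapsed. All parameter values are illustrative."""
    t, fetched_mb = 0.0, 0.0
    while t < dwell_s:
        dl_s = video_mb * 8 / wifi_mbps         # seconds to download
        usable = min(dl_s, dwell_s - t)
        fetched_mb += video_mb * usable / dl_s  # credit partial downloads
        t += max(dl_s, view_s)                  # pacing: wait out the viewing
    return fetched_mb

# A 12 MB video watched over 300 s, fetched on a 16 Mbps share,
# during a 10-minute dwell: two full videos fit.
print(wifi_mb_within_dwell(600, 16, 12, 300))
```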

Download Initiation

The device dwell time (used interchangeably with user dwell time) is defined as the total time the mobile device remains associated to the hotspot. However, if the device starts downloading sometime later, the dwell time should perhaps be computed differently. To simplify our setting, we assume that the device starts downloading immediately on hotspot association. This does not affect the core dwell time prediction algorithm, and BTG is applicable without this assumption. We revisit this in Section 7.6.

3G Time Saved

The 3G savings due to BTG arise from accelerating the download on WiFi before going to 3G. We sum the savings of individual users with prioritization to get the total savings. Let M^u_prio and M^u be the total downloaded WiFi data (with and without prioritization, respectively) for user u. Then the total 3G savings are:

3GSavings = Σ_{u=1}^{N} (M^u_prio − M^u)        (7.2)
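A sketch of the savings computation follows. The per-user values and the effective 3G rate (~0.45 Mbps) are illustrative assumptions, chosen only to show how saved bytes translate into 3G channel time.

```python
def three_g_savings_mb(wifi_mb_prio, wifi_mb_noprio):
    """Eq. (7.2): extra data moved over WiFi thanks to prioritization;
    each such megabyte would otherwise have used the 3G channel."""
    return sum(p - m for p, m in zip(wifi_mb_prio, wifi_mb_noprio))

def channel_time_saved_s(saved_mb, rate_3g_mbps):
    """Convert saved megabytes into 3G channel time at an assumed rate."""
    return saved_mb * 8 / rate_3g_mbps

# Hypothetical per-user WiFi totals, with and without prioritization:
saved = three_g_savings_mb([120, 80, 40], [90, 60, 35])
print(saved)                                  # -> 55 (MB)
# At an assumed ~0.45 Mbps effective 3G rate, 100 MB is roughly half an
# hour of channel time:
print(channel_time_saved_s(100, 0.45) / 60)   # ~29.6 minutes
```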

Live Experiment

Using multiple experimenters, we simultaneously emulate previously-recorded user behaviors. Based on live dwell predictions, BTG (Basic+Compass+Light variant) performs dynamic traffic shaping. This experiment is performed in the Cafe after training the AP with the Cafe trace data. The arrival and departure times of mobiles (drawn from recorded patterns) are as follows (in seconds): (0, 660); (15, 80); (60, 120); (200, 560); (240, 310); and (360, 580). Arrival and departure times are illustrated in Figure 7.10. Each arriving device begins a 100MB HD 720p video download (typical of YouTube HD videos). The video viewing length is about 6 minutes and 40 seconds. With just one device operating, it took about 75s to download the video (15 Mbps backhaul capacity). An experimenter emulating user behavior waits for the viewing duration of the video and, upon completion, starts another download of the same size (as if the user watches a series of videos, selecting the next once the first completes). The complete experiment was repeated 3 times with and without prioritization. Figure 7.11 presents the results. BTG consistently achieves higher data transfer for shorter dwell clients, saving an average of about 55MB of 3G data per run.


Figure 7.10: Relative overlap of emulated user behaviors, live experiments.


Figure 7.11: Performance of BTG with live traffic shaping in Cafe. Traffic shaping benefits clients with shorter dwell times.

Trace Based Evaluation

To test BTG 3G savings at scale, we conduct a trace-based evaluation. We consider both HD and non-HD video downloads. HD video parameters are the same as in the live experiment. For non-HD, we assume a 5 minute video at a 320 kbps video encoding, with a size of about 12MB, typical of non-HD YouTube videos [43, 67]. User arrival times are modeled based on real device arrival times obtained from the tcpdump (Fig. 7.1). The trace recorded a total of 340 devices, with 40 devices arriving per hour on average. Each user is assigned a mobility pattern (and corresponding dwell time) by randomly picking a trace from the McD hotspot. The total data downloaded by each class of devices for one hour is calculated based on the priorities (and, by extension, bandwidth) assigned by the AP and the total WiFi capacity. We conduct this experiment for each of our hotspot trace locations. However, in the interest of space (and similar results), we present 3G savings for only McD in Figure 7.12. Each data point reflects the mean of 100 trials. A hypothetical Hindsight scheme is shown for comparison. Hindsight “predicts” at the end of the trace with full knowledge of all client dwell times. For non-HD video, the BTG schemes save almost as much 3G data as Hindsight, approximately 100 MB/hour. At 3G rates, this equates to about 30 minutes of 3G channel time saved per hour, exclusively by prioritization. With HD video, the benefits of accurate prediction become clearer, boosting the savings to 45 minutes for the best BTG variant.

Figure 7.12: 3G data saved per hour by one AP. BTG prioritization improves WiFi utilization, providing substantial 3G network savings. Gains increase with larger HD files. Note that in some cases, the RSSI-based NoFeedback variant suffices to differentiate short-dwelling users.

7.6 Discussion

This section discusses a number of issues and open questions with ToGo and BTG.

7.6.1 Considerations for all ToGo Systems

Multi-AP Hotspots

Thus far, we have assumed that ToGo would be deployed in a small hotspot location (e.g., a cafe) with only a single AP. In practice, a hotspot may have multiple APs extending over a larger coverage area. In these circumstances, a change of AP association should not be considered a departure from the hotspot. ToGo naturally extends to these environments. A dedicated server or cloud application may serve as a network controller, in the style of existing enterprise WLAN architectures. APs forward client feedback reports to the controller, paired with time, RSSI, and bitrate annotations. The controller performs training and prediction tasks for all APs in the hotspot. Aggregation of client data from multiple APs can also improve training quality. RSSI values for a client from non-associated APs can serve as additional features for prediction. With this RSSI feedback, along with knowledge of intra-hotspot AP-to-AP handoff patterns, the ToGo controller may provide higher prediction accuracy than in the one-AP case.

Device Usage

For simplicity, our experiments have assumed that a user walking into a hotspot will have her device on and running the ToGo client feedback service. Real behavior will exhibit greater diversity. Users may, for example, walk into a cafe, order food, and sit down all before turning on a ToGo-enabled device. In this case, the AP will have reduced information available for dwell prediction, possibly leading to increased error. However, we expect that these behavioral tendencies may also be learned over time. When and where a user activates her device may itself be a strong dwell predictor.

Energy Overheads

We have not quantified the additional energy drain required for client feedback. Accelerometer, compass, and light sensor usage, if not already in-use for some other application, incur an energy cost. For BTG prioritized clients, these costs should be mitigated by energy savings on WiFi compared to 3G. A detailed treatment of the energy-throughput interplay, in the context of ToGo applications such as BTG, is left for future work.

7.6.2 Considerations particular to BytesToGo

Selecting the Right Policy

As in any form of prioritization, appropriate policy selection can be complex. In BTG, the AP owner must weigh the value of 3G bandwidth savings against quality-of-service for hotspot users. Without real-world preferences, BTG cannot optimize this tradeoff. However, traffic shaping parameters provide simple hooks to do so. Users may be assigned a minimum reservation bandwidth (as a function of the number of active users), and only the excess capacity may be prioritized by dwell time.

What if a greedy user fakes sensor readings to get higher priority from the BTG AP? Currently, we do not have a mechanism to prevent such selfish behavior. Note that many MAC protocols are designed assuming cooperative behavior, and any non-compliance monitoring is external to these protocols. Nevertheless, we plan to explore ways to monitor the integrity of sensor readings and also to develop an approach to disincentivize selfish users.

7.7 Related Work

7.7.1 WiFi and Cellular

In CellShare [167], a rural WiFi network benefits from cellular network augmentation. The system allows the use of mobile phones to provide temporary Internet connectivity when parts of the network are disconnected. A similar architecture is used in CoolTether [168] to access the 3G network from a laptop using phone WiFi tethering. The scheme in [171] helps in recovering lost 3G multicast data by relaying on WiFi among neighboring devices. In MobTorrent [39], the cellular network is used as a control channel to predict mobility information and prefetch content. Our work makes use of WiFi to optimize traffic offload from 3G. Wiffler exploits WiFi to reduce the load on 3G in vehicular networks [24]. Complementary to Wiffler, we are interested in offloading 3G traffic onto WiFi during pedestrian hotspot connectivity.

7.7.2 Mobility Prediction

Macro mobility prediction for cellular networks is a well-researched area [96, 106, 14, 146]. BreadCrumbs [143] is a recent macro-mobility solution that harnesses the habitual nature of human mobility. Based on a per-user history, BreadCrumbs makes use of a second-order Markov model to provide connectivity forecasts on which AP associations will be most useful. Our work complements BreadCrumbs, with a focus on micro-mobility and user behavior within the range of a single AP, and attempts to predict dwell time (with an approach similar to an nth-order Markov model). Further, we aim to capture general behavior trends across clients, allowing the system to make informed decisions the first time a user associates to an AP.

7.7.3 Activity Recognition

Activity recognition is an active research area [26, 48, 129, 128, 27, 188] due to the pervasiveness of sensor-assisted phones. This work enables user dwell time prediction as a novel instance of activity recognition.

7.8 Conclusion

This paper proposes ToGo and BytesToGo (BTG) to predict user dwell time and prioritize short dwellers, thereby offloading some of the traffic from 3G networks to WiFi hotspots. We address the challenge of predicting dwell time with the aid of client sensor data and a machine learning algorithm at the AP. Evaluation over traces collected from 3 different hotspots, together with live experimentation, has shown that BTG saves 3G time without hotspot-specific configuration. As part of future work, we plan to explore other uses of dwell time prediction. More broadly, this work can be considered an instance of micro-behavior-aware networking, which we believe has vast scope for further research.

8 Encounter-Based Trust for Mobile Social Services

Conventional mobile social services such as Loopt and Google Latitude rely on two classes of trusted relationships: participants trust a centralized server to manage their location information, and trust between users is based on existing social relationships. Unfortunately, these assumptions are not secure or general enough for many mobile social scenarios: centralized servers cannot always be relied upon to preserve data confidentiality, and users may want to use mobile social services to establish new relationships. To address these shortcomings, this paper describes SMILE, a privacy-preserving “missed-connections” service in which the service provider is untrusted and users are not assumed to have pre-established social relationships with each other. At a high level, SMILE uses short-range wireless communication and standard cryptographic primitives to mimic the behavior of users in existing missed-connections services such as Craigslist: trust is founded solely on anonymous users’ ability to prove to each other that they shared an encounter in the past. We have evaluated SMILE using protocol analysis, an informal study of Craigslist usage, and experiments with a prototype implementation, and found it to be both privacy-preserving and feasible.

8.1 Introduction

Programmable consumer devices such as mobile phones have placed computation within arm’s reach at all times and in all places. Mobile social services take advantage of the nearly constant physical proximity of devices to their owners to enable a wide range of new social interactions. In a conventional mobile social service, devices send intermittent location updates to a service provider, which uses those locations to coordinate interactions among participants. For example, Google Latitude [69] and Loopt [111] are popular services that allow users to share their location information with friends.

Within existing mobile social services, trust is founded on two classes of relationships: one with the service provider and another with peers. Service providers are treated as benevolent guardians of their location data. Users control which participants can track their location, but their location privacy is not protected from the service providers themselves. Trust between users is almost always based on pre-established social relationships. Social groups, such as work colleagues, family members, and friends, typically define a subset of users that may access a participant’s location information.

Unfortunately, neither class of trust relationship provides a secure or general foundation for mobile social services. First, lessons from existing online social networks (OSNs) demonstrate the many ways that data confidentiality can be compromised by trusted service providers: users’ sensitive data can be inadvertently leaked [179], can fall under the control of hackers [89], and can be abused by service administrators [126]. The potential leakage of users’ long-term location histories is a serious threat, and would be more damaging than leaks of the media and messaging state currently managed by OSNs. In addition, restricting location sharing to pre-established social relations makes a large class of compelling mobile social services impossible. For example, services such as Social Serendipity [60], which notifies users when like-minded strangers are nearby, are impossible if all trust relationships must be pre-established.

As a result, this paper describes SMILE, a mobile social service in which trust is established solely on the basis of shared encounters; the service provider is not trusted to access users’ location information and we assume no pre-established trust relationships among users. At the heart of the service is the notion of an encounter, which is defined as a short period of co-location between people. Our service is modeled after the popular “missed-connections” services found in newspapers and on websites like Craigslist. The key features of a missed-connections service are: (1) strangers who were at the same place and time should be able to contact each other at a later time; and (2) once connected, those strangers should be able to prove to each other that they actually encountered one another. We use three complementary techniques to provide these features without exposing users’ location information to either the service provider or adversaries claiming to have been physically present at a particular place and time:

1. Co-located participants perform periodic passive key exchange with each other using short-range wireless broadcasts.

2. Participants use key hashes to establish a rendezvous point at a centralized server without exposing the encounter location to the service provider.

3. Participants limit the service provider’s ability to infer which pairs of users were involved in an encounter by carefully inducing key-hash collisions at the server and relying on clients to resolve ambiguities.
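Steps 1–3 above can be sketched as follows. The 128-bit keys and the 16-bit hash truncation are illustrative parameters chosen for the sketch, not SMILE’s actual choices.

```python
import hashlib
import os

def encounter_key():
    """Step 1: a key passively exchanged over short-range broadcast; both
    co-located devices end up holding the same random key."""
    return os.urandom(16)

def rendezvous_label(key, bits=16):
    """Steps 2-3: upload only a short, truncated hash of the key.
    Truncation makes unrelated keys collide at the server, so the server
    cannot tell which pairs of uploaders actually met; genuine encounter
    peers resolve the ambiguity because only they hold the full key."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

k = encounter_key()
# Both participants derive the same rendezvous label from the shared key,
# but the label space is only 2^16 buckets, so collisions are common.
assert rendezvous_label(k) == rendezvous_label(k)
assert 0 <= rendezvous_label(k) < 2 ** 16
```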

The high-level insight behind these techniques is derived from observations of existing online missed-connections services. In these services, a poster normally asks anonymous respondents to confirm small details from the encounter. For example, to ensure that she is communicating with her waiter from the previous night, a user might ask a respondent to tell her what she ordered. SMILE’s passive key exchange protocol functions similarly, by creating shared knowledge about an encounter that only participants could have recorded. We have evaluated SMILE using protocol analysis, by characterizing behavior within the missed-connections feature of Craigslist, and through experiments with a prototype implementation. Based on this analysis, we have found that SMILE provides users with both location and encounter privacy from adversarial service providers and peers, and that our passive key-exchange protocol is feasible using a widely-deployed, short-range wireless technology such as Bluetooth. The rest of the paper is organized as follows: in Section 8.2, we outline our basic assumptions and threat model; in Section 8.3, we present a server-centric missed-connections system utilizing collisions in a centralized hash table to provide k-anonymity; in Section 8.4, we consider an alternative distributed missed-connections scheme relying on an anonymized remailing or onion-routing network; in Section 8.5, we evaluate the feasibility of our scheme; in Section 8.6, we present related work; and we conclude in Section 8.7.

8.2 Trust and Threat Model

SMILE allows strangers who shared an encounter in the past to communicate at a later point in time. An encounter is defined as two people being in close physical proximity to each other for a period of time. The challenge addressed in this paper is providing a missed-connections service with strong location-privacy and encounter-privacy guarantees. A user’s location privacy is violated when either the service provider or an unauthorized user can infer with high probability that the user was in a particular place at a particular time. Similarly, a user’s encounter privacy is violated when the service provider or an unauthorized user can infer with high probability that two users were in the same place at the same time. Many privacy threats exist independently of SMILE, and are thus beyond the scope of this paper. For example, we make no attempt to conceal devices’ locations from cellular-network operators, access-point administrators, or any other snooping radios. Many attacks can be launched from these vantage points due to the pervasive use of static MAC addresses and identifiable traffic patterns in wireless networks. Work on disposable addresses [72], prolonged silent periods [88], and privacy-preserving link-layer protocols [70] offers solutions that could be plugged in when available. Unlike most mobile social services, we do not assume that trust is derived from pre-established social relationships. Instead, trust in SMILE is based only on shared encounters. Assuming there is mutual interest in establishing communication, two users trust each other only if they can convince each other that they were in the same place at the same time. In the absence of mutual interest or proof of an encounter, users remain anonymous to each other.

8.2.1 Adversarial Capabilities

We utilize a central server to aid in post-encounter matching. This infrastructure, and all other third parties, are considered untrusted. Further, we assume that all adversaries are endowed with at least the following set of capabilities. We assume that servers have access to substantial personal information about all users, including each user’s full name, billing address, IP-localized home address, and credit card information. We further assume that an attacker can arbitrarily read or replace user data and network traffic. This allows server administrators to perform timing analysis on user data, forge user data, interpose on communication between users, masquerade as any user, and replay user messages.

8.2.2 Adversarial Limitations

On the other hand, we also assume that malicious participants and servers are limited in the following ways. First, within a “home” geographic region (i.e., a metro area), we assume that all users know approximately how many users participate in the system and how often users register and respond to missed-connections requests. Users could obtain this information out-of-band via a third-party monitoring service such as Alexa [15]. We also assume that participants do not share information about an encounter with users who were not present. Collusion of this form violates our trust model and can allow a user who was not part of an encounter to generate a false proof. Finally, we assume limited collusion among malicious users and service providers. Successful collusion attacks require subversion of a large portion of legitimate system users or an adversary who was physically proximate to the encounter.

8.3 SMILE System Design

In this section, we present the design of SMILE, Secure MIssed connections through Logged Encounters. SMILE is a secure, centralized missed-connections service. The basic structure of SMILE’s messaging protocol is as follows: (1) mobile users passively exchange cryptographic keys with nearby peers; (2) users periodically upload batches of key hashes to a central, coordinating server; (3) a user sends a message to the server encrypted with one such key and labels it with the corresponding key hash; (4) the server forwards the encrypted message to all users that have uploaded the same key hash; (5) only encounter participants are able to decrypt the message. SMILE offers protection against malicious agents attempting to determine or disclose a user’s

location history, encounter history, or private messages. Figure 8.1 presents a high-level depiction of the SMILE protocol.

Figure 8.1: An illustrated sequence of operations. Let H denote a cryptographic hash function and E_x(m) denote the encryption of message m with key x. Encounter keys x and y hash to the same value, leading the server to relay E_x(m) to participants in both encounters. However, only participants holding key x can recover message m. A timestamp nonce t in the reply prevents replay attacks.

The bulletin-board approach of traditional posting services and newsprint personals has two primary drawbacks. First, these services require participants in an encounter to actively search for their match. This scheme is inconvenient and inefficient: one person must post a listing and hope that the other will find it after extensive manual browsing. Second, because anyone can respond to a posting, even if they were not present for the encounter, existing services provide very weak authentication guarantees.

Figure 8.2 shows the "ideal" approach, in which a user's missed-connections messages are routed directly to their intended recipient. Would-be recipients would not be required to search ads or websites to receive a message. Our aim is to closely approximate this ideal service without compromising participants' privacy.
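The five numbered steps above can be sketched concretely. The following toy Python model is illustrative only: the SHA-256 counter-mode "cipher," the Server class, and the message text are stand-ins, not the deployed implementation.

```python
import hashlib, os

def H(key: bytes) -> bytes:
    # Preimage-resistant hash of an encounter key (step 2).
    return hashlib.sha256(key).digest()

def keystream(key: bytes, n: int) -> bytes:
    # Illustrative SHA-256 counter-mode keystream; NOT a vetted cipher.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def E(key: bytes, msg: bytes) -> bytes:
    # E_x(m): XOR stream encryption under the encounter key.
    return bytes(a ^ b for a, b in zip(msg, keystream(key, len(msg))))

D = E  # XOR cipher: decryption is the same operation

class Server:
    # Untrusted coordinator: sees only key hashes and ciphertexts.
    def __init__(self):
        self.hashes = {}     # H(x) -> ids of clients that uploaded it
        self.mailboxes = {}  # client id -> [(H(x), ciphertext)]

    def upload_hash(self, client, h):
        self.hashes.setdefault(h, set()).add(client)

    def send(self, h, ct):
        # Step 4: forward to every client that uploaded the same hash.
        for client in self.hashes.get(h, ()):
            self.mailboxes.setdefault(client, []).append((h, ct))

# Step 1: Alice and Bob exchange a random encounter key in proximity.
x = os.urandom(16)
server = Server()
# Step 2: both later upload H(x); the raw key never leaves their devices.
server.upload_hash("alice", H(x))
server.upload_hash("bob", H(x))
# Step 3: Alice sends a message labeled with the key hash.
server.send(H(x), E(x, b"Red dress on the 8:15 train?"))
# Step 5: only holders of x can decrypt what the server forwarded.
h, ct = server.mailboxes["bob"][0]
assert D(x, ct) == b"Red dress on the 8:15 train?"
```

Note that the server in this sketch never observes x itself, only H(x) and ciphertext, mirroring the property the protocol relies on.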

[Diagram: two flows for reconnecting after a "missed" event between Person A and Person B. Bulletin board: post description; browse listings; reply; confirm; contact. Ideal direct messages: reply; confirm.]
Figure 8.2: In online missed-connections posting services (such as Craigslist), posting subjects are forced to manually browse up to hundreds of unrelated postings. By directly routing messages to encounter participants, SMILE is more efficient and less error-prone.

8.3.1 Encounter Detection

Users participate in SMILE by carrying a smartphone, laptop, or other mobile device running a lightweight sensing application. Through short-range wireless communication, participants sense the presence of others in their proximity. An incident of mutual detection is considered an encounter. During each encounter, co-located peers use a wireless link to establish a random symmetric key. Hereafter, we will denote this shared state as the encounter key. The encounter key may be randomly generated by either party. Figure 8.3 depicts encounter-key distribution.

Local Encounter State

A user’s device automatically logs encounter keys, along with when and where they were received, in a database. Localization need not be highly precise, and can be determined using GPS, WiFi access points, or GSM towers. Locations only help the user identify or recall past encounters. The privacy risk of recording this information is minimal, as it will never be provided to the server or any peer. The local database may be encrypted against a password for protection in case of device theft.

Figure 8.3: Wireless encounter-key broadcasts provide co-located users with shared state that can later be used to prove participation in an encounter.

Encounter Provenance

If other useful state is also available, it may be stored as a supplement to time and location. The aggregation of this state amounts to a record of human-interpretable encounter provenance, metadata describing the origin of the encounter record. This idea is similar in spirit to provenance systems proposed for data storage [137] and web browsers [124]. For example, a smartphone could record which websites the user was browsing, active chat sessions, the most recently dialed telephone number, the last photograph taken, or the song playing at the time. A sophisticated device could be even more proactive in recording provenance state. For example, if equipped with an accelerometer, the device could perform activity recognition (e.g., biking, jogging, walking, or sitting) [50]. The device could also automatically take a picture or series of pictures whenever an encounter occurs. Face-detection software, which is already present on many consumer point-and-shoot cameras, could then be used to filter photographs that do not identify the surrounding people during an encounter. Finally, short audio recordings, taken at the time of the encounter [8], could provide an audio context for an encounter. We leave a more in-depth discussion of these techniques for future work. For the rest of this chapter, we assume that local records include only location and time information.

Server Synchronization

When convenient (e.g., during nightly phone charging), a user synchronizes encounter and provenance state from the carried mobile device with a desktop client. Periodically, on a timescale of hours or longer, the client uploads a preimage-resistant cryptographic hash, H(x), of each new encounter key, x, in randomized order. Randomized order and batched uploads combat server timing attacks on hash uploads. The server might otherwise deduce that a pair of users were involved in an encounter by observing near-simultaneous uploads of the same key hash.
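This synchronization step can be sketched minimally as follows; the function and variable names are hypothetical, not taken from the implementation.

```python
import hashlib, random

def batch_upload(new_keys, upload):
    # Upload H(x) for each new encounter key in one batch, in randomized
    # order. Shuffling plus batching prevents the server from linking a
    # pair of near-simultaneous uploads of the same hash to one encounter.
    hashes = [hashlib.sha256(x).digest() for x in new_keys]
    random.shuffle(hashes)
    for h in hashes:
        upload(h)

sent = []
batch_upload([b"key-a", b"key-b", b"key-c"], sent.append)
assert sorted(sent) == sorted(hashlib.sha256(k).digest()
                              for k in (b"key-a", b"key-b", b"key-c"))
```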

8.3.2 Missed-Connection Reestablishment

To act upon a missed connection, a user queries the client-side database for the estimated time or place when the encounter occurred (optionally using whatever other provenance information the device and client support). If the client database contains a match, the user may compose a message to be sent to all peers present for the encounter. The client concatenates the message m with a timestamp t, encrypts the result using the encounter key x, and uploads the encrypted token to the server with a hash of the encounter key as {H(x); E_x(m || t)}. The server compares the message's key hash H(x) with all previously uploaded encounter-key hashes. For each match found, the server places the key hash H(x) and the encrypted message token E_x(m || t) in a mailbox for the corresponding user. Clients periodically download new messages and compare the hash associated with each message to their databases of hashes recorded during past encounters. Assuming protocol-compliant client and server behavior, clients will always find at least one match. For each matching key, clients attempt to decrypt the corresponding message from the server. Messages that cannot be decrypted (i.e., because the sender used an unknown encounter key) are discarded.

[Bar chart: % of classified encounters by type of confirmation requested: personal description, things present, memorable event, time or place, other observable, privately shared.]

Figure 8.4: Classification of identity confirmation checks requested, among Craigslist posts requesting some check. Most checks rely on features observable to (and thus forgeable by) third parties, such as a personal description.

If a message is successfully decrypted, the client notifies the user and provides the decrypted message along with any provenance state recorded at the time of the encounter (e.g., time, place, and photographs). The user may choose to respond by entering a reply message m′, which the client encrypts and uploads to the server as {H(x); E_x(m′ || t + 1)}. Prior to encryption, the client increments the original message's nonce and concatenates it to the reply. The nonce prevents replays of the encrypted message.

Comparison to Common Practice

In the absence of collusion, SMILE guarantees that only encounter participants can decrypt and respond to messages. This significantly reduces the number of potential respondents compared to conventional services in which anyone can respond to a posting. However, a "reconnection" could still be with someone other than the intended recipient if the physical space of the encounter (constrained by wireless transmission range) included other individuals. Once each client has verified the other's proof of co-location, SMILE falls back on the informal checks used in existing missed-connections services. For example, it is common for users to ask for initials, a shirt color, or the subway stop at which an individual departed. Unlike recorded cryptographic state, answers to these questions can be guessed or even forgotten. Figure 8.4 categorizes posts on Craigslist requesting some form of verification by confirmation type.1 Unfortunately, less than 20% of checks could be categorized as "privately shared," while the vast majority of verification information would have been observable to a co-located third party. As a result, though SMILE is more efficient and more secure than existing missed-connections services, it is still vulnerable to fraudulent responses by co-located snoopers.

8.3.3 K-anonymity Preservation

SMILE uses key hashes to deliver messages without compromising users' location privacy, but this does not prevent an adversarial server from inferring which pair of users was involved in an encounter. In this subsection, we discuss the k-anonymity techniques used by SMILE to protect users' encounter privacy and present an analytical model of their properties. The key insight behind these techniques is that by tuning the number of hash bits revealed to the server, l, clients can induce hash collisions to protect their encounter privacy. Furthermore, by controlling the frequency of these hash collisions, users can independently tune their personal level of k-anonymity. Table 8.1 summarizes our model parameters.

Hash Prefixes

We assume that the central server can associate encrypted messages with a unique source client. Thus, to preserve encounter anonymity, clients obfuscate the message recipient rather than the source. Assume clients C_a and C_b participate in an encounter. Let x be an encounter key and H(x) be the corresponding cryptographic

1 The details of our Craigslist classification methodology are described in Section 8.5.2.

Table 8.1: Analytical Model Parameters

User-controllable parameters:
  k: number of users against which encounter anonymity is preserved
  l: number of prefix bits a client reveals from an encounter hash
  d: maximum random delay; the minimum period to which the server can estimate message timing
  f: fixed number of messages a client sends per conversation
System properties:
  n: total number of system users
  r: average per-user sending rate to new peers (i.e., rate of new conversations)
  p: proportion of encounters in the system as plausible for a client as encounters in which the client actually participated
  c: proportion of clients in collusion with the server

hash of the key computed by each device. When uploading encounter hashes to the server, C_a and C_b may not provide all of H(x). Instead, they reveal only P(H(x), l), the l-bit prefix of H(x). C_a and C_b independently choose values for l corresponding to their personally desired level of k-anonymity and estimated system properties. The ith message m_i in a conversation is uploaded as {P(H(x), l); E_x(m_i || t + i)}, where x is the encounter key and t is the upload time of m_0.

Matching and Forwarding

The server delivers messages by matching message-hash prefixes to encounter-hash prefixes. Let l_s denote the prefix length of a hash H_s used by the sender of a message. Let l_r denote the prefix length of a hash H_r used by a potential recipient of the message. The server considers P(H_s, l_s) and P(H_r, l_r) a match if and only if one is a valid prefix of the other. That is, P(H_s, min(l_s, l_r)) = P(H_r, min(l_s, l_r)) for all matching H_s and H_r. Note that for l_1 >= l_2, P(H, min(l_1, l_2)) can be computed as P(P(H, l_1), l_2).
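The matching rule can be stated compactly in code. This sketch renders prefixes as bit strings for clarity rather than efficiency; the helper names are hypothetical.

```python
import hashlib

def P(h: bytes, l: int) -> str:
    # l-bit prefix of hash h, rendered as a bit string.
    bits = bin(int.from_bytes(h, "big"))[2:].zfill(len(h) * 8)
    return bits[:l]

def prefixes_match(hs: bytes, ls: int, hr: bytes, lr: int) -> bool:
    # Match iff one revealed prefix is a prefix of the other:
    # P(H_s, min(l_s, l_r)) == P(H_r, min(l_s, l_r)).
    m = min(ls, lr)
    return P(hs, m) == P(hr, m)

h = hashlib.sha256(b"encounter key").digest()
assert prefixes_match(h, 12, h, 20)   # same hash, different revealed lengths
assert P(h, 20)[:12] == P(h, 12)      # P(H, min(l1, l2)) == P(P(H, l1), l2)
```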

Prefix-Length Selection

As prefix length decreases, the number of messages a client receives will increase. For smaller values of l, a client will receive more messages that do not correspond to any held encounter key. This provides greater anonymity for messages actually intended for that client, at the cost of higher overhead. To limit message flooding, the server may impose a minimum prefix length l_min. Clients may similarly filter based on prefix length if the number of received messages becomes burdensome. Users who require l < l_min should abandon SMILE, since the server cannot provide their desired level of anonymity.

Clients must choose their prefix length l carefully. Selection of l should ensure that at least k users send messages using the same key-hash prefix. More precisely, these messages must be sent during a bounded time window d, due to the potential for timing attacks. If l is too large, the server will be able to deduce the identities of two communicating clients from a bidirectional message exchange sharing the same long prefix. In addition, if l is large, the uploaded encounter-key hashes alone may pose a privacy risk, since fewer than k users may even use the same key hash. Because clients will naturally send messages to only a small proportion of the peers they encounter, the selection of l to meet messaging requirements necessitates that far more than k users select the same prefix. To prevent traffic-analysis attacks, the selection of l supersedes the selection of k. There is no privacy risk from a small l, but if l is too small, the client may receive an excessive number of non-decryptable messages. To achieve k-anonymity, the client selects the maximum l that is expected to achieve the desired value of k. Since l is client-tunable, so is k. This is appealing because clients can choose their own level of overhead, depending on their personal level of paranoia.

Anonymity Tuning

The link between l and k is a function of the number n of users in the system, the average rate r at which users send messages to a new peer (i.e., start a conversation or reply for the first time), and the proportion p of encounters in which the receiving client could have participated as plausibly as in its actual encounter. In the absence of any external information linking the receiving client to a subset of encounters, p = 1 for the receiver. However, because we assume that the coordinating server has some coarse-grained location information for all clients, p is likely to be less than one in practice. To strengthen our adversarial model, we assume that the server can predict the precise time t at which a reply message will be sent. If such an attack is thwarted, so are all probabilistic attacks based on reply message timing. Before a reply message is sent, the client introduces a random delay uniformly distributed between 0 and d. During the interval [t, t + d], an average of n·r·d/2^l messages will be sent using the same key hash, n·p·r·d/2^l of which will be indistinguishable from the reply. Assuming uniformly distributed messages, k = n·p·r·d/2^l. Thus, a client should select l = ⌊log2(n·p·r·d/k)⌋ to achieve the desired level of k-anonymity. If l < 0, the desired k is unattainable under the system conditions and the client's choice of d.
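This selection rule is simple to evaluate. The sketch below uses hypothetical example parameters, not measured values; parameter names follow Table 8.1.

```python
import math

def choose_prefix_length(n, p, r, d, k):
    # l = floor(log2(n*p*r*d / k)); None when the desired k is unattainable
    # (l < 0) under these system conditions.
    l = math.floor(math.log2(n * p * r * d / k))
    return l if l >= 0 else None

def achieved_k(n, p, r, d, l):
    # Expected indistinguishable messages in [t, t+d]: n*p*r*d / 2^l.
    return n * p * r * d / 2 ** l

# Hypothetical deployment: 100,000 users, one new conversation per user
# every ten days (r = 0.1/day), d = 1 day, p = 0.5, desired k = 20.
n, p, r, d, k = 100_000, 0.5, 0.1, 1.0, 20
l = choose_prefix_length(n, p, r, d, k)
assert l == 7 and achieved_k(n, p, r, d, l) >= k
```

Because l is floored, the achieved k is at least the requested one (here about 39 rather than 20), which is the conservative direction.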

Parameter Estimation

We expect that a rough estimate of the number of users in the system n and the average new-conversation messaging rate r will be well-known properties of a widely-deployed service. An adversarial server has an incentive to exaggerate these values, convincing a naive client to choose l to be too large for the desired k. Thus, users should rely on external estimates if possible.

An adversarial server may also try to deduce which users are likely to have encountered each other. When effective, this reduces p, the proportion of encounters in the system as plausible as a client's true encounters. We assume that either the server explicitly knows a home physical address for its clients (e.g., as part of billing information) or can deduce a localized geographic region from a client's IP address during message and key-hash uploads. There is a high probability that encounters occur in the geographic region surrounding a client's home. An adversarial server can thus correlate matching encounter-key hashes coming from the same geographic location. Thus, p is limited to the proportion of recorded encounters in the plausible encounter region defined by a client's address. The size of this region is dependent on the mobility patterns of both the client and the peers the client encounters. If a client can accurately approximate p, l can be correspondingly deflated to preserve the desired k, albeit at increased overhead. Unfortunately, estimating p is difficult. One reasonable heuristic would be to compute the average maximal distance a user travels from his home between synchronizations, assume this distance to be the radius of a circular encounter region, multiply by the population density, multiply by the proportion of people in the area who use the service, and divide by the number of users in the system. Since such techniques may impose high user burden, users may simply select a widely-accepted conservative value and request that the server aggressively filter messages.

Filters

Lower values of p lead to greater user overhead in the number of non-decryptable messages that must be received to preserve k-anonymity. To combat this, the user may specify that she only wishes to receive messages from users residing within the same geographic region. Moreover, the user may request that the server filter messages on the basis of any number of other attributes the server knows about the sender. This is especially appealing for missed-connections services. For example, a user may request to only receive messages from people of the opposite sex and within a certain age range. This is similar to the way users search for peers on popular dating websites.

Collusion Attacks

Although SMILE provides protection against channel snooping by the server or individual clients, we cannot defend against all client-server collusion attacks. If clients present for the encounter collude with the server, they can provide the server with the encounter key used for end-to-end encryption. Clients can also aid the server in deanonymizing attacks. To precisely determine a client's identity, the server would need to collude with k - 1 of the client's k randomly-selected anonymizing peers. Although this is unlikely, partial collusion weakens the anonymizing properties of the system. To compensate, if a user anticipates that the proportion of peers in collusion with the server is c, it should actually estimate k = n·p·(1 - c)·r·d/2^l, and thus select l as follows:

l = ⌊log2(n·p·(1 - c)·r·d / k)⌋    (8.1)

Attacks on Nonuniform Distribution

Note that the model we have presented thus far assumes that the distribution of messages is uniform across clients. In practice, we expect that some pairs of communicating clients will deviate from the average conversation length. Furthermore, we expect that conversations will be synchronous. If a client C_a sends a message to C_b, C_a generally will not send another message to C_b until C_b replies. This implies that a pair of communicating clients will send approximately equal numbers of messages to the same hash prefix, making the pair easier to identify.

To combat these attacks on nonuniform message distribution, SMILE requires that all conversations be of a fixed length f. Whenever a client chooses to send a new message, or replies to a message for the first time, the client commits to sending precisely f messages. All clients may choose f independently, but they must not select f to be conversation-dependent. No more than f messages may be sent for the same encounter. If a conversation ends before f messages are sent, the client pads the conversation with dummy messages. A dummy message for key x is constructed as {P(H(x), l); E_y(m_r)}, where m_r is a random message and y is a random key. To prevent intra-conversation timing attacks, messages must be sent at regular intervals. When a conversation is initiated, f upload times are predetermined. A new message is not uploaded until the next upload time. If the user does not create a message before some upload time, a dummy message is sent instead. Limiting users to a bounded f messages per encounter should not substantially limit functionality within the missed-connections domain. Within a few messages, it is likely that users would be willing to forgo some mutual anonymity to negotiate an out-of-band channel, such as email or telephone.
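The padding discipline can be sketched as follows. This is toy Python: the placeholder "encrypt" stands in for E, and all names are illustrative rather than the implemented ones.

```python
import os, hashlib

def P(h: bytes, l: int) -> bytes:
    return h[: l // 8]  # byte-granularity prefix, for brevity

def pad_conversation(real_msgs, x, l, f, encrypt):
    # Emit exactly f uploads for encounter key x: real messages first,
    # then dummies encrypted under fresh random keys y, undecryptable by
    # anyone. Every upload carries the same prefix P(H(x), l).
    assert len(real_msgs) <= f
    prefix = P(hashlib.sha256(x).digest(), l)
    uploads = []
    for i in range(f):
        if i < len(real_msgs):
            uploads.append((prefix, encrypt(x, real_msgs[i])))
        else:
            y = os.urandom(16)  # random key makes this a dummy message
            uploads.append((prefix, encrypt(y, os.urandom(32))))
    return uploads

# Placeholder "encryption" so the sketch is runnable; not a real cipher.
toy_encrypt = lambda key, m: hashlib.sha256(key + m).digest()
out = pad_conversation([b"hi", b"coffee?"], os.urandom(16), 16, 5, toy_encrypt)
assert len(out) == 5 and len({p for p, _ in out}) == 1
```

An observer of the uploads sees f fixed-size tokens under one prefix regardless of how many were real, which is the property the fixed conversation length buys.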

Alternate Adversarial Model

These k-anonymity techniques are designed to protect against an adversarial server which employs active attacks to aggressively deanonymize user encounters. We may also consider a weaker adversarial model, where the server is not malicious, but is vulnerable to read/write attacks on its database. Such a server may provide accurate estimates of n and r directly to its users and aid in determining p. It would not collude, thereby setting c = 0. Clients may compute the appropriate l trivially for any desired k. Note that the choice of random message delay d is still an important input to the selection of l from k, as it affects the anonymity of records stored in the database. Similarly, users must still account for attacks on nonuniform message distribution by using a fixed conversation length f.

Messaging Overhead

For a given set of system parameters, we can compute the expected overhead of received messages. (8.2) provides the number of recipients for any message. Assuming that users optimally select l as in (8.1), the number of recipients does not depend on n. Discretized l selection (l = ⌊l*⌋, with l* optimal) is accounted for as z ∈ [1, 2).

Number of recipients = n / 2^l = z·k / (p·(1 - c)·r·d)    (8.2)

Multiplying by the average rate r at which users send messages to new peers, we see that the rate at which messages are received does not depend on the rate at which messages are injected into the system. (8.3) presents a formula for computing the ratio of the number of clients that receive a message as overhead to the total number of clients receiving the message. This is an upper bound, which assumes that all messages are intended for a single recipient. For encounters with multiple co-located peers, the number of clients for which the message is decryptable will increase (we do not consider such messages overhead).

Overhead proportion = (n/2^l - 1) / (n/2^l) = 1 - p·(1 - c)·r·d / (z·k)    (8.3)

In (8.4), we compute the rate of overhead messages received by a client. Under the assumption of one recipient per message, the average rate at which a user receives decryptable messages is equal to the average per-user sending rate r.

Overhead reception rate = z·k / (p·(1 - c)·d) - r    (8.4)

These results show that message overhead depends on z, k, p, c, and r, but is not directly affected by increases in the total number of system users n. As the rate r at which legitimate messages are injected into the system increases, both the proportion of messages received as overhead and the absolute rate of received overhead messages decrease. Appealingly, efficiency increases with usage.
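Equations (8.2)-(8.4) can be evaluated directly; the sketch below uses hypothetical parameters.

```python
def overhead_metrics(z, k, p, c, r, d):
    # Evaluate (8.2)-(8.4): recipients per message, the fraction of
    # recipients receiving it purely as overhead, and the per-client
    # overhead reception rate. None depend on the user count n.
    recipients = z * k / (p * (1 - c) * r * d)          # (8.2)
    overhead_prop = 1 - p * (1 - c) * r * d / (z * k)   # (8.3)
    overhead_rate = z * k / (p * (1 - c) * d) - r       # (8.4)
    return recipients, overhead_prop, overhead_rate

# Hypothetical parameters: z = 1.5, k = 20, p = 0.5, no collusion,
# r = 0.1 conversations/day, d = 1 day.
recipients, prop, rate = overhead_metrics(1.5, 20, 0.5, 0.0, 0.1, 1.0)
assert abs(recipients - 600) < 1e-6
assert abs(prop - (1 - 1 / 600)) < 1e-9
# Consistency with (8.2): overhead rate = (recipients - 1) * r.
assert abs(rate - (recipients - 1) * 0.1) < 1e-6
```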

8.3.4 Implementation Considerations

Sensing Platform and Key Distribution

We envision SMILE running on mobile phones and laptops. Thus, compatibility with currently-deployed technology is an important consideration. Fortunately, readily-available wireless communication platforms can be used for key distribution. Bluetooth, given its appropriate range (≈10 m) and low power consumption, is an obvious choice. In Figure 8.5, we note that the effective range of Bluetooth is comparable to the observed distance in the vast majority of Craigslist posts. Note that there is a security-versus-performance tradeoff in wireless transmission range. Shorter ranges limit potential snooping attacks while longer ranges increase the probability of encounter detection.

Other low-power radio platforms such as IEEE 802.15.4 ZigBee may be appropriate, but we only considered solutions that are readily available on commodity hardware. Given the tradeoffs, we believe that Bluetooth provides the most reasonable platform in widespread use. We have implemented an encounter detection and key sharing scheme based on Bluetooth discoverable-mode service advertisements. The use of service advertisements obviates the need for device pairing and avoids breaking compatibility with other concurrently-running Bluetooth services.

WiFi (802.11) is another widely-available option. WiFi's relatively larger range, and increased ability to penetrate walls, provides a poorer approximation of the

[Bar chart: estimated physical distance of encounter; % of classified encounters for intimate (<1 m), nearby (1-5 m), around (5-10 m), distant (>10 m).]

Figure 8.5: Estimated Craigslist encounter distance. Only ≈5% of encounters occur outside of Bluetooth range.

type of co-location guarantee that Craigslist users desire from a missed-connections service. While transmission power control may effectively limit range to the desired distance, there are other practical problems. For example, key distribution would require all devices to share the same channel. The problem becomes simpler if we assume that nearby participants associate to the same WiFi access point, in which case broadcast packets can be used trivially. Alternatively, WiFi could be switched to ad-hoc mode, with keys broadcast as the network SSID. The primary drawbacks of this approach are the high power draw of 802.11-beacon scanning and the loss of Internet connectivity over WiFi.

Advanced techniques exist for establishing a shared key over a wireless channel. Radio-telepathy [125] and [86] exploit the symmetry of time-varying wireless-channel fading to establish a shared key between a transmitter and receiver, and are invulnerable to snooping by other local adversaries. While such techniques could be used in SMILE for key distribution, they are unnecessary for our attacker model. SMILE relies on the encounter key only to prove that an encounter occurred with some co-located peer. All local peers are equally legitimate. In the case that multiple peers are

[Bar chart: % of classified encounters by latency from encounter to post: same day, 1-2 days, 3-6 days, 1+ weeks, 1+ years.]

Figure 8.6: Estimated latency from time of encounter occurrence to Craigslist post.

encountered simultaneously, messages sent using the corresponding key should be readable by all.

Storage Burden

Using Bluetooth as a key-distribution mechanism prevents SMILE from detecting extremely brief encounters because of its relatively slow service-scan speed (≈30 s). As a result, clients will record at most two entries per minute, per encounter, of the form shown in Figure 8.1. This provides an upper bound on the number of encounter keys recorded, since packet loss will reduce the actual number. From these bounds, we define an encounter-time metric to measure the duration of a continual encounter with a single peer that can be supported by some quantity of storage. Using a 128-bit encounter key, a 32-bit pre-computed encounter-key hash as a computational optimization, a 32-bit timestamp, and a 32-bit latitude and longitude, each entry has a fixed 256-bit (32-byte) storage requirement. Assuming no additional local state is maintained, a client can maintain 524,288 encounter entries (182 encounter-days) using a conservative 16 MB of disk storage. From Figure 8.6, we see that, at the extreme end, missed-connections attempts rarely occur beyond a

period of one year. If we discard encounter data after one year, 16 MB is enough to sustain an average of 1,436 encounter entries (12 encounter-hours) per day. The server maintains a 32-bit client ID, 32-bit hash prefix, and 8-bit hash-prefix length, for a total of 72 bits per encounter entry. Since each encounter is recorded by both clients, 144 bits are used per unique encounter entry. The server should organize its database of new encounter entries in FIFO order. When storage limits are reached, the server should evict the oldest entries. 18 GB is enough server storage for a billion encounter entries, or more than one encounter-millennium. 500 GB provides enough storage for ≈100 encounter entries for every person in the US. From these calculations, we do not expect storage overhead costs to limit system adoption.
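The arithmetic behind these estimates can be checked directly; the constants below are taken from the text above.

```python
# Client-side entry: 128-bit key, 32-bit key hash, 32-bit timestamp,
# 32-bit latitude, 32-bit longitude.
ENTRY_BITS = 128 + 32 + 32 + 32 + 32
assert ENTRY_BITS == 256                     # 32 bytes per client entry

client_entries = 16 * 2**20 // 32            # 16 MB of client storage
assert client_entries == 524_288
# At most 2 entries/minute -> encounter-days of continual encounter:
assert client_entries // (2 * 60 * 24) == 182
# Spread over a one-year retention window (~12 encounter-hours/day):
assert client_entries // 365 == 1_436

# Server-side: 32-bit client ID + 32-bit prefix + 8-bit prefix length,
# recorded once per participant of each encounter.
SERVER_BITS = (32 + 32 + 8) * 2
assert SERVER_BITS == 144                    # 18 bytes per unique encounter
assert 18 * 10**9 // 18 == 10**9             # 18 GB -> one billion encounters
```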

8.4 Decentralized Architecture

In this section, we consider a decentralized scheme as an alternative to our centralized design. It provides similar privacy guarantees without requiring a dedicated server, but yields different tradeoffs. The primary drawback is that it requires the use of an anonymizing network to provide an untraceable messaging service between participants. For this purpose, an anonymous remailer (e.g., Mixmaster [132] or Mixminion [54]) or general onion routing (e.g., Tor [56]) may be used. Note that this work is also complementary to more advanced anonymized messaging techniques, such as information slicing [94]. The anonymized messaging channel protects peer identities from a malicious third party. Our k-anonymizing techniques extend this protection to allow bidirectional communication without revealing identities to peers engaged in the exchange. We consider our server-based design more practical in terms of deployability and messaging overhead, but present this alternative to provide a more complete perspective on the design space. The distributed approach extends work we presented

previously [122]. From our prior design, we have removed the need for a coordinating server, which eliminates a number of location-privacy vulnerabilities at the cost of higher per-encounter wireless transmission requirements. Messaging overheads are equivalent. Rather than focusing on these improvements, this discussion highlights differences from our centralized scheme. Where details are omitted, the techniques are similar to those used in SMILE. Figure 8.7 illustrates the decentralized scheme.

Figure 8.7: Distributed scheme operation. During an encounter, each peer shares k identifiers and an encounter key. Messages are sent using onion routing or an anonymous remailer to preserve anonymity.

8.4.1 Distributed Operation

Each participant chooses a unique personal identifier to enable peer-to-peer communication (e.g., an email address, instant-messaging screen name, domain name, or IP address). To maintain their anonymity, users should choose an identifier that cannot be mapped to their actual identity. Users must also use an anonymizing network for all communication. To simplify our discussion, we assume that users choose

email-address identifiers, and that all communication is handled by an anonymous remailer, such as Mixmaster.

Anonymity

In addition to a personal identifier, users maintain another set of identifiers, any of which could plausibly be under their control. Users preserve their k-anonymity by sending messages through an anonymizing mix network with the source specified as a tuple of plausible identifiers called an identifier set. A tuple of size k only reveals that one or more of the k identifiers was present at the encounter. Encounter privacy is a function of the plausibility that a message could have come from any member of the tuple with equal probability.

Anonymous Messaging

As in the centralized case, our messaging scheme provides a channel that is end-to-end confidential and resistant to man-in-the-middle attacks. During an encounter at time t, peers use wireless transmissions to exchange messages of the form {I, x}, where I = {i1, i2, ..., ik} is the source peer's identifier set and x is an encounter key. Later, assume peer Pa with Ia = {a1, a2, ..., ak} wishes to send a message m to previously-encountered peer Pb with Ib = {b1, b2, ..., bk}. Pa sends an email containing {H(x); Ex(Ia || t || m)} through an anonymous remailer to all identifiers in Ib. Pb may reply with {H(x); Ex(Ib || t+1 || m')} to all ai ∈ Ia. Note that the use of an incremented timestamp nonce prevents a variety of replay attacks. Receiving peers not actually present at the encounter will not be able to decrypt the message or the contained identifier set, thereby gaining no information.
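The exchange-and-reply flow above can be sketched in a few lines. The following Python is a minimal illustration rather than the SMILE implementation: H is instantiated as SHA-256, and Ex is stood in for by a toy SHA-256-derived XOR keystream (a real deployment would use an authenticated cipher such as AES-GCM); all function names are hypothetical.

```python
import hashlib

def h(key: bytes) -> bytes:
    """H(x): hash of the encounter key, used as a lookup tag for the recipient."""
    return hashlib.sha256(key).digest()

def keystream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for Ex: XOR with a SHA-256-derived keystream.
    XOR is symmetric, so the same call also decrypts."""
    out = bytearray()
    counter = 0
    while len(out) < len(plaintext):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, out))

def build_message(identifier_set, timestamp: int, body: bytes, enc_key: bytes):
    """Construct {H(x); Ex(I || t || m)} for a previously-encountered peer."""
    payload = ("|".join(identifier_set)).encode() \
        + b"|" + str(timestamp).encode() + b"|" + body
    return h(enc_key), keystream_encrypt(enc_key, payload)
```

The recipient looks up the shared encounter key by the tag H(x); a peer that was not present for the encounter holds no key matching the tag and learns nothing from the ciphertext.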

8.4.2 Identifier Set Selection

The privacy properties of the decentralized scheme are tied to the quality of a user's identifier set. Poor selections may allow adversaries to establish a mapping between an identifier and the source of an anonymized message and, as a result, reveal the identity of a previously-encountered peer.

Identifier Collection

Before an identifier set is selected, users should collect at least k other identifiers. A user’s device can build a database of identifiers by recording the identifier sets of other participants. Collecting identifier sets this way is appealing because it promotes identifier dispersion: the more widespread a user’s identifier is, the more likely it is to be used in other users’ identifier sets. Broad re-use of an identifier makes the corresponding user’s actual locations more difficult to infer due to the increased number of false positives.

Collusion Attacks

As in the centralized scheme, collusion can weaken anonymity guarantees. For each of a message sender's k − 1 anonymizing identifiers in collusion with the message recipient, k is effectively decreased by one. Given that an adversarial peer (encountered or otherwise) has no way to control the contents of another user's identifier set, this would be a difficult attack.

Geographic Plausibility

Given that a message recipient will have some knowledge of a sender's general whereabouts (e.g., from the time and place of the encounter itself), it may be possible to pinpoint out-of-place identifiers. To increase the plausibility of peer identifiers used in the set, it is desirable to maintain some general locality with the encounter. This may be accomplished by discarding known identifiers after an expiration period, assuming that recently-acquired identifiers will tend to correspond to users located in the general vicinity.
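The expiration heuristic amounts to a pruning pass over the local identifier database. The sketch below is illustrative; the helper name and the 24-hour window are our assumptions, not values prescribed by the design.

```python
import time

EXPIRY_SECONDS = 24 * 3600  # assumed expiration period; tunable per deployment

def prune_identifiers(db, now=None):
    """Drop identifiers older than the expiration period, so the pool
    retains geographic locality with recent encounters.
    db: dict mapping identifier -> unix time it was last observed."""
    now = time.time() if now is None else now
    return {ident: seen for ident, seen in db.items()
            if now - seen <= EXPIRY_SECONDS}
```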

Bootstrapping

The maximum size of an identifier set is limited by the number of peers of which a client is aware. In cases where an insufficient number is known, it may be helpful to introduce additional false identifiers. If an adversarial peer is likely to have trouble distinguishing legitimate from contrived identifiers, this may provide additional protection. Unfortunately, not all false identifiers are equally plausible. For example, we assume email addresses are used directly as identifiers. Plausible fake email addresses should be human-readable (which would be feasible through dictionary-based random generation) and should not cause mail to bounce.

Slow Evolution

Once a user has collected enough plausible identifiers, she must select an appropriate subset for use during encounters. First, use of newly-added identifiers should be delayed, so that they are not reused in a time-linkable reencounter with their source. Moreover, a user's identifier set should change slowly over time to prevent an adversary from linking multiple encounters. To see why, assume Pa shares I1 at time t1 and I2 at time t2. Pb encounters Pa at t1 and Pc at t2, and is able to guess that Pc = Pa through external information. For example, if Pa is the only other person around at both times, the link is clear. Now, for both t1 and t2, Pb can deduce that the true identifier of Pa is in I1 ∩ I2 or, more generally, in the intersection of all n advertised sets over n encounters. This attack is most difficult to prevent for adversaries encountered on a regular basis. However, such a well-positioned adversary can likely deduce personal information more easily by other means.
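The intersection attack described above is easy to express concretely. This Python sketch (hypothetical function name) shows why a rapidly-churning identifier set is dangerous: the victim's true identifier is the only element guaranteed to survive in every advertised set.

```python
def intersection_attack(observed_sets):
    """An adversary who links n encounters to the same victim intersects
    the advertised identifier sets; the victim's true identifier must
    appear in every one of them."""
    candidates = set(observed_sets[0])
    for s in observed_sets[1:]:
        candidates &= set(s)
    return candidates

# If the victim replaces anonymizing identifiers too quickly between
# sightings, only the true identifier persists and anonymity collapses.
```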

Identifier-Set Size

Users must carefully select an appropriate identifier-set size s. Clearly, s must be greater than k to preserve k-anonymity. In practice, due to uncertainty over the above identifier-inclusion criteria, participants should select s ≥ k/(1 − j), where j represents the probability that an adversarial peer can reject an included identifier as less plausible than the user's true identifier. Practical considerations, such as reasonable message length or an insufficient number of known peers, may bound s below the ideal selection. As in the centralized scheme, there is a privacy-versus-overhead tradeoff in choosing an identifier-set size.

Figure 8.8: Estimated encounter duration implied by Craigslist posts, by geographic locale. (Bars show the percentage of classified encounters that were instantaneous (<15 seconds), short (15 seconds to 1 minute), or extended (>1 minute) for the SF Bay Area, Atlanta, and NYC.)
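The sizing rule s ≥ k/(1 − j) can be wrapped in a small helper. The function below is an illustrative sketch; its name and the round-up-to-integer convention are our assumptions.

```python
import math

def min_identifier_set_size(k: int, j: float) -> int:
    """Smallest integer s satisfying s >= k / (1 - j), where j is the
    probability that an adversary can reject an included identifier as
    less plausible than the true one. j must be strictly below 1."""
    if not 0.0 <= j < 1.0:
        raise ValueError("j must lie in [0, 1)")
    return math.ceil(k / (1.0 - j))
```

For example, with k = 10 and a 20% chance that any given decoy is rejected, a user should advertise at least 13 identifiers.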

8.5 Evaluation

In this section, we consider deployment feasibility for SMILE, including (1) the ability of our passive key-exchange protocol to adequately establish shared-key state among encountered clients, and (2) the appropriateness of our system properties for real-world missed-connections usage.

8.5.1 Key Advertisement Detection

The feasibility of SMILE depends on its ability to reliably detect encounters within a potentially short amount of co-location time, since co-location proofs can only be provided with shared-key state. The shorter this minimum duration is, the more widely applicable our scheme will be.

Target Applications

Long-duration detection is sufficient for activities such as shared meals, a conversation over coffee, or mutual attendance at a seminar. Detecting short events, such as a quick hallway passing, requires a faster exchange. Romantic queries, business propositions, or friend-seeking searches may be the result of encounters that are only tens of seconds long. Figure 8.8 shows that fewer than 10% of the Craigslist encounters in our study lasted 15 seconds or less.

Bluetooth Detection Test

In our implementation, client devices periodically initiate service scans for all available Bluetooth devices in range. Service names identified as keys are recorded in a local relational database, as are all self-advertised keys, along with the current time and coordinates, as determined by Skyhook WiFi-based localization. After completing a scan, the client pauses, chooses a new key to advertise, and then initiates the next scan.

In Figure 8.9, we show reliability results for co-location detection and key exchange. In these experiments, one client remained stationary in a room while the other client started out of range, walked into the room, remained stationary for the specified interval, and then exited. We categorized detection as mutual (both keys received by both clients), partial (one key received by one client), or failed (neither client received a key). Our protocol only requires partial detection, since the failing client will have recorded the broadcast key and successfully shared it with the other client.

Our results show that Bluetooth key advertisement and scanning can reliably detect encounters at timescales of 30–60 seconds, which is acceptable for the vast majority of encounters we found on Craigslist (Figure 8.8). For this test, we selected a pause period of 15 seconds from the end of one scan to the start of the next. Given the speed of detection, it may be preferable to extend this interval for a corresponding reduction

in energy consumption, while still meeting application-appropriate detection-speed requirements.

Figure 8.9: Encounter-key discovery. Each detection scan begins 15 seconds after the completion of the prior scan. (Bars show the percentage of encounters with no detection (failure), partial detection (sufficient), or full bidirectional detection, for encounter durations of 5, 10, 20, 30, 45, and 60 seconds.)
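The mutual/partial/failed categorization used in this experiment reduces to a simple predicate over the two clients' scan outcomes. The helper below is a hypothetical sketch, not part of the SMILE prototype.

```python
def classify_detection(a_received_b: bool, b_received_a: bool) -> str:
    """Classify a co-location scan outcome between clients A and B.
    Partial detection suffices for the protocol: the client whose scan
    failed still broadcast its key, which the successful scanner
    recorded and can later share."""
    if a_received_b and b_received_a:
        return "mutual"
    if a_received_b or b_received_a:
        return "partial"
    return "failed"
```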

8.5.2 Craigslist Classification

To ground our assumptions of how people might use SMILE, we examined hundreds of Craigslist posts from US metro areas and classified the described encounters by identity-confirmation checks (Figure 8.4), distance between the to-be-reconnected individuals (Figure 8.5), the time between encounter and posting (Figure 8.6), and encounter duration (Figure 8.8). Each post's content was manually classified. We examined Craigslist missed-connections posts from a number of US metro areas, including Chicago, New York, Philadelphia, Raleigh, San Diego, Seattle, the San Francisco Bay Area, and Washington DC. We ignored sub-classification by geographic district or gender-based filters. Posts were selected systematically: beginning with the most recent posts, we examined each post in reverse chronological order until at least 100 legitimate posts had been classified. By its open, bulletin-board nature, Craigslist is prone to some misuse and abuse, including spam, incendiary rants, and cryptic language. Ambiguous or indecipherable posts, posts that did not represent a missed connection, and posts with obscene content were not considered. All Craigslist figures in this paper present a histogram of the proportion of legitimate posts within each category. The scope of our Craigslist study was limited and our methodology was not scientifically rigorous, but the data we collected provides valuable initial insight into the challenges that emerging systems such as SMILE would face if deployed.

8.6 Related Work

SMILE lies at the intersection of three research areas: location proofs, location privacy, and anonymized communication.

8.6.1 Location Proofs

Several systems have recently sought to give end users the ability to prove that they were in a particular place at a particular time. The authors of [162] proposed a solution that is suitable for third-party attestation, but it relies on a PKI and changes to the 802.11 access-point infrastructure. SMILE takes a more ad-hoc approach and requires no changes to existing infrastructure, but generates proofs that only demonstrate a mutual encounter. In [103], the authors describe a secure localization service that can be used to generate unforgeable geotags for mobile content such as photos and video. The primary difference between this work and ours is that it relies on the wide deployment of secure infrastructure to generate proofs, while we rely on users to prove that an encounter occurred.

SPATE [107] is similar to SMILE in its ad-hoc design and also uses physical encounters to allow users to establish private communication channels. SPATE was designed with the assumption that its users already know each other and, at the time of their physical encounter, intend to communicate sometime in the future. Our design assumes its users do not know each other, and provides a mechanism to communicate in retrospect of an encounter, while maintaining anonymity.

In a preliminary version of this work [122], we targeted missed connections and utilized similar wireless techniques to prove when an encounter occurred. However, this service was prone to linking attacks by malicious servers, since users reveal their actual location information to the service provider. SMILE avoids such attacks by forwarding key hashes to the server instead.

8.6.2 Location Privacy

A number of projects have investigated the use of trusted central servers to anonymize location information, especially to meet k-anonymity requirements [71, 91]. Our approach provides similar k-anonymity guarantees, but without requiring that the third-party service be trustworthy. Other work attempts to provide location privacy through access-control mechanisms [78] and digital rights management [73]. Both models rely on a trusted server to manage users' location information.

Adeona [157] is a device-tracking service designed to help users recover lost or stolen mobile devices without compromising their location privacy. Like SMILE, Adeona uses pseudo-random generators to name and encrypt users' location information. The key difference between the two services is the way that they compute location identifiers. The location identifiers generated by an Adeona-enabled device only need to be meaningful to the individual device owner, since only she is allowed to track her device. On the other hand, SMILE must allow independent, co-located users to deterministically compute the same encounter identifier without revealing any information about the encounter's place or time to entities that were not present.

Finally, SmokeScreen [51] is a mobile social service that uses short-range wireless messages among co-located users to enable “presence-sharing.” SmokeScreen is primarily meant to enable privacy-preserving presence-sharing among users with pre-established trust relationships, and relies on centralized, trusted brokers to coordinate anonymous communication between strangers. SMILE allows co-located strangers to communicate without revealing their location or mutual interest to service providers.

8.6.3 Anonymous Messaging

Anonymous remailers [132, 54] and general onion routing [56] provide a communication channel that is anonymous to third-party adversaries. Information slicing [94] provides similar guarantees, but without need for public-key cryptography. Our techniques provide an additional level of privacy, where even those individuals participating in a message exchange maintain mutual k-anonymity.

8.7 Conclusion

This paper has described the SMILE mobile social service. SMILE aims to provide an efficient missed-connections service using mobile devices without relying on trusted coordinating servers or pre-established trust among users. To meet this goal, SMILE relies on trust derived from physical encounters among users. Co-located SMILE devices establish trust with each other by performing a passive key-exchange protocol that can be used to generate a proof of their encounter. Clients only need to share hashes of their logged encounter keys with the SMILE server, since hashes protect users' location and encounter privacy from malicious servers and peers. Through protocol analysis, study of the Craigslist missed-connections service, and experimentation with a prototype SMILE implementation, we demonstrated the strong privacy guarantees and feasibility of SMILE.

9 Conclusion

It is nontrivial to leverage the fundamentally unique and powerful capabilities of mobile platforms. New foundations can help bridge the gap from the mobile apps we have today to the sophistication that we will expect tomorrow. This dissertation does not attempt to consider all the necessary building blocks for the construction of the next generation of mobile applications; many more will be required. From a bottom-up perspective, this thesis focuses on the quality of Wi-Fi wireless network connectivity. Various other core components might be optimized in terms of performance, reliability, energy efficiency, or functionality, including cellular communication, localization services, sensors, processor design, utilization of cloud and peer resources, software interfaces for human-computer interaction, displays, voice recognition, biometrics, security services, etc. From an application-driven, top-down perspective, it is clearly impossible to enumerate all potential mobile applications. Instead, we consider a cross-section of a few useful and novel enhancements. This dissertation leaves many important building blocks as future research. However, in its contributions of defining, designing, implementing, and evaluating a selection of these primitives, it advances the state-of-the-art in mobile computing and partially serves to ensure that the mobile app stores of the future will remain forums for innovation, excitement, and utility.

Bibliography

[1] Cell site. http://en.wikipedia.org/wiki/Cell_site#Range.

[2] Census 2000 U.S. Gazetteer Files. http://www.census.gov/geo/www/gazetteer/places2k.html.

[3] Clustering Analysis. http://en.wikipedia.org/wiki/Cluster_analysis.

[4] Federal Communications Commission Geographic Information Systems. http://wireless.fcc.gov/geographic/index.htm.

[5] The madwifi project. http://madwifi-project.org/.

[6] The network simulator - ns-2. http://www.isi.edu/nsnam/ns/.

[7] Wireless LAN medium access control (MAC) and physical layer (PHY) speci- fications. IEEE Std 802.11, 2007.

[8] Gregory D. Abowd, Gillian R. Hayes, Giovanni Iachello, Julie A. Kientz, Shwetak N. Patel, Molly M. Stevens, and Khai N. Truong. Prototypes and paratypes: Designing mobile and ubiquitous computing applications. IEEE Pervasive Computing, 4:67–73, October 2005.

[9] Sharad Agarwal and Jacob R. Lorch. Matchmaking for online games and other latency-sensitive p2p systems. In Proceedings of the ACM SIGCOMM 2009 conference on Data communication, SIGCOMM ’09, pages 315–326, New York, NY, USA, 2009. ACM.

[10] Yuvraj Agarwal, Ranveer Chandra, Alec Wolman, Paramvir Bahl, Kevin Chin, and Rajesh Gupta. Wireless wakeups revisited: energy management for voip over wi-fi smartphones. In Proceedings of the 5th international conference on Mobile systems, applications and services, MobiSys ’07, pages 179–191, New York, NY, USA, 2007. ACM.

[11] N. Ahmed and S. Keshav. Smarta: a self-managing architecture for thin access points. In Proceedings of the 2006 ACM CoNEXT conference, CoNEXT ’06, pages 9:1–9:12, New York, NY, USA, 2006. ACM.

[12] Nabeel Ahmed, Vivek Shrivastava, Arunesh Mishra, Suman Banerjee, Srinivasan Keshav, and Konstantina Papagiannaki. Interference mitigation in enterprise wlans through speculative scheduling. In Proceedings of the 13th annual ACM international conference on Mobile computing and networking, MobiCom ’07, pages 342–345, New York, NY, USA, 2007. ACM.

[13] Aditya Akella, Glenn Judd, Srinivasan Seshan, and Peter Steenkiste. Self- management in chaotic wireless deployments. In Proceedings of the 11th annual international conference on Mobile computing and networking, MobiCom ’05, pages 185–199, New York, NY, USA, 2005. ACM.

[14] Ian F. Akyildiz and Wenye Wang. The predictive user mobility profile framework for wireless multimedia networks. IEEE/ACM Transactions on Networking, 12:1021–1035, December 2004.

[15] Alexa Internet, Inc. Alexa the web information company. http://www.alexa.com/.

[16] Manish Anand, Edmund B. Nightingale, and Jason Flinn. Self-tuning wireless network power management. In Proceedings of the 9th annual international conference on Mobile computing and networking, MobiCom ’03, pages 176– 189, New York, NY, USA, 2003. ACM.

[17] Ganesh Ananthanarayanan and Ion Stoica. Blue-fi: enhancing wi-fi perfor- mance using bluetooth signals. In Proceedings of the 7th international confer- ence on Mobile systems, applications, and services, MobiSys ’09, pages 249– 262, New York, NY, USA, 2009. ACM.

[18] Nate Anderson. Slow internet meets its waterloo as 105mbps comes to iowa. Ars Technica, http://arstechnica.com/tech-policy/news/2009/12/fastest-us-internet-waterloo-ia.ars, December 2009.

[19] Trevor Armstrong, Olivier Trescases, Cristiana Amza, and Eyal de Lara. Efficient and transparent dynamic content updates for mobile clients. In Proceedings of the 4th international conference on Mobile systems, applications and services, MobiSys ’06, pages 56–68, New York, NY, USA, 2006. ACM.

[20] AT&T Wireless. Wireless IP options for mobile deployments. http://www.wireless.att.com, December 2010.

[21] P. Bahl, J. Padhye, L. Ravindranath, M. Singh, A. Wolman, and B. Zill. DAIR: A framework for managing enterprise wireless networks using desktop infrastructure. In HotNets IV, 2005.

[22] Paramvir Bahl, Ranveer Chandra, Jitendra Padhye, Lenin Ravindranath, Manpreet Singh, Alec Wolman, and Brian Zill. Enhancing the security of corporate wi-fi networks using dair. In Proceedings of the 4th international conference on Mobile systems, applications and services, MobiSys ’06, pages 1–14, New York, NY, USA, 2006. ACM.

[23] Mahesh Balakrishnan, Iqbal Mohomed, and Venugopalan Ramasubramanian. Where’s that phone?: geolocating ip addresses on 3g networks. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09, pages 294–300, New York, NY, USA, 2009. ACM.

[24] Aruna Balasubramanian, Ratul Mahajan, and Arun Venkataramani. Augmenting mobile 3G using WiFi. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 209–222, New York, NY, USA, 2010. ACM.

[25] Niranjan Balasubramanian, Aruna Balasubramanian, and Arun Venkataramani. Energy consumption in mobile phones: a measurement study and implications for network applications. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09, pages 280–293, New York, NY, USA, 2009. ACM.

[26] L. Bao and S. Intille. Activity Recognition from User-Annotated Acceleration Data. In Percom, 2002.

[27] Xuan Bao and Romit Roy Choudhury. Movi: mobile phone based video highlights via collaborative sensing. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 357–370, New York, NY, USA, 2010. ACM.

[28] N.E. Baughman and B.N. Levine. Cheat-proof playout for centralized and distributed online games. In INFOCOM 2001. Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 1, pages 104–113, 2001.

[29] H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. In ECCV, 2006.

[30] P. Bellaria. Message from the iPad: Heavy Traffic Ahead. http://blog.broadband.gov/.

[31] Y.W. Bernier. Latency compensating methods in client/server in-game protocol design and optimization. 2001.

[32] J.C. Bicket. Bit-rate selection in wireless networks. Master’s thesis, Massachusetts Institute of Technology, 2005.

[33] Broadcom. BCM4329 product brief. http://www.broadcom.com.

[34] Micah Z. Brodsky and Robert T. Morris. In defense of wireless carrier sense. In Proceedings of the ACM SIGCOMM 2009 conference on Data communication, SIGCOMM ’09, pages 147–158, New York, NY, USA, 2009. ACM.

[35] Paul Carton and Jean Crumrine. New smart phone owners tell us what they really think. ChangeWave Research, http://www.changewaveresearch.com/articles/2010/05/smart_phones_20100525.html, May 2010.

[36] Mun Choon Chan and Ramachandran Ramjee. Tcp/ip performance over 3g wireless links with rate and delay variation. In Proceedings of the 8th annual international conference on Mobile computing and networking, MobiCom ’02, pages 71–82, New York, NY, USA, 2002. ACM.

[37] Ranveer Chandra, Jitendra Padhye, Alec Wolman, and Brian Zill. A location-based management system for enterprise wireless lans. In Proceedings of the 4th USENIX conference on Networked systems design & implementation, NSDI ’07, pages 9–9, Berkeley, CA, USA, 2007. USENIX Association.

[38] H. Chang, V. Misra, and D. Rubenstein. A general model and analysis of physical layer capture in 802.11 networks. In INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings, pages 1–12, April 2006.

[39] B. B. Chen and M. C. Chan. MobTorrent: A Framework for Mobile Internet Access from Vehicles. In Infocom, 2009.

[40] B. X. Chen. What is Wrong with 3G in iPhone 3G? http://www.wired.com/gadgetlab/2008/08/whats-wrong-wit/.

[41] Ching Ling Tom Chen. Distributed collision detection and resolution. Master’s thesis, McGill University, May 2010.

[42] David M. Chen, Sam S. Tsai, Bernd Girod, Cheng-Hsin Hsu, Kyu-Han Kim, and Jatinder Pal Singh. Building book inventories using smartphones. In Proceedings of the international conference on Multimedia, MM ’10, pages 651–654, New York, NY, USA, 2010. ACM.

[43] Xu Cheng, C. Dale, and Jiangchuan Liu. Statistics and social network of youtube videos. In Quality of Service, 2008. IWQoS 2008. 16th International Workshop on, pages 229–238, June 2008.

[44] Yu-Chung Cheng, Mikhail Afanasyev, Patrick Verkaik, Péter Benkö, Jennifer Chiang, Alex C. Snoeren, Stefan Savage, and Geoffrey M. Voelker. Automating cross-layer diagnosis of enterprise wireless networks. In Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’07, pages 25–36, New York, NY, USA, 2007. ACM.

[45] Yu-Chung Cheng, John Bellardo, Péter Benkö, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. Jigsaw: solving the puzzle of enterprise 802.11 analysis. In Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’06, pages 39–50, New York, NY, USA, 2006. ACM.

[46] Cisco Systems, Inc. The benefits of centralization in wireless LANs. http://www.cisco.com/en/US/prod/collateral/wireless/ps5678/ps6521/prod_white_paper0900aecd8040f7b2.pdf, 2006.

[47] Cisco Systems, Inc. Cisco visual networking index: Global mobile data traffic forecast update, 2010–2015. White Paper, 2011.

[48] Brian Clarkson, Alex Pentland, and Kenji Mase. Recognizing user context via wearable sensors. In Proceedings of the 4th IEEE International Symposium on Wearable Computers, ISWC ’00, pages 69–, Washington, DC, USA, 2000. IEEE Computer Society.

[49] Mark Claypool and Kajal Claypool. Latency and player actions in online games. Communications of the ACM, 49:40–45, November 2006.

[50] Sunny Consolvo, Predrag Klasnja, David W. McDonald, Daniel Avrahami, Jon Froehlich, Louis LeGrand, Ryan Libby, Keith Mosher, and James A. Landay. Flowers or a robot army?: encouraging awareness & activity with personal, mobile displays. In Proceedings of the 10th international conference on Ubiquitous computing, UbiComp ’08, pages 54–63, New York, NY, USA, 2008. ACM.

[51] Landon P. Cox, Angela Dalton, and Varun Marupadi. Smokescreen: flexible privacy controls for presence-sharing. In Proceedings of the 5th international conference on Mobile systems, applications and services, MobiSys ’07, pages 233–245, New York, NY, USA, 2007. ACM.

[52] Gregory Cuellar, Dean Eckles, and Mirjana Spasojevic. Photos for information: a field study of cameraphone computer vision interactions in tourism. In CHI ’08 extended abstracts on Human factors in computing systems, CHI EA ’08, pages 3243–3248, New York, NY, USA, 2008. ACM.

[53] Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris. Vivaldi: a decentralized network coordinate system. In Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’04, pages 15–26, New York, NY, USA, 2004. ACM.

[54] G. Danezis, R. Dingledine, and N. Mathewson. Mixminion: Design of a type iii anonymous remailer protocol. In Security and Privacy, 2003. Proceedings. 2003 Symposium on, pages 2–15. IEEE, 2003.

[55] Pralhad Deshpande, Xiaoxiao Hou, and Samir R. Das. Performance comparison of 3g and metro-scale wifi for vehicular network access. In Proceedings of the 10th annual conference on Internet measurement, IMC ’10, pages 301–307, New York, NY, USA, 2010. ACM.

[56] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: the second-generation onion router. In Proceedings of the 13th conference on USENIX Security Symposium - Volume 13, SSYM’04, pages 21–21, Berkeley, CA, USA, 2004. USENIX Association.

[57] Marcel Dischinger, Andreas Haeberlen, Krishna P. Gummadi, and Stefan Saroiu. Characterizing residential broadband networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC ’07, pages 43–56, New York, NY, USA, 2007. ACM.

[58] Fahad R. Dogar, Peter Steenkiste, and Konstantina Papagiannaki. Catnap: exploiting high bandwidth wireless interfaces to save energy for mobile devices. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 107–122, New York, NY, USA, 2010. ACM.

[59] M. Durvy, O. Dousse, and P. Thiran. Modeling the 802.11 protocol under different capture and sensing capabilities. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, pages 2356 –2360, May 2007.

[60] Nathan Eagle and Alex Pentland. Social serendipity: Mobilizing social soft- ware. IEEE Pervasive Computing, 4:28–34, April 2005.

[61] Hossein Falaki, Dimitrios Lymberopoulos, Ratul Mahajan, Srikanth Kandula, and Deborah Estrin. A first look at traffic on smartphones. In Proceedings of the 10th annual conference on Internet measurement, IMC ’10, pages 281–287, New York, NY, USA, 2010. ACM.

[62] Hossein Falaki, Ratul Mahajan, Srikanth Kandula, Dimitrios Lymberopoulos, Ramesh Govindan, and Deborah Estrin. Diversity in smartphone usage. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 179–194, New York, NY, USA, 2010. ACM.

[63] M.A. Fischler and R.C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[64] M.R. Garey, D.S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-completeness. W.H. Freeman, 1979.

[65] Steven Gargolinski, Christopher St. Pierre, and Mark Claypool. Game server selection for multiple players. In Proceedings of 4th ACM SIGCOMM workshop on Network and system support for games, NetGames ’05, pages 1–6, New York, NY, USA, 2005. ACM.

[66] Gartner, Inc. Gartner says worldwide mobile gaming revenue to grow 19 percent in 2010. Gartner Press Release, http://www.gartner.com/it/page.jsp?id=1370213, May 2010.

[67] Phillipa Gill, Martin Arlitt, Zongpeng Li, and Anirban Mahanti. Youtube traffic characterization: a view from the edge. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC ’07, pages 15–28, New York, NY, USA, 2007. ACM.

[68] Shyamnath Gollakota and Dina Katabi. Zigzag decoding: combating hidden terminals in wireless networks. In Proceedings of the ACM SIGCOMM 2008 conference on Data communication, SIGCOMM ’08, pages 159–170, New York, NY, USA, 2008. ACM.

[69] Google Mobile. Latitude. http://www.google.com/latitude/.

[70] Ben Greenstein, Damon McCoy, Jeffrey Pang, Tadayoshi Kohno, Srinivasan Seshan, and David Wetherall. Improving wireless privacy with an identifier-free link layer protocol. In Proceedings of the 6th international conference on Mobile systems, applications, and services, MobiSys ’08, pages 40–53, New York, NY, USA, 2008. ACM.

[71] Marco Gruteser and Dirk Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st international conference on Mobile systems, applications and services, MobiSys ’03, pages 31–42, New York, NY, USA, 2003. ACM.

[72] Marco Gruteser and Dirk Grunwald. Enhancing location privacy in wireless lan through disposable interface identifiers: a quantitative analysis. Mobile Networks and Applications, 10:315–325, June 2005.

[73] C.A. Gunter, M.J. May, and S.G. Stubblebine. A formal privacy system and its application to location based services. In Privacy Enhancing Technologies (PET), 2004.

[74] Halo 3 Forum. Average multiplayer game length. http://www.bungie.net, February 2010.

[75] Dongsu Han, Aditiya Agarwala, David G. Andersen, Michael Kaminsky, Konstantina Papagiannaki, and Srinivasan Seshan. Mark-and-sweep: getting the “inside” scoop on neighborhood networks. In Proceedings of the 8th ACM SIGCOMM conference on Internet measurement, IMC ’08, pages 99–104, New York, NY, USA, 2008. ACM.

[76] Doug Hanchard. FCC Chairman forecasts wireless spectrum crunch. http://government.zdnet.com/?p=7401, February 2010.

[77] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning (2nd ed.). In Springer, 2009.

[78] C. Hauser and M. Kabatnik. Towards Privacy Support in a Global Location Service. In Proc. of the IFIP Workshop on IP and ATM Traffic Management, 2001.

[79] J. Hays and A.A. Efros. Im2gps: estimating geographic information from a single image. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8, June 2008.

[80] L. Heyer, S. Kruglyak, and S. Yooseph. Exploring expression data: Identification and analysis of coexpressed genes. Genome Research, 1999.

[81] Harlan Hile, Alan Liu, Gaetano Borriello, Radek Grzeszczuk, Ramakrishna Vedantham, and Jana Kosecka. Visual navigation for mobile devices. IEEE MultiMedia, 17:16–25, April 2010.

[82] H. Holma and A. Toskala. WCDMA for UMTS: HSPA Evolution and LTE. Wiley, fourth edition, 2008.

[83] Junxian Huang, Qiang Xu, Birjodh Tiwana, Z. Morley Mao, Ming Zhang, and Paramvir Bahl. Anatomizing application performance differences on smartphones. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 165–178, New York, NY, USA, 2010. ACM.

[84] ILOG, Inc. CPLEX solver, 2003.

[85] Kyle Jamieson, Bret Hull, Allen Miu, and Hari Balakrishnan. Understanding the real-world performance of carrier sense. In Proceedings of the 2005 ACM SIGCOMM workshop on Experimental approaches to wireless network design and analysis, E-WIND ’05, pages 52–57, New York, NY, USA, 2005. ACM.

[86] Suman Jana, Sriram Nandha Premnath, Mike Clark, Sneha K. Kasera, Neal Patwari, and Srikanth V. Krishnamurthy. On the effectiveness of secret key extraction from wireless signal strength in real environments. In Proceedings of the 15th annual international conference on Mobile computing and networking, MobiCom ’09, pages 321–332, New York, NY, USA, 2009. ACM.

[87] K.Y. Jang, M. Carrera, K. Psounis, and R. Govindan. Passive on-line in-band interference inference in centralized WLANs. Technical report, University of Southern California, 2010.

[88] Tao Jiang, Helen J. Wang, and Yih-Chun Hu. Preserving location privacy in wireless lans. In Proceedings of the 5th international conference on Mobile systems, applications and services, MobiSys ’07, pages 246–257, New York, NY, USA, 2007. ACM.

[89] John Leyden. Teen hack suspects charged over myspace extortion bid. http://www.theregister.co.uk/2006/05/25/myspace_hack_charges/, May 2006. The Register.

[90] Glenn Judd and Peter Steenkiste. Using emulation to understand and improve wireless networks and applications. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2, NSDI’05, pages 203–216, Berkeley, CA, USA, 2005. USENIX Association.

[91] Panos Kalnis, Gabriel Ghinita, Kyriakos Mouratidis, and Dimitris Papadias. Preventing location-based identity inference in anonymous spatial queries. IEEE Transactions on Knowledge and Data Engineering, 19:1719–1733, December 2007.

[92] R.S. Kaminsky, N. Snavely, S.M. Seitz, and R. Szeliski. Alignment of 3d point clouds to overhead images. In Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on, pages 63–70, June 2009.

[93] F. Kammer, T. Tholey, and H. Voepel. Approximation algorithms for intersection graphs. In Approximation, Randomization and Combinatorial Optimization Algorithms and Techniques, 2010.

[94] Sachin Katti, Jeff Cohen, and Dina Katabi. Information slicing: anonymity using unreliable overlays. In Proceedings of the 4th USENIX conference on Networked systems design & implementation, NSDI’07, pages 4–4, Berkeley, CA, USA, 2007. USENIX Association.

[95] Sachin Katti, Hariharan Rahul, Wenjun Hu, Dina Katabi, Muriel Médard, and Jon Crowcroft. XORs in the air: practical wireless network coding. In Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’06, pages 243–254, New York, NY, USA, 2006. ACM.

[96] M. Kim, D. Kotz, and S. Kim. Extracting a mobility model from real user traces. In INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings, pages 1–13, April 2006.

[97] Tae-Suk Kim, Hyuk Lim, and Jennifer C. Hou. Improving spatial reuse through tuning transmit power, carrier sense threshold, and data rate in multihop wireless networks. In Proceedings of the 12th annual international conference on Mobile computing and networking, MobiCom ’06, pages 366–377, New York, NY, USA, 2006. ACM.

[98] Andrzej Kochut, Arunchandar Vasan, A. Udaya Shankar, and Ashok Agrawala. Sniffing out the correct physical layer capture model in 802.11b. In Proceedings of the 12th IEEE International Conference on Network Protocols, pages 252–261, Washington, DC, USA, 2004. IEEE Computer Society.

[99] Ronny Krashinsky and Hari Balakrishnan. Minimizing energy for wireless web access with bounded slowdown. In Proceedings of the 8th annual international conference on Mobile computing and networking, MobiCom ’02, pages 119–130, New York, NY, USA, 2002. ACM.

[100] Jonathan Ledlie, Paul Gardner, and Margo Seltzer. Network coordinates in the wild. In Proceedings of the 4th USENIX conference on Networked systems design & implementation, NSDI’07, pages 22–22, Berkeley, CA, USA, 2007. USENIX Association.

[101] Jeongkeun Lee, Wonho Kim, Sung-Ju Lee, Daehyung Jo, Jiho Ryu, Taekyoung Kwon, and Yanghee Choi. An experimental study on the capture effect in 802.11a networks. In Proceedings of the second ACM international workshop on Wireless network testbeds, experimental evaluation and characterization, WinTECH ’07, pages 19–26, New York, NY, USA, 2007. ACM.

[102] Jeongkeun Lee, Sung-Ju Lee, Wonho Kim, Daehyung Jo, Taekyoung Kwon, and Yanghee Choi. RSS-based carrier sensing and interference estimation in 802.11 wireless networks. In Sensor, Mesh and Ad Hoc Communications and Networks, 2007. SECON ’07. 4th Annual IEEE Communications Society Conference on, pages 491–500, June 2007.

[103] Vincent Lenders, Emmanouil Koukoumidis, Pei Zhang, and Margaret Martonosi. Location-based trust for mobile user-generated content: applications, challenges and implementations. In Proceedings of the 9th workshop on Mobile computing systems and applications, HotMobile ’08, pages 60–64, New York, NY, USA, 2008. ACM.

[104] Frank Y. Li, Arild Kristensen, and Paal Engelstad. Passive and active hidden terminal detection in 802.11-based ad hoc networks. INFOCOM Poster, 2006.

[105] Yunpeng Li, D.J. Crandall, and D.P. Huttenlocher. Landmark classification in large-scale image collections. In Computer Vision, 2009 IEEE 12th International Conference on, pages 1957–1964, September–October 2009.

[106] Ben Liang and Zygmunt J. Haas. Predictive distance-based mobility management for multidimensional PCS networks. IEEE/ACM Transactions on Networking, 11:718–732, October 2003.

[107] Yue-Hsun Lin, Ahren Studer, Hsu-Chin Hsiao, Jonathan M. McCune, King-Hang Wang, Maxwell Krohn, Phen-Lan Lin, Adrian Perrig, Hung-Min Sun, and Bo-Yin Yang. Spate: small-group pki-less authenticated trust establishment. In Proceedings of the 7th international conference on Mobile systems, applications, and services, MobiSys ’09, pages 1–14, New York, NY, USA, 2009. ACM.

[108] Jiayang Liu and Lin Zhong. Micro power management of active 802.11 interfaces. In Proceedings of the 6th international conference on Mobile systems, applications, and services, MobiSys ’08, pages 146–159, New York, NY, USA, 2008. ACM.

[109] Xi Liu, Anmol Sheth, Michael Kaminsky, Konstantina Papagiannaki, Srinivasan Seshan, and Peter Steenkiste. Dirc: increasing indoor wireless capacity using directional antennas. In Proceedings of the ACM SIGCOMM 2009 conference on Data communication, SIGCOMM ’09, pages 171–182, New York, NY, USA, 2009. ACM.

[110] Xin Liu, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, and Hui Zang. Experiences in a 3g network: interplay between the wireless channel and applications. In Proceedings of the 14th ACM international conference on Mobile computing and networking, MobiCom ’08, pages 211–222, New York, NY, USA, 2008. ACM.

[111] Loopt, Inc. Your social compass — loopt. http://www.loopt.com.

[112] David G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision - Volume 2, ICCV ’99, pages 1150–, Washington, DC, USA, 1999. IEEE Computer Society.

[113] Ritesh Maheshwari, Shweta Jain, and Samir R. Das. A measurement study of interference modeling and scheduling in low-power wireless networks. In Proceedings of the 6th ACM conference on Embedded network sensor systems, SenSys ’08, pages 141–154, New York, NY, USA, 2008. ACM.

[114] Gregor Maier, Anja Feldmann, Vern Paxson, and Mark Allman. On dominant characteristics of residential broadband internet traffic. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09, pages 90–102, New York, NY, USA, 2009. ACM.

[115] Justin Manweiler, Sharad Agarwal, Ming Zhang, Romit Roy Choudhury, and Paramvir Bahl. Switchboard: a matchmaking system for multiplayer mobile games. In Proceedings of the 9th international conference on Mobile systems, applications, and services, MobiSys ’11, pages 71–84, New York, NY, USA, 2011. ACM.

[116] Justin Manweiler, Peter Franklin, and Romit Roy Choudhury. RxIP: Monitoring the health of home wireless networks. In INFOCOM 2012. 31st IEEE International Conference on Computer Communications. IEEE, 2012.

[117] Justin Manweiler, Puneet Jain, and Romit Roy Choudhury. Satellites in our pockets: An object positioning system using smartphones. In Proceedings of the 10th international conference on Mobile systems, applications, and services, MobiSys ’12, New York, NY, USA, 2012. ACM.

[118] Justin Manweiler and Romit Roy Choudhury. Avoiding the rush hours: Wifi energy management via traffic isolation. In Proceedings of the 9th international conference on Mobile systems, applications, and services, MobiSys ’11, pages 253–266, New York, NY, USA, 2011. ACM.

[119] Justin Manweiler and Romit Roy Choudhury. Avoiding the rush hours: Wifi energy management via traffic isolation. IEEE Transactions on Mobile Computing, PP(99):1, 2011.

[120] Justin Manweiler, Naveen Santhapuri, Souvik Sen, Romit Roy Choudhury, Srihari Nelakuditi, and Kamesh Munagala. Order matters: transmission reordering in wireless networks. In Proceedings of the 15th annual international conference on Mobile computing and networking, MobiCom ’09, pages 61–72, New York, NY, USA, 2009. ACM.

[121] Justin Manweiler, Naveen Santhapuri, Souvik Sen, Romit Roy Choudhury, Srihari Nelakuditi, and Kamesh Munagala. Order matters: Transmission reordering in wireless networks. IEEE/ACM Transactions on Networking, PP(99):1, 2011.

[122] Justin Manweiler, Ryan Scudellari, Zachary Cancio, and Landon P. Cox. We saw each other on the subway: secure, anonymous proximity-based missed connections. In Proceedings of the 10th workshop on Mobile Computing Systems and Applications, HotMobile ’09, pages 1:1–1:6, New York, NY, USA, 2009. ACM.

[123] Justin Manweiler, Ryan Scudellari, and Landon P. Cox. Smile: encounter-based trust for mobile social services. In Proceedings of the 16th ACM conference on Computer and communications security, CCS ’09, pages 246–255, New York, NY, USA, 2009. ACM.

[124] Daniel W. Margo and Margo Seltzer. The case for browser provenance. In First workshop on Theory and practice of provenance, pages 9:1–9:5, Berkeley, CA, USA, 2009. USENIX Association.

[125] Suhas Mathur, Wade Trappe, Narayan Mandayam, Chunxuan Ye, and Alex Reznik. Radio-telepathy: extracting a secret key from an unauthenticated wireless channel. In Proceedings of the 14th ACM international conference on Mobile computing and networking, MobiCom ’08, pages 128–139, New York, NY, USA, 2008. ACM.

[126] Megan McCarthy. How Facebook employees break into your profile. Gawker, http://gawker.com/319630/how-facebook-employees-break-into-your-profile, November 2007.

[127] Meru Networks. Revolutionizing wireless LAN deployment economics with the Meru Networks radio switch. White Paper, 2005.

[128] Emiliano Miluzzo, Nicholas D. Lane, Kristóf Fodor, Ronald Peterson, Hong Lu, Mirco Musolesi, Shane B. Eisenman, Xiao Zheng, and Andrew T. Campbell. Sensing meets mobile social networks: the design, implementation and evaluation of the cenceme application. In Proceedings of the 6th ACM conference on Embedded network sensor systems, SenSys ’08, pages 337–350, New York, NY, USA, 2008. ACM.

[129] D. Minnen, T. Starner, J.A. Ward, P. Lukowicz, and G. Tröster. Recognizing and discovering human actions from on-body sensor data. In Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, pages 1545–1548, July 2005.

[130] K. Mittal and E.M. Belding. Rtss/ctss: mitigation of exposed terminals in static 802.11-based mesh networks. In Wireless Mesh Networks, 2006. WiMesh 2006. 2nd IEEE Workshop on, pages 3–12, September 2006.

[131] Prashanth Mohan, Venkata N. Padmanabhan, and Ramachandran Ramjee. Nericell: rich monitoring of road and traffic conditions using mobile smartphones. In Proceedings of the 6th ACM conference on Embedded network sensor systems, SenSys ’08, pages 323–336, New York, NY, USA, 2008. ACM.

[132] U. Möller, L. Cottrell, P. Palfrader, and L. Sassaman. Mixmaster protocol — version 2. IETF Internet Draft, 2003.

[133] Monsoon Solutions Inc. Power monitor. http://www.msoon.com/LabEquipment/PowerMonitor/.

[134] S. Moon, I. Moon, and K. Yi. Design, tuning, and evaluation of a full-range adaptive cruise control system with collision avoidance. Control Engineering Practice, 17(4):442–455, 2009.

[135] Morgan Stanley Research. The mobile Internet report. http://www.morganstanley.com/institutional/techresearch/pdfs/mobile_internet_report.pdf, December 2009.

[136] Robert Morris, Eddie Kohler, John Jannotti, and M. Frans Kaashoek. The click modular router. In Proceedings of the seventeenth ACM symposium on Operating systems principles, SOSP ’99, pages 217–231, New York, NY, USA, 1999. ACM.

[137] Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo Seltzer. Provenance-aware storage systems. In Proceedings of the annual conference on USENIX ’06 Annual Technical Conference, pages 4–4, Berkeley, CA, USA, 2006. USENIX Association.

[138] R. Murty, A. Wolman, J. Padhye, and M. Welsh. An architecture for extensible wireless lans. In HotNets VII, 2008.

[139] Rohan Murty, Jitendra Padhye, Ranveer Chandra, Alec Wolman, and Brian Zill. Designing high performance enterprise wi-fi networks. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, NSDI’08, pages 73–88, Berkeley, CA, USA, 2008. USENIX Association.

[140] Rohan Murty, Jitendra Padhye, Alec Wolman, and Matt Welsh. Dyson: an architecture for extensible wireless lans. In Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIX ATC’10, pages 15–15, Berkeley, CA, USA, 2010. USENIX Association.

[141] Tamer Nadeem and Lusheng Ji. Location-aware IEEE 802.11 for spatial reuse enhancement. IEEE Transactions on Mobile Computing, 6:1171–1184, October 2007.

[142] Sergiu Nedevschi, Rabin K. Patra, Sonesh Surana, Sylvia Ratnasamy, Lakshminarayanan Subramanian, and Eric Brewer. An adaptive, high performance mac for long-distance multihop wireless networks. In Proceedings of the 14th ACM international conference on Mobile computing and networking, MobiCom ’08, pages 259–270, New York, NY, USA, 2008. ACM.

[143] Anthony J. Nicholson and Brian D. Noble. Breadcrumbs: forecasting mobile connectivity. In Proceedings of the 14th ACM international conference on Mobile computing and networking, MobiCom ’08, pages 46–57, New York, NY, USA, 2008. ACM.

[144] Jitendra Padhye, Sharad Agarwal, Venkata N. Padmanabhan, Lili Qiu, Ananth Rao, and Brian Zill. Estimation of link interference in static multi-hop wireless networks. In Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, IMC ’05, pages 28–28, Berkeley, CA, USA, 2005. USENIX Association.

[145] K. Papagiannaki, M. Yarvis, and W. S. Conner. Experimental characterization of home wireless networks and design implications. In INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings, pages 1–13, April 2006.

[146] Pubudu N. Pathirana, Andrey V. Savkin, and Sanjay Jha. Mobility modelling and trajectory prediction for cellular networks with mobile base stations. In Proceedings of the 4th ACM international symposium on Mobile ad hoc networking & computing, MobiHoc ’03, pages 213–221, New York, NY, USA, 2003. ACM.

[147] Trevor Pering, Yuvraj Agarwal, Rajesh Gupta, and Roy Want. Coolspots: reducing the power consumption of wireless mobile devices with multiple radio interfaces. In Proceedings of the 4th international conference on Mobile systems, applications and services, MobiSys ’06, pages 220–232, New York, NY, USA, 2006. ACM.

[148] Chuan Qin, Xuan Bao, Romit Roy Choudhury, and Srihari Nelakuditi. Tagsense: a smartphone-based approach to automatic image tagging. In Proceedings of the 9th international conference on Mobile systems, applications, and services, MobiSys ’11, pages 1–14, New York, NY, USA, 2011. ACM.

[149] Ramya Raghavendra, Jitendra Padhye, and Ratul Mahajan. Wi-Fi networks are underutilized. Technical report, Microsoft Research, http://research.microsoft.com/pubs/101852/wifi-networks-are-underutilized.pdf, August 2009.

[150] Ahmad Rahmati and Lin Zhong. Context-for-wireless: context-sensitive energy-efficient wireless data transfer. In Proceedings of the 5th international conference on Mobile systems, applications and services, MobiSys ’07, pages 165–178, New York, NY, USA, 2007. ACM.

[151] Gokul Rajagopalan and Peter Thornycroft. Arm yourself to increase enterprise wlan data capacity. Aruba Whitepaper, http://www.arubanetworks.com/pdf/technology/whitepapers/wp_ARM_EnterpriseWLAN.pdf, 2009.

[152] S. Ramanathan. A unified framework and algorithm for (t/f/c)dma channel assignment in wireless networks. In INFOCOM ’97. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 2, pages 900–907, April 1997.

[153] Subramanian Ramanathan and Errol L. Lloyd. Scheduling algorithms for multihop radio networks. IEEE/ACM Transactions on Networking, 1:166–177, April 1993.

[154] Marguerite Reardon. AT&T to Invest $2B in Mobile Network. http://articles.cnn.com/2010-01-29/tech/att.network.boost_1_cell-sites-new-cell-network-upgrades?_s=PM:TECH, January 2010.

[155] Charles Reis, Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Measurement-based models of delivery and interference in static wireless networks. In Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’06, pages 51–62, New York, NY, USA, 2006. ACM.

[156] Injong Rhee, Ajit Warrier, Mahesh Aia, and Jeongki Min. Z-mac: a hybrid mac for wireless sensor networks. In Proceedings of the 3rd international conference on Embedded networked sensor systems, SenSys ’05, pages 90–101, New York, NY, USA, 2005. ACM.

[157] Thomas Ristenpart, Gabriel Maganis, Arvind Krishnamurthy, and Tadayoshi Kohno. Privacy-preserving location tracking of lost or stolen devices: cryptographic techniques and replacing trusted third parties with dhts. In Proceedings of the 17th conference on Security symposium, pages 275–290, Berkeley, CA, USA, 2008. USENIX Association.

[158] Edward Rosten and Tom Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision (ECCV), pages 430–443, 2006.

[159] Eric Rozner, Vishnu Navda, Ramachandran Ramjee, and Shravan Rayanchu. Napman: network-assisted power management for wifi devices. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 91–106, New York, NY, USA, 2010. ACM.

[160] Naveen Santhapuri, Justin Manweiler, Souvik Sen, Xuan Bao, Romit Roy Choudhury, and Srihari Nelakuditi. Sensor assisted wireless communication. In Local and Metropolitan Area Networks (LANMAN), 2010 17th IEEE Workshop on, pages 1–5, May 2010.

[161] Naveen Santhapuri, Justin Manweiler, Souvik Sen, Romit Roy Choudhury, Srihari Nelakuditi, and Kamesh Munagala. Message in Message (MIM): A Case for Reordering Transmissions in Wireless Networks. In HotNets VII, 2008.

[162] Stefan Saroiu and Alec Wolman. Enabling new mobile applications with location proofs. In Proceedings of the 10th workshop on Mobile Computing Systems and Applications, HotMobile ’09, pages 3:1–3:6, New York, NY, USA, 2009. ACM.

[163] Mahadev Satyanarayanan, Paramvir Bahl, Ramón Cáceres, and Nigel Davies. The case for vm-based cloudlets in mobile computing. IEEE Pervasive Computing, 8:14–23, October 2009.

[164] Scalable Network Technologies. Qualnet v2.6.1. http://www.scalable-networks.com.

[165] G. Schall, J. Schöning, V. Paelke, and G. Gartner. A survey on augmented maps and environments: Approaches, interactions and applications. Taylor & Francis Group, 2011.

[166] Aaron Schulman, Vishnu Navda, Ramachandran Ramjee, Neil Spring, Pralhad Deshpande, Calvin Grunewald, Kamal Jain, and Venkata N. Padmanabhan. Bartendr: a practical approach to energy-aware cellular data scheduling. In Proceedings of the sixteenth annual international conference on Mobile computing and networking, MobiCom ’10, pages 85–96, New York, NY, USA, 2010. ACM.

[167] A. Sharma, E.M. Belding, and C.E. Perkins. Cell-share: Opportunistic use of cellular uplink to augment rural wifi mesh networks. In Vehicular Technology Conference Fall (VTC 2009-Fall), 2009 IEEE 70th, pages 1–5, September 2009.

[168] Ashish Sharma, Vishnu Navda, Ramachandran Ramjee, Venkata N. Padmanabhan, and Elizabeth M. Belding. Cool-tether: energy efficient on-the-fly wifi hot-spots using mobile phones. In Proceedings of the 5th international conference on Emerging networking experiments and technologies, CoNEXT ’09, pages 109–120, New York, NY, USA, 2009. ACM.

[169] Eugene Shih, Paramvir Bahl, and Michael J. Sinclair. Wake on wireless: an event driven energy saving strategy for battery operated devices. In Proceedings of the 8th annual international conference on Mobile computing and networking, MobiCom ’02, pages 160–171, New York, NY, USA, 2002. ACM.

[170] Vivek Shrivastava, Nabeel Ahmed, Shravan Rayanchu, Suman Banerjee, Srinivasan Keshav, Konstantina Papagiannaki, and Arunesh Mishra. Centaur: realizing the full potential of centralized wlans through a hybrid data path. In Proceedings of the 15th annual international conference on Mobile computing and networking, MobiCom ’09, pages 297–308, New York, NY, USA, 2009. ACM.

[171] K. Sinkar, A. Jagirdar, T. Korakis, H. Liu, S. Mathur, and S. Panwar. Cooperative recovery in heterogeneous mobile networks. In Sensor, Mesh and Ad Hoc Communications and Networks, 2008. SECON ’08. 5th Annual IEEE Communications Society Conference on, pages 395–403, June 2008.

[172] Noah Snavely, Steven M. Seitz, and Richard Szeliski. Photo tourism: exploring photo collections in 3d. In ACM SIGGRAPH 2006 Papers, SIGGRAPH ’06, pages 835–846, New York, NY, USA, 2006. ACM.

[173] Alex C. Snoeren, Craig Partridge, Luis A. Sanchez, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, and W. Timothy Strayer. Hash-based ip traceback. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’01, pages 3–14, New York, NY, USA, 2001. ACM.

[174] Soekris Engineering, Inc. Soekris net4521 and net4826-series. http://www.soekris.com/.

[175] T. Sohn, K. Li, G. Lee, I. Smith, J. Scott, and W. Griswold. Place-its: A study of location-based reminders on mobile phones. UbiComp 2005: Ubiquitous Computing, pages 903–903, 2005.

[176] Jacob Sorber, Nilanjan Banerjee, Mark D. Corner, and Sami Rollins. Turducken: hierarchical power management for mobile devices. In Proceedings of the 3rd international conference on Mobile systems, applications, and services, MobiSys ’05, pages 261–274, New York, NY, USA, 2005. ACM.

[177] Jon Stokes. Primal rage: a conversation with Carmack, and a look at id’s latest. Ars Technica, http://arstechnica.com/gaming/news/2010/11/post-8.ars, November 2010.

[178] R. Storn and K. Price. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, 1997.

[179] Louise Story and Brad Stone. Facebook retreats on online tracking. http://www.nytimes.com/2007/11/30/technology/30face.html, November 2007. The New York Times.

[180] Enhua Tan, Lei Guo, Songqing Chen, and Xiaodong Zhang. PSM-throttling: Minimizing energy consumption for bulk data communications in wlans. In Network Protocols, 2007. ICNP 2007. IEEE International Conference on, pages 123–132, October 2007.

[181] Wee Lum Tan, Fung Lam, and Wing Cheong Lau. An empirical study on 3g network capacity and performance. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, pages 1514–1522, May 2007.

[182] D. N. C. Tse and S. V. Hanly. Linear multiuser receivers: Effective interference, effective bandwidth and user capacity. IEEE Transactions on Information Theory, 45:641–675, March 1999.

[183] Aaron Turner. Tcpreplay Pcap editing and replay tools for *NIX. http://tcpreplay.synfin.net.

[184] Mythili Vutukuru, Kyle Jamieson, and Hari Balakrishnan. Harnessing exposed terminals in wireless networks. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, NSDI’08, pages 59–72, Berkeley, CA, USA, 2008. USENIX Association.

[185] K. Whitehouse, A. Woo, F. Jiang, J. Polastre, and D. Culler. Exploiting the capture effect for collision detection and recovery. In Proceedings of the 2nd IEEE workshop on Embedded Networked Sensors, pages 45–52, Washington, DC, USA, 2005. IEEE Computer Society.

[186] Marc Whitten. An open letter from Xbox LIVE general manager Marc Whitten. http://www.xbox.com/en-US/Press/Archive/2010/0205-whittenletter, February 2010.

[187] Tingxin Yan, Vikas Kumar, and Deepak Ganesan. Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys ’10, pages 77–90, New York, NY, USA, 2010. ACM.

[188] Jie Yang, Simon Sidhom, Gayathri Chandrasekaran, Tam Vu, Hongbo Liu, Nicolae Cecan, Yingying Chen, Marco Gruteser, and Richard P. Martin. Detecting driver phone use leveraging car speakers. In Proceedings of the 17th annual international conference on Mobile computing and networking, MobiCom ’11, pages 97–108, New York, NY, USA, 2011. ACM.

[189] Wei Ye, J. Heidemann, and D. Estrin. An energy-efficient mac protocol for wireless sensor networks. In INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 3, pages 1567–1576, 2002.

[190] Yan-Tao Zheng, Ming Zhao, Yang Song, H. Adam, U. Buddemeier, A. Bissacco, F. Brucher, Tat-Seng Chua, and H. Neven. Tour the world: Building a web-scale landmark recognition engine. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1085–1092, June 2009.

Biography

Justin Gregory Manweiler was born September 9, 1985 in Hampton, Virginia. In May 2007, he earned the Bachelor of Science degree in Computer Science from The College of William and Mary in Williamsburg, Virginia. He completed the Doctor of Philosophy degree in Computer Science at Duke University in May 2012 with the thesis “Building Blocks for Tomorrow’s Mobile App Store” under the direction of Romit Roy Choudhury. At the time of writing, Justin’s dissertation work has resulted in eleven journal, conference, and workshop publications [122, 123, 161, 120, 121, 160, 115, 118, 119, 116, 117]. In 2010, Justin interned with the Networking Research Group at Microsoft Research in Redmond, Washington. His work on improving mobile battery life while using WiFi was nominated for Best Paper at MobiSys 2011 and received press attention from outlets including The Wall Street Journal, PBS, Slashdot, and Scientific American. In 2012, Justin joined the IBM T. J. Watson Research Center in Hawthorne, New York as a Research Staff Member.
