Why it Takes so Long to Connect to a WiFi Access Point

Changhua Pei, Zhi Wang, Youjian Zhao, Zihan Wang, Yuanquan Peng, Wenliang Tang, Yuan Meng, Dan Pei Xiaodong Qu

INFOCOM 2017 2

Measurement Correlation Motivation Modeling Results Analysis 3

Measurement Correlation Motivation Modeling Results Analysis 4 WiFi is indispensable in our daily lives

Source: Cisco VNI Mobile, 2017

97.4 (49% of overall traffic) 3X 5X 454 30.8 94

2005 2020 2006 2020 Overall WiFi Traffic Growth Hotspots (Exabytes) (Millions) 5 Experience of WiFi Network Remote Services Mobile Device AP

Throughput

Downloading 6 Experience of WiFi Network Remote Services Mobile Device AP

WiFi Hop Latency

Online Gaming 7 Experience of WiFi Network Remote Services Mobile Device AP 8 Experience of WiFi Network Mobile Device AP

I want to connect to the AP! Connection set-up process

Now I can talk to Connection time cost the Internet ^^ Urgent need to study the connection set-up time 9

Mobile Device AP Suranga [WiNTECH’13] is the first work focus on WiFi connection time cost : • The connection set-up process in the wild is unknown • Lack thorough investigation in a larger scale. I want to access theWe AP!focus on: Connection set-up procedure • How about the connection time cost in the wild? Now• WhatI canistalkthetoculprit of theConnectionhigh connectiontime costtime cost? the• WhatInternetcan ^^I do to reduce the connection time cost? 10

Measurement Correlation Motivation Modeling Results Analysis 11 DATASET

• WiFi Manager of Tencent Technology • Provide Free WiFi service • Top in the Android/iOS App market (China) • About 50K downloads every day • Continuously collect one week data from May 3 to May 9. 12 DATASET

Connection Log Dataset

l 7 Million unique APs l 5 Million unique mobile devices l 4 different cities. l 0.4 billion overall connection attempts. 13 CDF of the connection time cost

100 80 60 40 CDF (%) CDF 20 0 0 5 10 15 20 25 30 Connection Time Cost (s)

15% (5%) successfulGlobalconnectionsPublic WiFi Hotspotsconsume over 5 (10) seconds! 14 DATASET

Breaking Down Dataset

l 12,472 selected devices l 706K connection attempts l Spread over differentplaces. Connection Time Cost 15 App starts timing

Scan …

Association Association Request 4-way handshake Authentication Request IP DHCP … Obtain IP Connection Time Cost 16 App starts Disconnected timing

Scan …

Association Associating Association Request 4-way Associated 4-way handshake handshake Authentication Request completed IP DHCP … Obtain IP connected 17 WiFi Association: Success vs. Failure 54.9% 60 54.9 50 40 45% 30

20 14.6 9.4 percentage(%) 7.85 10 7.76 5.042 0.377 0.071 0 Success Timeout DHCP Wrong Swtich to Forget WiFi Unkown Failure Password another WiFi Switched WiFi Off

Based on the breaking down dataset. 18

Does there exist one sub-phase which dominates the overall connection set-up process? 100 100

80 80 19 Scan 60 60 40 40 Association CDF (%) CDF 0s~7s (%) CDF 0s~7s 20 7s~15s 20 7s~15s 15s~30s 15s~30s 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Proportion of Scan (%) Proportion of Association (%) 100 100 0s~7s 80 80 7s~15s 15s~30s 60 60 40 CDF (%) CDF 40 DHCP 0s~7s (%) CDF 20 Authentication 7s~15s 20 15s~30s 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Proportion of Authentication (%) Proportion of DHCP (%) 100

80 20 Association and Authentication 60 40 Association do not take too much time. (%) CDF 0s~7s 20 7s~15s 15s~30s 0 0 20 40 60 80 100 Proportion of Association (%) 100

80 60 Fixed number of MAC-layer 40 CDF (%) CDF 0s~7s packets exchange in Association Authentication 20 7s~15s or Authentication. 15s~30s 0 0 20 40 60 80 100 Proportion of Authentication (%) 100

80 21 Scan 60 0s-7s, 7s-15s: DHCP phase 40 occupies more than 80%, which CDF (%) CDF 0s~7s 20 7s~15s is consistent with WiNTECH work. 15s~30s 0 0 20 40 60 80 100 Proportion of Scan (%)

100 0s~7s 80 7s~15s 15s~30s 15-30s: Scan phase 60 40 DHCP consumes more than (%) CDF DHCP phase for those 20 0 > 15s. 0 20 40 60 80 100 Proportion of DHCP (%) 22 Scan Phase dominate the connection

0s~7s 7s~15s 15s~30s 100 0s-7s class: Scan consume 80 less than 3.4s. 60

40 11.6s CDF (%) CDF 20 3.4s 15s-30s class: For more than 0 40% processes, Scan phase 0 5 10 15 20 25 30 consume more than 11.6s. Scan Time Cost (s)

• Why does this happen? 23 Anomalous transitions cause long scanning 290 • Anomalous transition to 2415 578 Scanning Associating Disconnectedstate

1798 3066 1576 • Mobility 1229 669 • WiFi interference 137 • 432 Disconnected Associated System process delay • …

9 352 1661 1309 Connected Completed Total number of transitions in the dataset. 24 Take-away messages: 1. For those connection whose time cost > 15s, Scan is the dominate sub-phase. 2. Scan dominates the whole process because there are anomalous transitions. 25

Measurement Correlation Motivation Modeling Results Analysis 26 Which feature affect the connection time cost the most? 1. Give Intermediate results to gain some intuitions before the ML model. 2. Help feature selection. 27 Introduction of the Connection Log Dataset Aggregated results 28 Visualization analysis of all the APs. 10

9

8

7

6

Connection Time Cost (s) Cost Time Connection 5 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Hour of Day • Association timing affects the connection time cost. 29 Visualization analysis

6 5 • Connection with higher RSSI 4 tend to have smaller average 3 2 connection time costs. 1 Connection Time Cost (s) Cost Time Connection -95 -90 -85 -80 -75 -70 -65 -60 RSSI (dBm) 30 Visualization analysis

6 • The larger the number of 5 4 associated devices is, the 3 2 higher average connection 1 Connection Time Cost (s) Cost Time Connection 0 40 80 120 160 200 240 time cost. Number of Devices 31 Correlation Analysis v Kendall correlation: (rank correlation)

���������� ����� − ���������� ����� ��� = �(� − 1)/2 v Relative Information Gain: (RIG) how much a factor X helps to predict the final latency Y

()(|) RIG= H(Y)= ∑ � � = � ��� () [] 32 Correlation Analysis

• Mobile devices and AP model has the highest RIG. • HTC on average 1.3x larger than . 100

80 33

Mobile Device Model 60

40 • Chipset matters. (%) CDF 20 M1 Note • Each model contains > 10K pieces of data Note • 0 RSSI > -60 dBm 0 5 10 15 20 25 30 • 500+ devices, 500+ APs, 7 days, 500+ places Connection Time Cost (s) 100 80 34 Mobile Device Model 60

40 CDF (%) CDF • matters. 20 SAMSUNG G9280 MEIZU PRO 5 0 0 5 10 15 20 25 30 Connection Time Cost (s) 35 Correlation Analysis

• Mobile devices and AP model has the • RSSI has large RIG and the highest highest RIG. Kendall. • HTC in average 1.5x larger than Samsung. 36 Correlation Analysis • Number of devices helps little. • Step function of number of devices.

• Mobile devices and AP model has the • RSSI has large RIG and the highest highest RIG. Kendall. • HTC on average 1.5x larger than Samsung. 37 Visualization analysis

6 • The larger the number of 5 4 associated devices is, the 3 2 higher average connection 1 Connection Time Cost (s) Cost Time Connection 0 40 80 120 160 200 240 time cost. Number of Devices 38 AP Model • Private APs: APs which provide private WiFi services for a relatively small number of users.

• Public APs: APs which provide public/open WiFi services.

• Manually label 200K APs. 39

Measurement Correlation Motivation Modeling Results Analysis 40 What can I do to reduce the connection time cost?

Machine Learning Based Model Enhanced AP Selection Feature selection HOW? Algorithm Model selection

What-If Analysis 41 Machine Learning based Model

• Labeling • Use 15 seconds as the threshold to divide the process into SLOW and FAST. • Model Selection • Highest accuracy: Random Forest. • Online Learning • Prediction speed. 42 Machine Learning based Model

• Feature selection • All the features should be easily measured by mobile devices • Use as few features as possible under acceptable accuracy Strongest Signal 43 Strength Algorithm -20dBm

-22dBm -40dBm … Enhanced AP 44 Selection Algorithm

(12am, IPhone, Cisco, -20dBm, Yes) ML (12am, IPhone, TP-Link,-40dBm, Yes) (12am, IPhone, Hiwifi,-60dBm, No)

… Enhanced AP 45 Selection Algorithm

SLOW ML SLOW FAST

… 46 Evaluation

• Let the two algorithms work with the same dataset.

• Compare the time cost of the APs selected by different algorithms. (The cost is already known when certain device connects to certain AP in the dataset.)

Q: One device did not connect to all the neighbor APs! A: We use the deviceGlobal PublicwhoseWiFi Hotspots60 features are the same to approximate the connection time cost to each other! 3.6% 47 Evaluation 3 seconds 48 Conclusions • WiFi connection set-up time cost is important but few works focus on it. • Exhaustive real world measurement from a popular mobile WiFi manager App. 45% of the WiFi connection attempts fail. • Using customized code to break down the whole process into different sub-phases for the first time. • We propose a machine learning based AP selection algorithm to help users connect AP which shows great performance gain. 49 References

• [1 ]A. Patro, S. Govindan, and S. Banerjee. Observing home wireless experience through wifi aps. In MobiCom, pages 339–350. ACM, 2013. • [2] S. Grover, M. S. Park, S. Sundaresan, S. Burnett, H. Kim, B. Ravi, and N. Feamster. Peeking behind the nat: an empirical study of home networks. In IMC, pages 377–390. ACM, 2013. • [3] S. Sundaresan, W. De Donato, N. Feamster, R. Teixeira, S. Crawford, and A. Pescape`. Broadband internet performance: a view from the gateway. SIGCOMM, 41(4):134–145, 2011. • [4] S.Sundaresan,N.Feamster,andR.Teixeira. Measuring the performance of user traffic in home wireless networks. In PAM, 2015. • [5] Suranga Seneviratne, Aruna Seneviratne, Prasant Mohapatra, and Pierre Ugo Tournoux. Characterizing wifi connection and its impact on mobile users: practical insights. In WiNTECH, pages 81–88. ACM, 2013. 50

INFOCOM Thank you! Q&A?

2017 [email protected]