
Characterization of Multi-User Augmented Reality over Cellular Networks

Kittipat Apicharttrisorn∗, Bharath Balasubramanian†, Jiasi Chen∗, Rajarajan Sivaraj‡, Yi-Zhen Tsai∗, Rittwik Jana†, Srikanth Krishnamurthy∗, Tuyen Tran†, and Yu Zhou† ∗University of California, Riverside, California, USA {kapic001 | ytsai036}@ucr.edu, {jiasi | krish}@cs.ucr.edu †AT&T Labs Research, Bedminster, New Jersey, USA {bb536c | rj2124}@att.com, {tuyen | yuzhou}@research.att.com ‡AT&T Labs Research, San Ramon, California, USA [email protected]

Abstract—Augmented reality (AR) apps where multiple users interact within the same physical space are gaining in popularity (e.g., shared AR mode in Pokemon Go, virtual graffiti in Google's Just a Line). However, multi-user AR apps running over the cellular network can experience very high end-to-end latencies (measured at 12.5 s median on a public LTE network). To characterize and understand the root causes of this problem, we perform a first-of-its-kind measurement study on both public LTE and an industry LTE testbed for two popular multi-user AR applications, yielding several insights: (1) The radio access network (RAN) accounts for a significant fraction of the end-to-end latency (31.2%, or 3.9 s median), resulting in AR users experiencing high, variable delays when interacting with a common set of virtual objects in off-the-shelf AR apps; (2) AR network traffic is characterized by large intermittent spikes on a single uplink TCP connection, resulting in frequent TCP slow starts that can increase user-perceived latency; (3) Applying a common traffic management mechanism of cellular operators, QoS Class Identifiers (QCI), can help by reducing AR latency by 33%, but impacts non-AR users. Based on these insights, we propose network-aware and network-agnostic AR design optimization solutions that intelligently adapt IP packet sizes and periodically provide information on uplink data availability, respectively. Our solutions help ramp up network performance, improving the end-to-end AR latency and goodput by ∼40-70%.

Index Terms—Augmented reality, Mobile communication, Cross layer design, Radio access networks

I. INTRODUCTION

Augmented reality (AR), with its premise of virtual objects integrated with our physical environment, promises new immersive experiences, and the market is forecast to reach 100 billion dollars by 2021 [1], [2].

AR applications such as navigation, entertainment (e.g., Pokemon Go), and field service involve multiple users, co-located in the same shared outdoor environments, relying on low-latency communication over the cellular network to obtain a consistent view of virtual objects. In this paper, we consider scenarios where multiple users are co-located in the same real physical space and wish to view a common set of virtual objects. Our measurements of such off-the-shelf AR apps over cellular networks show that the user-perceived end-to-end AR latencies are extremely high.

For example, Fig. 1 shows the CDF of the latencies of the Google CloudAnchor AR app [3], running on a 4G LTE production network of a Tier-I US cellular carrier at different locations and times of day (details in §IV). The results show a median of 3.9 s of aggregate radio access network (RAN) latency and 12.5 s of end-to-end AR latency (as explained below). As latency is a key contributor to AR quality-of-experience (QoE) [4], a deeper understanding of the root causes of these high latencies is needed in order to improve AR's end-to-end performance over cellular networks, which is the key focus of our paper.

Fig. 1. Multi-user AR latency: CDFs of the aggregate RAN latency and the end-to-end AR latency on public LTE.

In this context, end-to-end AR latency is the total time from when one User Equipment (UE) places a virtual object in the real world until when another UE can view the object on her screen. Aggregate RAN latency is defined as the subset of this time spent on over-the-air transmissions.
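To make these two metrics concrete, they can be written as follows. This is a minimal formalization consistent with the definitions above; the symbols and the step labels (which anticipate the hosting/resolving steps of §II and the per-layer timestamps of §IV-B) are our notation, not the paper's.

```latex
% End-to-end AR latency: from UE A placing the object to UE B viewing it,
% decomposable into the hosting (1a-1c) and resolving (2a-2d) steps of Sec. II.
L_{\mathrm{E2E}} \;=\; t^{B}_{\mathrm{view}} - t^{A}_{\mathrm{place}}
  \;=\; \underbrace{L_{1a} + L_{1b} + L_{1c}}_{\text{hosting (device A)}}
  \;+\; \underbrace{L_{2a} + L_{2b} + L_{2c} + L_{2d}}_{\text{resolving (device B)}}

% Aggregate RAN latency of one uplink transmission (a subset of the TCP/IP time):
L_{\mathrm{RAN}} \;=\; t_{\mathrm{MAC}}(\text{last packet}) - t_{\mathrm{PDCP}}(\text{first packet})
  \;\le\; L_{\mathrm{TCP/IP}}
```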
Why is AR different?: While the cellular network is relatively well equipped to handle traditional applications such as web and video, AR presents new challenges because of its unique application and communication characteristics, as discovered in this paper. In brief, AR differs from other multimedia applications, such as video or 360° virtual reality (VR) streaming, short-form video uploads (e.g., Snapchat or Instagram Stories), and video conferencing, in the following key ways:

• Lack of playback buffers and latency-sensitivity: In video and 360° VR streaming, the length of the playback buffer, which caches the yet-to-be-played video chunks for the player, impacts streaming patterns and traffic burst periods. We observe that off-the-shelf AR apps do not continuously upload data, and they consume AR data holistically, unlike video/VR, where the player consumes data frame-by-frame. Hence, AR data objects need to be delivered quickly to the AR device (user equipment, or UE) in order to avoid latency-based QoE issues. While video conferencing similarly lacks playback buffers and has tight latency deadlines, delayed frames can be skipped or played back quickly later, whereas delayed AR transmissions can lead to inconsistent user manipulations of the virtual objects (e.g., user A touches a non-existent virtual object that has already been moved by user B).

• Lack of application adaptation mechanisms: Video and 360° VR streaming make use of adaptive bit rate (ABR) mechanisms, such as MPEG-DASH, which adapt the streaming resolutions of the video chunks to avoid video QoE issues. However, off-the-shelf AR currently does not use ABR, and cannot be modified to do so in closed-source commercial AR systems. Additionally, even if such systems were transparent, it is unclear how such application-layer adaptations should be done in AR (discussed in §V-C).

• Uplink-heavy TCP traffic: In multi-user gaming, the traffic is comprised of small uplink UDP [5] data (mainly from user movements/actions) with stringent latency requirements. For short-form video uploads (such as Instagram Stories or Snapchat), live video upload, or conferencing, even though the traffic is uplink-heavy, the latency tolerance is higher than for AR [4]. Live or buffered video streaming apps such as YouTube use downlink QUIC [6] instead of TCP, and are more latency-tolerant than AR. In contrast, we observe that AR network traffic is different from all of these, since it is uplink-heavy, TCP-based, and latency-sensitive. Hence, TCP performance for the AR session is critical to its end-to-end latency and throughput, and we investigate its interactions with the RAN and AR in this work.
Contributions: Motivated by these unique characteristics of multi-user AR, we perform the first detailed experimental study across the application, IP, and RAN layers to characterize how the cellular network impacts AR applications. We provide crucial insights on the cross-layer inter-dependencies involved in multi-user AR streaming. Existing work on AR either focuses on real object detection with cloud/edge support [7], [8], [1] over WiFi, or on efficient device localization [9], while neglecting the communication aspects. Works involving multiple AR devices [10], [11], [12] focus on application-layer performance, without quantifying the interactions between the AR application, the cellular network, and cloud processing.

Our main contribution in this paper is a measurement-driven characterization of multi-user AR on both (i) public 4G LTE cellular carriers of a Tier-I US mobile network operator, to capture realistic network, RF, and traffic conditions, and (ii) an experimental LTE industry testbed with a fully-implemented RAN protocol stack and virtual EPC, which allows us to vary network settings (such as cell load, radio bearer QoS class, etc.) for controlled testing, in order to quantify their impact on AR performance. We study widely-used AR apps in the market utilizing an off-the-shelf AR platform, Google ARCore [13]. This platform is broadly representative since it provides multi-user AR capabilities on Android devices, and analogous APIs are provided by Apple ARKit [14], Microsoft Hololens [15], and Magic Leap [16] (see §III for methodology details).

Our measurement study leads to several insights:

• To quantify differences between multi-user AR and other multimedia applications, we compare the user-perceived latencies across these apps in §IV-A. Then, we provide a component-wise breakdown of the end-to-end (E2E) AR latency and show that the RAN accounts for ∼31.2% of the overall latency on average over public LTE (∼3.9 s). This causes the AR devices to experience high delays when interacting with the virtual objects (§IV-B). We also characterize the performance of RAN optimization techniques such as QoS Class Identifier (QCI) adaptation to serve AR traffic. While QCI adaptation improves the E2E latency for AR users by ∼33%, it reduces the throughput of non-AR users by ∼31.6% (§IV-D).

• We show that AR traffic is bursty, with large time gaps between successive bursts of uplink data (e.g., 20 s on average for CloudAnchor, with burst sizes of ∼2.5 MB) whenever a virtual object is placed. This causes the TCP congestion window (cwnd) to enter slow start before the beginning of each burst. We show that RAN segmentation latency at the RLC layer significantly impacts TCP performance, especially during the slow-start phase (§IV-E).

• We propose a network-aware AR app design optimization technique that intelligently adapts IP packet sizes for the AR app, based on the underlying RAN conditions. Our methodology for selecting packet sizes addresses the trade-off between minimizing segmentation of packets at the RAN (caused by larger IP packets), which adversely impacts network latency, and minimizing network overhead (caused by smaller IP packets), which adversely impacts application goodput. Our technique improves the aggregate RAN latency, end-to-end AR latency, network throughput, and application goodput by ∼40-70% (§V-A).

• We propose a network-agnostic AR app design optimization technique that periodically updates the LTE base station (eNB) with the uplink RAN buffer status of the hosting AR device, even during gaps between AR bursts. We achieve this by generating negligible, periodic amounts of dummy data, which enables constant UE buffer status updates to the eNB. This results in improved RAN resource allocation for the AR device and minimizes uplink signaling latencies. Our technique improves the aggregate RAN latency by ∼50%, at a marginal cost of additional bandwidth (§V-B).

Since existing AR apps are closed-source and their data transmissions are opaque, our work focuses on network-layer characterizations and solutions. However, we provide a brief discussion of how one could adapt the AR application content to reduce latency, potentially with other user-perceived performance costs that may be acceptable (§V-C).
We release our RAN latency analysis tool, which runs on client devices (UEs), as open source [17]. It can be used by researchers with data captured on public LTE networks, without any modifications needed to the eNB.

II. AR BACKGROUND AND RELATED WORK

Fig. 2. Cloud-based multi-user AR. Use of the cloud is mandated for multi-user AR apps in Android.

Background on AR: When current AR devices (e.g., those running the Apple ARKit, Google ARCore, or Microsoft Hololens AR platforms) wish to place virtual objects in the real world, they first perform device localization in order to create a consistent 3D coordinate system of the real world. The real-world coordinate system (called the world frame) provides a common reference for the devices to place the virtual objects, and is constructed using Simultaneous Localization and Mapping (SLAM) techniques [18]. Once the devices have a common world frame and know the poses (location and orientation) of the virtual objects, the virtual objects can be drawn on each device's display when they are within the user's field-of-view. Below we describe the steps involved in synchronizing the world frame between device A (which places a virtual object) and device B (which receives and renders that virtual object), as illustrated in Fig. 2.

1) Hosting (device A): (a) Handshakes: Device A initiates connections with the cloud: a Firebase database and two Google Cloud instances for visual positioning. (b) Visual Data Tx: Device A sends real-world visual data and information about the virtual objects (position, orientation, 3D sprite/texture maps) to the cloud. (c) Cloud Process: The cloud processes A's visual data using SLAM to compute the world frame.

2) Resolving (device B): (a) Data Preprocess: Device B scans and retrieves camera frames and pre-processes the data. (b) Visual Data Tx: Device B sends its visual data to the cloud. (c) Cloud Process: The cloud matches B's visual data against the world frame computed in step 1c, and computes B's location and orientation in the world frame. (d) Local Render: B uses the information from the cloud to render the virtual object at the correct position and orientation on the display.
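The host/resolve exchange above can be summarized in a few lines of code. The sketch below is purely illustrative: MockCloud and its method names are hypothetical stand-ins for the closed-source CloudAnchor backend, and the timestamping mirrors the Unix-timestamp instrumentation described later in §III; it is not the apps' actual implementation.

```python
import time

class MockCloud:
    """Hypothetical stand-in for the CloudAnchor backend (Firebase + visual positioning)."""
    def connect(self):                          # step 1a: handshakes
        pass
    def host(self, visual_data):                # steps 1b-1c: upload visual data, cloud SLAM
        return "anchor-id-123"
    def resolve(self, anchor_id, visual_data):  # steps 2b-2c: upload + match against world frame
        return (0.0, 0.0, 0.0)                  # pose of the virtual object for device B

def stamp(events, name):
    events[name] = time.time()                  # Unix timestamps logged on-device (Sec. III)

def host_object(cloud, events):                 # device A
    stamp(events, "place")                      # user taps the screen to place the object
    cloud.connect()
    stamp(events, "handshakes_done")
    anchor = cloud.host(b"camera-frames")
    stamp(events, "host_done")
    return anchor

def resolve_object(cloud, anchor, events):      # device B
    stamp(events, "resolve_start")              # step 2a: capture and preprocess frames
    pose = cloud.resolve(anchor, b"camera-frames")
    stamp(events, "rendered")                   # step 2d: object drawn on B's display
    return pose

events_a, events_b, cloud = {}, {}, MockCloud()
anchor = host_object(cloud, events_a)
resolve_object(cloud, anchor, events_b)
print("end-to-end AR latency (s):", events_b["rendered"] - events_a["place"])
```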
Related Work: To the best of our knowledge, we are the first to perform an in-depth measurement study of AR applications operating on cellular networks.

Mobile AR: Many works study object detection for single-user AR [1], [8], [19], [20], [21], [22], with cloud/edge processing to reduce latency and/or energy. A few papers [11], [12], [10] discuss multi-user AR with a focus on application-layer sharing. In this work, we focus on SLAM-based AR, prevalent in off-the-shelf AR systems, and on the communication aspects of multi-user AR when operating on cellular networks.

Multi-user SLAM: Some work has been done on multi-user SLAM in the robotics context [23], [24]. These works mainly focus on the SLAM algorithms themselves, and not on their communication aspects.

QoS for cellular networks: Work on service-level QoS for the cellular network allocates physical resource blocks (PRBs) for users through smart eNB schedulers, QCI selection, or combinations thereof [25], [26], [27]. However, naively applying these techniques may not work well for AR's bursty traffic patterns (§IV-D).

III. METHODOLOGY, TESTBEDS, AND TOOLS

Multi-User AR apps: We investigate multi-user AR with the Cloud Anchor [3] and Just a Line [28] demo apps provided by Google. Cloud Anchor allows one user to place a virtual object in the scene and a second user to view it. Just a Line allows two users to draw virtual graffiti in a shared physical space. These apps all rely on Google's CloudAnchor API [29], which is a key API for providing multi-user capabilities in Android AR apps. Thus, our observations across different apps are corroborated, as they all rely on this fundamental API. The experiments in this paper are done using Cloud Anchor, except where specified otherwise.

Experimental setup: (a) Industry LTE testbed: We use an in-house outdoor 10 MHz LTE testbed (operating on Band 30, 2.3 GHz) with a virtual EPC core and an LTE eNB, with LTE cells having 2×2 MIMO capability. Each cell yields peak uplink and downlink rates of 25 Mbps and 50 Mbps, respectively. The AR UE pair (hosting UE, rendering UE) and the load phones are connected to the eNB. We use a pair of OnePlus 5T phones as AR test devices and Samsung Galaxy S7 phones as load UEs to provide background traffic. The phones support 2×2 MIMO. We varied the RF conditions of the AR UEs, resulting in uplink SINR values ranging from 5-17 dB and RSRP values ranging from -85 dBm to -105 dBm. The eNBs run on HP380 servers with WindRiver Linux, with a transmission power of 26 dBm using 2T2R antennas. The eNBs are connected to the vEPC core, and further to the public IP network. The testbed also allows controlling the user and traffic load on the cells, and modifying parameters like the QoS Class Identifier (QCI) for service differentiation of traffic classes (see Sec. IV-D).

(b) Public LTE network: We perform experiments over a public 20 MHz LTE network with 2×2 MIMO and Carrier Aggregation capability on a Tier-I US carrier. We use a pair of Google Pixel 2 phones as AR test devices. (Different UE device models do not make a significant difference in the experiments because the AR cloud servers perform the heavy computations, and the AR UEs only send data to the cloud and render virtual objects.) We perform measurements on public LTE with RSRP and SINR values of the AR UEs ranging from -85 to -110 dBm and 5-17 dB, respectively. We perform measurements at different locations: a campus cafeteria, a public mall, downtown and residential areas, and a commercial business area, with 5-12 trials per location, during daytime or nighttime.

Measurement Tools: 1) Application layer: We instrument the AR apps to synchronize and log Unix timestamps of application events. For web and video applications, we profile the latencies on Android devices using Chrome's remote debugging developer tools [30]. 2) TCP/IP layer: We use tcpdump to capture IP packets with timestamps. 3) RAN layer: We measure the RAN latency by running MobileInsight [31] (MI) to capture LTE PDUs on the test UEs. We develop a custom analyzer [17] to parse the MI logs, extract PDU-level information, and compute the RAN latency.
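As a rough illustration of what such an analyzer computes, the aggregate RAN latency of an uplink burst can be derived from per-PDU timestamps as sketched below. The record format and field names are hypothetical stand-ins for the parsed MobileInsight PDU records, not the actual schema of the released tool [17].

```python
from dataclasses import dataclass

@dataclass
class PduRecord:
    layer: str           # "PDCP", "RLC", or "MAC"
    timestamp_ms: float  # UE-side timestamp of the PDU event
    direction: str       # "UL" or "DL"

def aggregate_ran_latency(records):
    """Aggregate RAN latency of one uplink burst: time from the first visual-data
    packet arriving at PDCP until the last PDU is handed to MAC for transmission."""
    ul = [r for r in records if r.direction == "UL"]
    first_pdcp = min(r.timestamp_ms for r in ul if r.layer == "PDCP")
    last_mac = max(r.timestamp_ms for r in ul if r.layer == "MAC")
    return (last_mac - first_pdcp) / 1000.0   # seconds

# Toy example with synthetic records for one small burst:
burst = [PduRecord("PDCP", 0.0, "UL"), PduRecord("RLC", 3.2, "UL"),
         PduRecord("MAC", 9.5, "UL"), PduRecord("MAC", 4210.0, "UL")]
print(aggregate_ran_latency(burst))   # 4.21 s for this toy burst
```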

IV. MEASUREMENTS OF AR OVER CELLULAR NETWORKS

A. Application-layer performance

Fig. 3. Application screenshots for web browsing, video streaming, Instagram Live, and multi-user AR (host, resolve); red-boxed screens are user-perceived events (FCP: First Contentful Paint, FMP: First Meaningful Paint, OLE: onLoad Event, FFR: First Frame Rendered, VBL: Video Buffer Length). Timelines: web 0.0/2.8/3.8/8.6 s; video 0.0/1.6/5.4 s; Instagram Live 0.0/3.1/6.5 s; multi-user AR 0.0/13.6/15.1 s.

AR streaming vs other applications: Latency is a key metric for multi-user AR. If the users experience disparate latencies, consistency issues may result, such as one user placing a virtual object and another user not being able to view it quickly, or one user attempting to manipulate a virtual object that has already been moved by another user.

From Fig. 1, we have shown that the end-to-end latency has a very high median value of 12.5 s, resulting in poor QoE for the AR user [4]. Here, we discuss why other applications, such as web, on-demand video, and live streaming uploads over cellular networks, do not suffer from similar QoE problems. For this, we conduct a set of experiments where a user surfs www.cnn.com, streams an MPEG-DASH video, hosts a live video stream on Instagram, and plays multi-user AR on public LTE (all experiments were conducted in sequence within a one-hour duration). Fig. 3 shows screenshots from the perspective of the user, and Fig. 4 shows the latency of each application-level event. For web browsing, although the complete content is only loaded (onLoad event) 8.6 s after the user starts browsing, the first contentful paint and first meaningful paint happen at 2.8 s and 3.8 s, respectively. Similarly, for on-demand video, the user sees the first frame rendered on the screen after 1.6 s, due to video rate adaptation by MPEG-DASH. On-demand video also has playback buffers for pre-fetching video chunks and so is not latency-critical, as AR is. Similar video rate adaptation mechanisms apply to Instagram Live: the stream goes live within 3.1 s (as notified by the server), and a viewer can view the first frame 3.4 s later (6.5 s after the host started the stream).

Fig. 4. User-perceived latency of application-level events.

In summary, even though it takes 5-8 s to download the full web or video content, user experience is not impacted because web and video have application-layer adaptation mechanisms (e.g., paint the visible part of the webpage as soon as possible, adapt the video quality to achieve a low time-to-first-render, or cache frames in the video buffer). In contrast, in multi-user AR, which lacks application-layer adaptation mechanisms, the resolving user can only see the content after the hosting user finishes its entire data transmission, which takes ∼15 s.
B. Breakdown of end-to-end latency

Latency breakdown: Having shown the detrimental impact of latency on AR QoE, we seek to understand the key contributors to the high end-to-end latencies observed in Sec. IV-A. We plot the constituent components of the latency in Fig. 5a, descriptions of which are provided in Sec. III. On our industry LTE testbed, we see that the key latency contributors are on the hosting side (device A): the handshakes, the visual data transmission, and the cloud processing together consume 10 s on average (86.6% of the total end-to-end latency), while the resolving-side steps (device B), which include data preprocessing, visual data transmission, cloud processing, and local rendering, are relatively quick (12.9% of end-to-end latency). On public LTE, the hosting steps take 93.3% of the end-to-end latency because of the long visual data transmission latency. The visual data transmission is the largest contributor to latency on public LTE (10.1 s on average, or 48.7% of the end-to-end latency), but consumes less time on the LTE testbed (17.2% of the end-to-end latency) due to the lack of contention with other users (we observe similar latency on low-congestion public LTE at a mall on a weekday during the daytime, shown in Fig. 6a). The high transmission latencies exceed previously observed communication delays on AR research prototypes [7]; we hypothesize that this is due to information from multiple frames being uploaded, as well as additional image features such as point cloud data (full details are unknown because the off-the-shelf AR systems that we test on are closed source).

Fig. 5. Breakdown of end-to-end latency. Visual data transmission and RAN latencies are significant: (a) latency breakdown; (b) RAN latency; (c) impact of other users.

Wireless latency matters: Since we observe above that the uplink visual data transmission latency of the hosting device is significant, we further decompose the visual data transmission time into TCP/IP and radio access network (RAN) components. This provides an understanding of how much time is spent on the wireless link and how much is spent in the wired backbone. The aggregate TCP/IP latency across IP packets is measured as the elapsed time from the first visual data packet transmission at the TCP layer to the reception of the last packet's ACK (step 1b, §II). The aggregate RAN latency across IP packets is the total time from when the first visual data packet is received by the RAN layer (i.e., LTE's PDCP) until when the last packet is sent to the MAC layer for transmission. It is subsumed within the aggregate TCP/IP latency. The results are shown in Fig. 5b. We observe that the RAN contributes the majority of the visual data transmission time (71.7% on the LTE testbed and 98.5% on public LTE), suggesting that the wireless link is a key contributor to end-to-end latency, especially on public LTE, where the data transmission on the RAN takes an average of 10.1 s. Hence, further optimizations of the wireless link are needed, as discussed in §IV-D and §V.

In addition to the above results on campus using public LTE, we repeat the measurements at four other locations with different wireless signal strengths (RSRP), as shown in Fig. 6a. RAN latencies at these locations range from 1-10 s and seem to be correlated with the RSRP. The devices with poor signal strength tend to have fewer uplink resources allocated, as shown in Fig. 6b. Hence, our wireless link optimizations of Sec. IV-D, particularly the QCI-based adaptations that impact uplink resources, can potentially provide the most gains for devices with poor signal quality.

Fig. 6. CloudAnchor measurements at five different locations: (a) aggregate RAN latencies (average RSRP: mall -82.57, downtown -92.90, residential -105.03, commercial -105.73, campus -107.17 dBm); (b) uplink physical resources (number of PRBs per PDU).

Finally, we examine the impact of background users on AR performance, and the impact of AR applications on the background users. In our industry LTE testbed, we set up background devices uploading iPerf3 UDP traffic with a finite send buffer (12 or 25 Mbps, representing 50% or 100% of the maximum uplink RAN capacity, respectively), and plot the results in Fig. 5c. We see that one background user causes a 65.5% increase in RAN latency for the AR user, and two background users cause a 111.3% increase. On public LTE, where the number of background users and their traffic are uncontrolled and unknown, the RAN latency for the AR user increases ∼10×, possibly due to high cellular network congestion.
Resolve latency: While in the majority of cases the visual data transmission by the hosting device (step 1b, §II) contributes greatly to the end-to-end latency, in a few cases we actually observed that the virtual object resolving process can cause high delay. This is despite the small amounts of data being uploaded by the resolving device (step 2b, §II). We observe that this happens when the user tries to place virtual objects in real-world environments lacking visual features (e.g., high-contrast edges, colors, etc.). We experimented with several real-world environments ranging from simple to complex, as shown in Fig. 7b. The RSSI remains relatively constant at -66 dBm. We measured the data size, uplink RAN latency, and resolve latency, and plot the results in Fig. 7a. In the simple grid and floor environments, which lack visual features, we observed relatively less data uploaded by host device A (2.2-2.34 MB on average) and thus lower uplink RAN latency (0.88-0.89 s on average). However, these simpler environments also caused high resolve latency, as shown in Fig. 7a. This is due to multiple rounds of communication between device B and the cloud, unlike the typical scenario of 1-2 rounds of communication we had observed in the non-grid environments. We hypothesize that the lack of visual features in the grid environment causes difficulties in the world frame construction (step 1c, §II), resulting in these multiple rounds of communication, in which device B uploads additional visual data (step 2b) to aid the cloud processing (step 2c).

Fig. 7. Hosting virtual objects in a grid environment results in the smallest data sizes and RAN latency, but requires multiple attempts to resolve the object: (a) data size, host RAN latency, and resolve latency; (b) real-world environments: grid, floor, chessboard, desk.

C. AR traffic characteristics

Bursty uplink AR traffic: To understand why and how often visual data transmissions occur, and their relationship with user interactions with AR virtual objects (e.g., placing virtual objects, drawing virtual graffiti), we examine the throughput trace of device A running the CloudAnchor app in Fig. 8a (other traces are similar; we show one sample trace for brevity). The large spikes correspond to the large data transmission bursts that happen when the user touches device A's screen to place a virtual object. We observe that most of the data transmissions are uplink from the host device A (2.5 MB on average), while the amount of data generated on the downlink or by device B is negligible (<340 KB). These larger data sizes contribute to the high end-to-end latencies discussed above.

Fig. 8. AR apps exhibit large, unpredictable data spikes on the same TCP connection, which results in TCP slow start re-triggering at each spike: (a) throughput over time; (b) bytes in flight of the first three bursts in 8a; (c) zoom on the second burst in 8b.

We also observe a second, smaller type of AR user interaction data, as exemplified by the throughput trace from the Just a Line app in Fig. 9. Fig. 9 shows both the large visual data spikes near t = 170 s, similar to those of the CloudAnchor app in Fig. 8a, and smaller bursts near t = 230 s, t = 295 s, etc. These smaller bursts correspond to the user touching the screen to draw virtual graffiti, and their IP packet lengths are smaller (447 bytes on average) than those of the initial visual data spikes (1430 bytes on average).

Fig. 9. Another AR app, Just a Line, also exhibits smaller data spikes after the initial large ones, corresponding to users drawing virtual graffiti.

In summary, AR traffic has bursts of both large and small data, corresponding to different types of user interactions/scenarios. Based on our understanding of ARCore [3], we posit that the larger spikes correspond to visual data about the scene, which is necessary whenever the app is initialized or the user moves to a new location to place virtual objects, while the smaller spikes correspond to user interactions with the virtual objects. The smaller spikes happen after the user wishes to place the virtual objects and the initial visual data has been uploaded. The frequency of the data spikes can be unpredictable, depending on the user interactions with the application content.

Interaction with TCP: One implication we observe from the bursty nature of AR traffic is its interaction with TCP congestion control. All AR data spikes happen in the same TCP stream, and so are affected by the same TCP congestion window. In Fig. 8b, we plot the number of TCP bytes in flight corresponding to the first three data spikes in Fig. 8a, and in Fig. 8c we zoom in on the second spike together with the corresponding TCP receive window. We observe that the number of bytes in flight (when a data spike happens, which corresponds to the time when the AR application has visual data to send) decreases each time the connection is idle, so the TCP congestion window has to grow again in a slow-start phase. This is because the AR application has no data to send in between the user's interactions. On the other hand, live streaming applications such as Instagram Live continuously have application-layer data ready to upload (we observed this in our experiments with Instagram Live video upload), and can continuously grow the congestion window without repeatedly dropping to slow start.

D. Can dedicated QoS classes help AR?

QoS Class Identifiers (QCI) are widely used by network providers [25] to offer differentiated QoS for services, where a service is assigned to a bearer with a specific QCI value for its data transmission. In our industry LTE testbed, which allows us to control QCI classes, we set up one AR pair and a background user uploading 25 Mbps iperf3 traffic. The AR users are configured to use QCI-4 with a guaranteed bit rate (GBR) of 25 Mbps (the total available bandwidth), in order to ensure that the AR users receive very high prioritization without any limitations, while the other user remains on the default best-effort QCI-9.

Fig. 10a shows the latency of the AR user with and without QCI-4. QCI-4 helps reduce the TCP/IP latency by 33%, which represents an upper bound on the improvement from prioritization, even if more users were present. The improvement is due to QCI-4 allowing the AR flow to be scheduled with higher priority and to obtain the majority of the available PRBs in the cell, on consecutive TTIs contiguously, as shown in Fig. 10b. Fig. 10c shows the performance achieved by the non-AR iPerf user. When the AR flow is assigned QCI-9, the iPerf user can obtain most of the available bandwidth (20 Mbps), with occasional dips due to the AR bursts. In contrast, when the AR flow is assigned QCI-4, the iPerf user's throughput is reduced to an average of 13 Mbps, even when the AR user has no data to transmit, because the eNB permanently reserves wireless resources for the GBR user [25]. While a GBR bearer would not waste resources for a live-streaming video service, because the assigned resources would be continuously and predictably utilized, assigning a GBR bearer to the AR flow wastes network resources due to its bursty and unpredictable nature.

Fig. 10. Impact of a QoS dedicated bearer: assigning QCI-4 with a guaranteed bit rate to the AR user reduces its latency, but decreases the throughput of the non-AR user even when the AR user is not transmitting in between interactions: (a) IP latency; (b) number of PRBs; (c) impact on non-AR users.

These results suggest several challenges in designing an "AR-specific" QCI class. The RAN should be able to predict when an AR data spike is about to begin, quickly assign a dedicated bearer, estimate when the spike is about to end, and finally remove the dedicated bearer. This can prevent negative impacts to other users in the network. For example, in light of the large and small data spikes we observed in Sec. IV-C, we may not need to keep a dedicated bearer after the large data spikes have occurred.

E. Below the IP Layer: RAN Analysis

In this section, we take a detailed look at AR's behavior below the IP layer, in order to understand its impact on AR performance. The IP layer passes its packets as PDUs to LTE's PDCP layer, and from there to LTE's RLC, MAC, and finally the PHY layer for transmission. The RLC performs an important operation: it concatenates or segments the IP packets to fit into the RLC PDU(s), whose size is determined based on the traffic load generated by all the users in the current scheduling period and the channel conditions of the given device. The RLC latency is thus a key contributor to the RAN latency.

Fig. 11a illustrates the relationship between per-packet RLC latency and IP throughput for the AR pair under similar test cases (AR pair + #load phones), where the load phones generate 12 Mbps finite-buffer traffic on our private LTE testbed, while the "AR pair + N" trials are done on public LTE with an unknown number of load phones and unknown traffic. Across the test cases, we observe that the TCP RTT (bottom row) increases with the per-packet RLC latency (first row), especially in the public LTE test case. The longer RLC latency can cause the TCP congestion window to ramp up slowly. This is shown by the relatively smaller number of TCP bytes in flight (second row), especially during the slow-start phase when the RLC latency is higher, subsequently resulting in reduced IP throughput over time (sparser and shorter lines in the third row). The slow start is crucial because AR can be prone to frequent TCP slow starts due to the time gaps between user interactions, as discussed in Sec. IV-C. The relationship between RLC latency, TCP RTT, and slow-start phases suggests that RLC latency is an important factor in improving the throughput and latency of AR applications. We characterize it and discuss potential solutions below.

V. AR DESIGN OPTIMIZATIONS

In this section, based on the insights gleaned from the traffic characterization in §IV, we provide AR design optimizations.

A. Network-Aware Optimization: Packet Size Adaptation

When the AR app uses larger IP packet sizes for transmission, it could experience heavier segmentation at the RLC layer, especially when the RLC PDU sizes are significantly smaller than the IP packet sizes. This happens in scenarios where the RAN is congested and/or the UE's RF conditions are poor, as shown in Fig. 11. As a result, the growth of the TCP cwnd during an AR burst slows, while the per-packet RLC latency and, subsequently, the TCP RTT increase, deteriorating the end-to-end performance of the AR session. While using smaller IP packets can help address this issue, they increase network overhead due to the generation of a higher number of packets for the same burst and under-utilization of the available RAN capacity. With sub-optimal, smaller IP packets, this overhead becomes significantly high, affecting application goodput (see Fig. 11e). We propose a technique to optimize the packet sizes of the AR app by heuristically addressing this non-linear trade-off, based on the underlying RAN conditions. In particular, when there is significant RLC segmentation of the packets, we reduce the IP packet size closer to a moving average of the instantaneous RLC PDU sizes for the UE. We carefully adapt IP packet sizes so that the gain from a quicker increase in the TCP congestion window for an AR burst offsets the loss from the additional overhead of using more IP packets for the same burst.

Fig. 11. Per-packet RLC latency is high at the beginning of a data spike, increasing the RTT especially during TCP slow start. A smaller IP MTU that reduces RLC segmentation, and a small amount of background traffic, are possible solutions to help reduce AR latency: (a) per-packet RLC latency; (b) RAN latency for different IP MTU sizes; (c) high congestion; (d) small background traffic; (e) network overhead.

Packet sizes can be varied by configuring the Maximum Segment Size (MSS) of the AR flow or the Maximum Transmission Unit (MTU) of the AR UE. We conduct CloudAnchor experiments by adapting the IP packet sizes of the AR streaming session over public LTE networks under different network conditions (both congested and less-congested scenarios) using our technique, and present the results in Fig. 11b. We evaluate the packet size selected by our technique (650 bytes) against the default large packet size of 1430 bytes and a smaller sub-optimal packet size of 400 bytes.

In Fig. 11b, for a more congested public LTE network (i.e., campus) scenario, the default large packet size of 1430 bytes undergoes significant segmentation (around 7 RLC PDUs per packet), and hence the aggregate RAN latency is high (∼11 s). The optimal packet size for this scenario, yielded by our technique, is a smaller value, around 650 bytes. Upon setting this, the aggregate RAN latency is reduced by 37% and 58% when compared to 1430 bytes and 400 bytes, respectively. At the same time, the network throughput and the AR application goodput of the 650-byte packet size increase by over 62% and 150% over the 1430-byte and 400-byte packet sizes, respectively (Fig. 11b, 11c). The 400-byte packet size achieves lower throughput, despite RLC segmentation similar to 650 bytes, because it under-utilizes the network capacity: the maximum TCP cwnd for 400 bytes reaches only 493 KB, while 650 bytes and 1430 bytes can reach 657 KB and 690 KB, respectively. However, in low-congestion environments (i.e., the mall), reducing the packet size has little impact on aggregate RAN latency because the eNB already allocates a larger RLC PDU to the device, resulting in little RLC segmentation. Hence, our technique selects the default large packet size close to 1430 bytes. In conclusion, network-aware AR app design that chooses smart packet sizes for the app can improve aggregate RAN latency, end-to-end AR latency, network throughput, and application goodput.

B. Network-Agnostic Optimization: Small Background Traffic

Another AR design optimization we propose is "priming" the eNB with information about the amount of data that the AR application will transfer in its next burst. This technique is network-agnostic, without the need to adapt to variations in the RAN. Typically, when the hosting device starts sending either a new uplink AR burst or new data in the middle of an AR burst after a longer idle period (lasting tens of milliseconds), the UE has to request resources from the eNB, incurring protocol signaling latency.
The eNB is initially unaware of the uplink sending buffer, and may only allocate a small uplink grant (max. 125 bytes) for the UE. Then, upon data PDU transmission, the UE also piggybacks the uplink buffer size, which the eNB subsequently uses to allocate larger resource grants. This causes RLC segmentation, increasing the per-packet RAN latency, especially in congested scenarios.

In order to make the eNB aware of the device's uplink buffer during an AR session, we generate small amounts of background uplink traffic, using an ICMP packet of 100 bytes every 2-5 ms. Since there is an active small data transfer even during inter- or intra-AR-burst idle periods, the UE is always scheduled minimal resources and constantly piggybacks information about its uplink sending buffer to the eNB. This maximizes buffer-aware scheduling for the UE, which minimizes protocol signaling latency and RLC segmentation.

In Fig. 11d, we plot the aggregate RAN latency and the amount of outgoing data, with and without the additional background traffic, for an AR session over a public LTE network. The results show that this small background traffic helps reduce aggregate RAN latency by ∼50% on average, at the cost of a negligible increase in outgoing data size (including the extra background traffic). Our UE logs show that the average uplink resource grant for each MAC PDU during the AR session increases from 593 bytes to 1191 bytes with small background traffic.
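The "priming" mechanism just described can be sketched in a few lines. This is an illustrative sketch only, not our implementation: the paper's probes are 100-byte ICMP packets, while plain UDP datagrams are used below as a stand-in so the sketch runs without raw-socket privileges; the server address is a placeholder.

```python
import socket
import time

def keep_uplink_buffer_reported(server_addr, period_s=0.003, payload=b"x" * 100,
                                stop=lambda: False):
    """Send a tiny uplink datagram every few milliseconds during the AR session, so the
    UE keeps piggybacking a fresh buffer status report to the eNB between AR bursts."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        while not stop():
            s.sendto(payload, server_addr)
            time.sleep(period_s)          # 2-5 ms cadence, per Sec. V-B
    finally:
        s.close()

# e.g., run in a background thread for the lifetime of the AR session:
# threading.Thread(target=keep_uplink_buffer_reported,
#                  args=(("203.0.113.1", 9),), daemon=True).start()
```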
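For completeness, the packet-size adaptation heuristic of §V-A can be sketched similarly. The EWMA weight, the segmentation threshold, and the use of TCP_MAXSEG on the AR flow's socket are our assumptions for illustration; the paper only specifies shrinking the packet size toward a moving average of the instantaneous RLC PDU sizes when segmentation is significant.

```python
import socket

DEFAULT_MSS = 1430   # default large packet size evaluated in Fig. 11
ALPHA = 0.2          # EWMA weight for instantaneous RLC PDU sizes (assumed)

def update_mss(sock, rlc_pdu_size, state):
    """state = {'avg_pdu': float, 'mss': int}; rlc_pdu_size comes from UE-side RAN logs.
    Shrinks the segment size toward the recent RLC PDU size when segmentation is heavy,
    and restores the default large size otherwise to preserve goodput."""
    state["avg_pdu"] = (1 - ALPHA) * state["avg_pdu"] + ALPHA * rlc_pdu_size
    segments_per_packet = state["mss"] / max(state["avg_pdu"], 1.0)
    if segments_per_packet > 2:            # significant RLC segmentation (assumed threshold)
        state["mss"] = max(int(state["avg_pdu"]), 400)
    else:                                  # little segmentation: favor larger packets
        state["mss"] = DEFAULT_MSS
    # TCP_MAXSEG caps the MSS of this connection on Linux (best set before connect());
    # adapting the UE's MTU is the alternative mentioned in Sec. V-A.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, state["mss"])
    return state["mss"]
```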
C. Discussion: Application-layer Optimizations

Finally, we briefly discuss other potential application-layer solutions to reduce AR latency. In our existing experimental setup, the network data transmissions were opaque because the internals of the Google ARCore platform are closed source. However, we hypothesize that the data transmissions consist of device data that is used for localization, as localization is known to be an integral part of AR [18]. Reducing the fidelity of the device localization data, for example by quantizing the data or sub-sampling it in time, could reduce the amount of data requiring transmission and thus the network latency. On the other hand, this may reduce device localization accuracy and impact the placement of virtual objects on the user's display; thus, we intend to explore such effects in future work, using open-source AR systems [9] that allow modification of the application layer.

VI. CONCLUSIONS

The goal of high-quality AR has engendered a tremendous amount of research, but there has thus far been little focus on the impact of the cellular network. In this paper, we show through extensive measurements on both an industry LTE testbed and public LTE that RAN latency is a significant part of the end-to-end AR experience, accounting for nearly 31.2% of the total latency. Unless this is reduced significantly, there is little hope for achieving AR with high QoE. However, our results also provide hope: AR traffic is very bursty in nature, making it a suitable candidate for practical traffic management schemes like QCI (which improves latency by up to 33%). Further, we also design network-aware and network-agnostic optimizations that improve latency by ∼40-70%. Future work includes a longitudinal study of AR users to learn specific AR app behaviors, which can then drive the development of a smart QCI-based scheduler specifically tailored to AR traffic characteristics. We will also quantify how 5G technologies can help close the gap to achieving seamless multi-user AR QoE by reducing the overall RAN latency.

ACKNOWLEDGMENT

We thank AT&T Labs Research for its internship program, where the research was conducted. We also thank the anonymous reviewers for their valuable comments. This work has been supported in part by NSF grant CNS-1942700.

REFERENCES

[1] L. Liu et al., "Edge assisted real-time object detection for mobile augmented reality," ACM MobiCom, 2019.
[2] T. Merel, "The reality of VR/AR growth," https://techcrunch.com/2017/01/11/the-reality-of-vrar-growth/.
[3] Google, "Share AR experiences with Cloud Anchors," https://developers.google.com/ar/develop/java/cloud-anchors/cloud-anchors-overview-android.
[4] Z. Chen et al., "An empirical study of latency in an emerging class of edge computing applications for wearable cognitive assistance," in ACM/IEEE Symposium on Edge Computing, 2017.
[5] M. Suznjevic et al., "Analyzing the effect of TCP and server population on massively multiplayer games," International Journal of Computer Games Technology, vol. 2014, no. 602403, pp. 1-17, 2014.
[6] A. Langley et al., "The QUIC transport protocol: Design and internet-scale deployment," in ACM SIGCOMM, 2017.
[7] P. Jain et al., "OverLay: Practical mobile augmented reality," ACM MobiSys, 2015.
[8] X. Ran et al., "DeepDecision: A mobile deep learning framework for edge video analytics," IEEE INFOCOM, 2018.
[9] P. Li et al., "Monocular visual-inertial state estimation for mobile augmented reality," in IEEE ISMAR, Oct 2017, pp. 11-21.
[10] X. Ran et al., "ShareAR: Communication-efficient mobile augmented reality," ACM HotNets, 2019.
[11] W. Zhang et al., "CARS: Collaborative augmented reality for socialization," ACM HotMobile, 2018.
[12] H. Qiu et al., "AVR: Augmented vehicular reality," ACM MobiSys, 2018.
[13] Google, "ARCore overview," https://developers.google.com/ar/discover/.
[14] Apple, "ARKit - Apple Developer," https://developer.apple.com/arkit/.
[15] Microsoft, "Shared experiences in Unity," https://docs.microsoft.com/en-us/windows/mixed-reality/shared-experiences-in-unity, March 2018.
[16] "Magic Leap," https://www.magicleap.com/.
[17] K. Apicharttrisorn et al., "Custom MobileInsight analyzer and Python scripts for RAN-level deep analysis," https://github.com/patrick-ucr/ran_latency_analyer_mi.
[18] D. Schmalstieg and T. Hollerer, Augmented Reality: Principles and Practice. Addison-Wesley Professional, 2016.
[19] Q. Liu and T. Han, "DARE: Dynamic adaptive mobile augmented reality with edge computing," IEEE ICNP, 2018.
[20] W. Zhang et al., "Jaguar: Low latency mobile augmented reality with flexible tracking," ACM Multimedia, 2018.
[21] K. Chen et al., "MARVEL: Enabling mobile augmented reality with low energy and low latency," ACM SenSys, 2018.
[22] K. Apicharttrisorn et al., "Frugal following: Power thrifty object detection and tracking for mobile augmented reality," ACM SenSys, 2019.
[23] D. Zou and P. Tan, "CoSLAM: Collaborative visual SLAM in dynamic environments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 354-366, 2012.
[24] P. Schmuck and M. Chli, "Multi-UAV collaborative monocular SLAM," IEEE International Conference on Robotics and Automation, 2017.
[25] M. Alasti et al., "Quality of service in WiMAX and LTE networks," IEEE Communications Magazine, vol. 48, no. 5, pp. 104-111, 2010.
[26] H.-C. Jang and C.-P. Hu, "Fairness-based adaptive QoS scheduling for LTE," in IEEE International Conference on ICT Convergence, 2013.
[27] M. Anas et al., "Combined admission control and scheduling for QoS differentiation in LTE uplink," in IEEE Vehicular Technology Conference, 2008.
[28] Google, "Just a Line - Draw Anywhere, with AR," https://justaline.withgoogle.com/.
[29] Google, "ARCore Cloud Anchor API," https://console.cloud.google.com/marketplace/details/google/arcorecloudanchor.googleapis.com.
[30] K. Basques, "Get started with remote debugging Android devices," https://developers.google.com/web/tools/chrome-devtools/remote-debugging.
[31] Y. Li et al., "MobileInsight: Extracting and analyzing cellular network information on smartphones," in ACM MobiCom, 2016.