<<

ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns via a Generative Adversarial Neural Network Chaoyun Xi Paul Patras School of Informatics School of Electronic Information and School of Informatics University of Edinburgh, UK Communications, University of Edinburgh, UK [email protected] Huazhong University of Science and [email protected] Technology, China [email protected] ABSTRACT 1 INTRODUCTION Large-scale mobile trac analytics is becoming essential to digital Œe dramatic uptake of mobile devices and online services over infrastructure provisioning, public transportation, events planning, recent years has triggered a surge in mobile data trac. Industry and other domains. Monitoring city-wide mobile trac is however forecasts monthly global trac consumption will surpass 49 ex- a complex and costly process that relies on dedicated probes. Some abytes (1018) by 2021, which is a seven-fold increase of the current of these probes have limited precision or coverage, others gather utilisation [1]. In this context, in-depth understanding of mobile tens of gigabytes of logs daily, which independently o‚er limited in- trac paŠerns, especially in large-scale urban cellular deployments, sights. Extracting €ne-grained paŠerns involves expensive spatial becomes critical. In particular, such knowledge is essential for dy- aggregation of measurements, storage, and post-processing. In this namic resource provisioning to meet end-user application require- paper, we propose a mobile trac super-resolution technique that ments, Internet services that rely on user location information, and overcomes these problems by inferring narrowly localised trac the optimisation of transportation systems [2–4]. consumption from coarse measurements. We draw inspiration from Mobile trac measurement collection and monitoring currently image processing and design a deep-learning architecture tailored rely on dedicated probes deployed at di‚erent locations within the to mobile networking, which combines Zipper Network (ZipNet) network infrastructure [5]. Unfortunately, this equipment (i) either and Generative Adversarial neural Network (GAN) models. Œis en- acquires only coarse information about users’ position, the start ables to uniquely capture spatio-temporal relations between trac and end of data sessions, and volume consumed (which is required volume snapshots routinely monitored over broad coverage areas for billing), while roughly approximating location throughout ses- (‘low-resolution’) and the corresponding consumption at 0.05 km2 sions – Packet Data Network Gateway (PGW) probes; (ii) or has level (‘high-resolution’) usually obtained a‰er intensive computa- small geographical coverage, e.g. Radio Network Controller (RNC) tion. Experiments we conduct with a real-world data set demon- or eNodeB probes that need to be densely deployed, require to strate that the proposed ZipNet(-GAN) infers trac consumption store tens of gigabytes of data per day at each RNC [6], but cannot with remarkable accuracy and up to 100× higher granularity as independently quantify data consumption. In addition, context compared to standard probing, while outperforming existing data information recorded is o‰en stale, which renders timely infer- interpolation techniques. To our knowledge, this is the €rst time ences dicult. Mobile trac prediction within narrowly localised super-resolution concepts are applied to large-scale mobile trac regions is vital for precision trac-engineering and can further analysis and our solution is the €rst to infer €ne-grained urban bene€t the provisioning of emergency services. Yet this remains trac paŠerns from coarse aggregates. costly, as it involves non-negligible overheads associated with the transfer of reports [5], substantial storage capabilities, and inten- CCS CONCEPTS sive o‚-line post-processing. In particular, combining di‚erent •Networks →Network measurement; information from a large number of such specialised equipment is required, e.g. user position obtained via triangulation [7], data KEYWORDS session timings, trac consumed per location, etc. Overall this is a challenging endeavour that cannot be surmounted in real-time by mobile trac; super-resolution; deep learning; directly employing measurements from independent RNC probes. generative adversarial networks In order to simplify the analysis process, mobile operators make simple assumptions about the distribution of data trac consump- tion across cells. For instance, it is frequently assumed users and Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed trac are uniformly distributed, irrespective of the geographical for pro€t or commercial advantage and that copies bear this notice and the full citation layout of coverage areas [8]. Unfortunately, such approximations on the €rst page. Copyrights for components of this work owned by others than the are usually highly inaccurate, as trac volumes exhibit consid- author(s) must be honored. Abstracting with credit is permiŠed. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speci€c permission erable disparities between proximate locations [3]. Operating on and/or a fee. Request permissions from [email protected]. such simpli€ed premises can lead to de€cient network resources CoNEXT ’17, Incheon, Republic of © 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-5422-6/17/12...$15.00 DOI: 10.1145/3143361.3143393 CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea C. Zhang et al.

inspires us to employ image processing techniques to learn end- Low-resolution image High-resolution image to-end relations between low- and high-resolution mobile trac snapshots. We illustrate the similarity between the two SR problems in Fig. 1. Recent achievements in GPU based computing [12] have Image SR led to important results in image classi€cation [13], while di‚erent neural network architectures have been successfully employed to learn complex relationships between low- and high-resolution images [14, 15]. Œus we recognise the potential of exploiting deep learning models to achieve reliable mobile trac super-resolution Coarse-grained Measurements Fine-grained Measurements (MTSR) and make the following key contributions: (1) We propose a novel Generative Adversarial neural Network (GAN) architecture tailored to MTSR, to infer €ne-grained mo- bile trac paŠerns, from aggregate measurements collected MTSR by network probes. Speci€cally, in our design high-resolution trac maps are obtained through a generative model that out- puts approximations of the real trac distribution. Œis is trained with a discriminative model that estimates the prob- ability a sample snapshot comes from a €ne-grained ground truth measurements set, rather than being produced by the generator. (2) We construct the generator component of the GAN using Figure 1: Illustration of the image super-resolution (SR) original deep zipper network (ZipNet) architecture. Œis up- problem (above) and the underlying principle of the pro- grades a sophisticated ResNet model [16] with a set of addi- posed mobile trac super-resolution (MTSR) technique (be- tional ‘skip-connections’, without introducing extra parame- low). Figure best viewed in color. ters, while allowing gradients to backpropagate faster through the model in the training phrase. Œe ZipNet further introduces 3D upscaling blocks to jointly extract spatial and temporal fea- tures that exist in mobile trac paŠerns. Our design acceler- allocations and implicitly to modest end-user quality of experi- ates training convergence and we demonstrate it outperforms ence. Alternative coarse-grained measurement crowdsourcing [9] the baseline Super-Resolution Convolutional Neural Network remains impractical for trac inference purposes. (SRCNN) [14] in terms of prediction accuracy. In this paper we propose an original mobile trac ‘super-reso- (3) To stabilise the adversarial model training process, and prevent lution’ (MTSR) technique that drastically reduces the complexity model collapse or non-convergence problems (common in GAN of measurements collection and analysis, characteristic to purely design), we introduce an empirical loss function. Œe intuition based tools, while gaining accurate insights into €ne-grained behind our choice is that the generator can be optimised and mobile trac paˆerns at city scale. Our objective is to precisely infer complexity reduced, if using a loss function for model training narrowly localised trac consumption from aggregate coarse data that is insensitive to model structure and hyper-parameters recorded by a limited number of probes (thus reducing deployment con€guration. costs) that have arbitrary granularity. Achieving this goal is chal- (4) We propose a data processing and augmentation procedure to lenging, as small numbers of probes summarise trac consumption handle the insuciency of training data and ensure the neural over wide areas and thus only provide ‘low-resolution’ snapshots network model does not turn signi€cantly over-€Šed. Our that cannot capture distributions correlated with human mobility. approach crops the original city-wide mobile data ‘snapshots’ Further, measurement instrumentation is non-uniformly deployed, to smaller size windows and repeats this process with di‚erent as coverage area sizes depend on the population density [10]. Such o‚sets to generate extra data points from the original ones, diversity creates considerable ambiguity about the trac consump- thereby maximising the usage of data sets available for training. tion at sub-cell scale and multiple ‘high-resolution’ snapshots could (5) We conduct experiments with a publicly available real-world be matched to their aggregate counterparts. Inferring the precise mobile trac data set and demonstrate the proposed ZipNet trac distribution is therefore hard. (-GAN) precisely infer €ne-grained mobile trac distributions Drawing inspiration from image processing. Spatio-tempo- with up to 100× higher granularity as compared to standard ral correlations we observe between mobile trac paŠerns prompt probing, irrespective to the coverage and the position of the us to represent these as tensors that highly resemble images (cross- probes. Importantly, our solutions outperform existing tra- spatial relations) or videos (cross-temporal relations). It becomes ditional and deep-learning based interpolation methods, as apparent that a problem similar to the one we tackle exists in we achieve up to 78% lower reconstruction errors, 40% higher the image processing €eld, where images with small number of €delity of reconstructed trac paŠern, and improve the struc- pixels are enhanced to high-resolution. Œere, a super-resolution tural similarity by 36.4×. (SR) imaging approach mitigates the multiplicity of solutions by constraining the solution space through prior information [11]. Œis ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns … CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea

We believe the proposed ZipNet(-GAN) techniques can be de- True fine traffic ployed at a gateway level to e‚ectively reduce the complexity and measurement enhance the quality of mobile trac analysis, by simplifying the networking infrastructure, underpinning intelligent resource man- Coarse traffic agement, and overcoming service congestion in popular ‘hot spots’. measurement Real? 2 PROBLEM FORMULATION Artifact? Generator Discriminator Œe objective of MTSR is to infer city-wide €ne-grained mobile Inferred fine traffic measurement data trac distributions, using sets of coarse-grained measurements collected by probes deployed at di‚erent locations. We formally de€ne low-resolution trac measurements as a spatio-temporal Figure 2: GAN operating principle in MTSR problem. Only ML { L L L } L the generator is employed in prediction phase. sequence of data points = D1 , D2 ,..., DT , where Dt is a snapshot at time t of the mobile trac consumption summarised 3.1 Essential Background over the entire coverage area and in practice partitioned into V L 1 V v As with all neural networks, the key to their performance is the cells (possibly of di‚erent size), i.e. Dt = {lt ,...,lt }. Here lt represents the data trac consumption in cell v at time t. training process that serves to con€gure their parameters. When MH { H H H } training GANs, the generator G takes as input a noisy source (e.g. We denote = D1 , D2 ,..., DT the high-resolution mo- bile trac measurement counterparts (which are traditionally ob- Gaussian or uniformly distributed) z ∼ Pn(z) and produces an H output xˆ that aims to follow a target unknown data distribution tained via aggregation and post-processing), where Dt is a mo- bile trac consumption snapshot at time t over I sub-cells, i.e. (e.g. pixels in images, voice samples, etc.). On the other hand, D G DH = {h1,...,hI }. Here hi denotes the data trac volume in the discriminator randomly picks data points generated by , t t t t i.e. xˆ ∼ G(z), and others sampled from the target distribution sub-cell i at time t. Dissimilar to ML, we work with sub-cells of the x ∼ P (x), and is trained to maximise the probabilities that xˆ is same size and shape, therefore DH points have the same measur- r t fake and x is real. In contrast, G is trained to produce data whose ing granularity. Note that both ML and MH measure the trac distribution is as close as possible to P (x), while maximising the consumption in the same area and for the same duration. r probability that D makes mistakes. Œis is e‚ectively a two-player From a machine learning prospective, the MTSR problem is to in- game where each model is trained iteratively while € the other fer the most likely current €ne-grained mobile trac consumption, one. Œe joint objective is therefore to solve the following minimax given previous S observations of coarse-grained measurements. problem [17]: Denoting this sequence F S = {DL , t t−S+1 E [ D(x)] + E [ ( − D(G(z)))]. L min max x∼Pr (x) log z∼Pn (z) log 1 (2) ..., Dt }, MTSR solves the following optimisation problem: G D Once trained, the generator G is able to produce arti€cial data H  H S  Det := arg maxp Dt |Ft , samples from the target distribution given noisy input z. H (1) Dt In the case of our MTSR problem, the input of G is sampled from the distribution of coarse-grained measurements p(F S ), instead H H t where Det denotes the solution of the prediction. Both Dt and of traditional Gaussian or uniform distributions. Our objective is S H S Ft are high-dimensional, since they represent di‚erent trac mea- to understand the relations between Dt and Ft , i.e. modelling   surements across a city. To precisely learn the complex correlation p DH |F S . D is trained to discriminate whether the data is a real between DH and F S , in this work we propose to use a Generative t t t t €ne-grained trac measurement snapshot, or merely an artefact Adversarial Network (GAN), which will model the conditional dis- G  H S  produced by . We summarise this principle in Fig. 2. tribution p Dt |Ft . As we will demonstrate, the key advantage of employing a GAN structure is that it will not only minimise the 3.2 ‡e ZipNet-GAN Architecture mean square error between predictions and ground truth, but also Recall that our GAN is composed of a generator G that takes low- yield remarkable €delity of the high-resolution inferences made. S resolution measurements Ft as input and reconstructs their high- H resolution counterparts Dt , and a discriminator D whose task is 3 PERFORMING MTSR VIA ZIPNET-GAN relatively light (learning to discriminate samples and minimising In what follows we propose a deep-learning approach to tackle the the probability of making mistakes). MTSR problem using GANs. Œis is a novel unsupervised learning In general the generator has a complicated structure, which is framework for generation of arti€cial data from real distributions required for the reconstruction of data points with unknown distri- through an adversarial training process [17]. In general, a GAN is butions. In our MTSR problem, we want to capture the complex cor- S H composed of two neural network models, a generator G that learns relations between Ft and Dt . To this end, we propose Zipper Net- the data distribution, and a discriminative model D that estimates work (ZipNet), an original deep neural network architecture. Œis the probability that a data sample came from the real training data upgrades a ResNet model [16] with additional ‘skip-connections’, rather than from the output of G. We €rst give a brief overview of without introducing extra parameters, while allowing gradients to general GAN operation, then explain how we adapt this structure backpropagate faster through the model, and accelerate the training to the MTSR problem we aim to solve. process. Œe overall ZipNet architecture is illustrated in Fig. 3. To CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea C. Zhang et al.

Stacks of Cov.+BN+LReLU

Outputs Stacks of (high-resolution Inputs Cov.+BN+LReLU (low-resolution measurements) measurements) (3D-DeCov.+BN+LReLU+3D- Staggered skip- Cov.)*3 connections Global Skip-connection 3D upscaling blocks Zipper convolutional blocks Convolutional blocks

Figure 3: ‡e deep Zipper Network (ZipNet) generator architecture, consisting of 3D upscaling, zipper convolutional, and standard convolutional blocks. Conv. and deconv. denote convolutional and deconvolutional layers. exploit important historical trac information, we introduce 3D block, in order to provide sucient features for the mobile trac upscaling blocks that extract the spatio-temporal features speci€c €nal prediction. to mobile trac paŠerns. Œe proposed ZipNet comprises three Œe proposed ZipNet is a complex architecture that has over main components, namely: 50 layers. In general, accuracy grows with the depth (which is • 3D upscaling blocks consisting of a 3D deconvolutional layer precisely what we want to achieve), but may also degrade rapidly, [18], three 3D convolutional layers [19], a batch normalisation (BN) if the network becomes too deep. To overcome this problem, we layer [20], and a Leaky ReLU (LReLU) activation layer [21]. 3D use shortcut connections at di‚erent levels between the comprising (de)convolutions are employed to capture spatial (2D) and temporal blocks, as illustrated in Fig. 4 –this resembles a zipper, hence the relations between trac measurements. Œe deconvolution is a naming. Œe structure can be regarded as an extension to residual transposed convolutional layer widely employed for image upscal- networks (ResNets) previously used for image recognition [16], ing. 3D convolutional layers enhance the model representability. where the zipper connections signi€cantly reduce the convergence BN layers normalise the output of each layer and are e‚ective in rate and improve the model’s accuracy, without introducing extra training acceleration [20]. LReLUs improve the model non-linearity parameters and layers. and their output is of the form: Œe fundamental module B of the zipper convolutional block x, x > 0; comprises a convolutional layer, a BN layer, and an activation LReLu(x) = (3) αx, x < 0, function. Staggered skip connections link every two modules, and a global skip connection performs element-wise addition between where α is a small positive constant (e.g. 0.1). Œe objective of such S input and output (see Fig. 3) to ensure fast backpropagation of the upscaling blocks is to up-sample the input Ft into tensors that have gradients. Œe skip connections also build an ensemble of networks h the same spatial resolution as the desired output Dt . Œe number of with various depths, which has been proven to enhance the model’s upscaling blocks increases with the resolution of the input (from 1 performance [22]. Further, replacing layer-wise connections (as to 3). ‡ese 3D upscaling blocks are key to jointly extracting e.g. in [23]) with light-weight staggered shortcuts reduces the spatial and temporal features speci€c to mobile trac. computational cost. Compared to the original ResNets, extra zipper • Zipper convolutional blocks (core), which pass the output paths act as an additional set of residual paths, which reinforce of the 3D upscaling blocks, through 24 convolutional layers, BN the ensembling system and alleviate the performance degeneration layers, and LReLU activations, with staggered and global skip con- problem introduced by deep architectures. Œe principle behind nections. Œese block hierarchically extract di‚erent features and ensembling systems is collecting a group of models and voting on construct the input for the next convolutional component. We give their output for prediction (e.g. using a random forest tool). Extra further details about this block in the rest of this sub-section. zipper paths increase the number of candidate models in the voting • Convolutional blocks that summarise the features distilled process, which improves the robustness of the architecture and by the Zipper blocks and make the €nal prediction decisions. Œe contributes to superior performance. block consists of three standard convolutional layers, BN layers, Finally, the discriminator D accepts simultaneously the predic- and LReLU activations, without skip connections. Each layer is tions made by the generator and €ne-grained ground truth mea- con€gured with more feature maps as compared to the previous surements, and minimises the probability of misjudgement. In our design D follows a simpli€ed version of a VGG-net neural network architecture [24] routinely used for imaging applications, and con- sists of 6 convolutional blocks (convolutional layer + BN layer + LReLU), as illustrated in Fig. 5. Œe number of feature maps dou- bles every other layer, such that the €nal layer will have sucient features for accurate decision making. Œe €nal layer employs Figure 4: 5 Zipper convolutional blocks; each module B in- a sigmiod activation function, which constrains the output to a cludes a convolutional layer, a BN layer and a LReLU activa- probability range, i.e. (0, 1). tion. Circular nodes represent additions. ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns … CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea

covariance matrix σ 2I [26], i.e. Cov.+BN+ Cov.+BN+ 2 LReLU ϵ ∼ N (ϵ|0, σ I). (6) LReLU Cov.+BN+ Real data LReLU Œen a €ne-grained data point DH can be approximated with re- Sigmoid t output spect to the corresponding prediction error as following a multi- Cov.+BN+ variate Gaussian distribution with a mean that depends on F S and Cov.+BN+ t LReLU 2 Outputs from LReLU Cov.+BN a diagonal covariance matrix σ I. Œis allows us to rewrite the the generator +LReLU conditional distribution in (4) as  H S  H S 2 p Dt |Ft ∼ N (Dt |G(Ft ), σ I). (7) Figure 5: Architecture of the discriminator D we use in ZipNet-GAN, based on the VGG-net structure [24]. Substituting (7) in (4), and adopting maximum likelihood estima- tion, the problem (4) is equivalent to minimising the following loss function, in order to con€gure the parameters ΘG of the generator: Œe remaining task in the design of the proposed neural network T 1 Õ h H S 2 2 S i is to con€gure the loss functions employed by the generator and L(ΘG) = ||D − G(F )|| − 2σ log D(G(F )) . (8) T t t t discriminator, which is key to the adjustment of the weights within t=S the di‚erent layers. Recall that σ 2 is the variance of the Gaussian distribution of ϵ, which can be considered as a trade-o‚ weight between the mean square 3.3 Designing the Loss Functions H S 2 S error (MSE) ||Dt − G(Ft )|| and the adversarial loss log D(G(Ft )). Œe MTSR is a supervised learning problem, hence directly deploy- Œe same loss function is used in [15] for image SR purposes, while ing the traditional adversarial training process is inappropriate. the weight σ 2 is manually set. Œis is because the data generated by G in traditional GANs is However, we €nd that the training process is highly sensi- stochastically sampled from the approximated target distribution, tive to the con€guration of σ 2. Speci€cally, the loss function whereas in MTSR we expect the output of the generator to follow as does not converge when σ 2 is large, while the discriminator rapidly close as possible the distribution of real €ne-grained measurements. reach an optimum if σ 2 is small, which in turn may lead to model S Œat is, individual data points G(Ft ) produces should be close to collapse [27] and overall poor performance. To solve this problem, their corresponding ground truth. Œis requires to minimise (i) the we propose an alternative loss function in which we replace the σ 2 H divergence between the distributions of the real data p(Dt ) and the term with the MSE. Œis yields S generated data p(G(Ft )), and (ii) the Euclidean distance between T 1 Õ S H S 2 individual predictions and their corresponding ground truth. L(ˆ ΘG) = (1 − 2 log D(G(F ))) · ||D − G(F )|| . (9) T t t t L2 loss functions commonly used in neural networks to solve the t=S optimisation problem given in (1) only ful€l the second objective In the above, the MSE term forces the predicted individual data above, but do not guarantee that the generator’s output will follow points to be close to their corresponding ground truth, while the the real data distribution. Œis may lead to failure to match the adversarial loss works to minimise the divergence between two €delity expected of high-resolution predictions [25]. Œerefore we distributions. Our experiments suggest that this new loss function add another term to (1), which requires to design di‚erent loss signi€cantly stabilises the training process, as the model never functions, while changing the original problem to collapses and the process converges fast. In what follows, we detail H  H S  S the training procedure. Det = arg maxp Dt |Ft ·D(G(Ft )), (4) DH t 3.4 Training the ZipNet-GAN where recall that D(G(F S )) is the probability that D predicts t To train the ZipNet-GAN we propose for solving the MTSR problem, whether data is sampled from the real distribution. Minimising we employ Algorithm 1, which takes a Stochastic Gradient Descent this new term D(G(F S )) aims to ‘fool’ the discriminator, so as to t (SGD) approach and we explain the steps involved next. Recall minimise the divergence between real and generated data distri- that the purpose of training is to con€gure the parameters of the butions, i.e. objective (i) above, and remedy the aforementioned neural network components, ΘG and ΘD. We work with the Adam €delity problem. optimiser [28], which yields faster convergence as compared to Œe discriminator D works as a binary classi€er, which is trained traditional SGD. to con€gure the discriminator’s parameters ΘD by maximising the We train G and D iteratively, each time for nG and nD sub- following loss function: epochs, by €xing the parameters of one and con€guring the others’, T and vice-versa, until their loss functions converge (line 3). At every ˆ Õ h S L(ΘD) = log D(Dt ) + log(1 − D(G(Ft ))). (5) sub-epoch (lines 4, 9), G and D randomly sample m low-/high- t=S resolution trac measurement pairs (lines 5, 10) and compute the Œe more challenging part is the design of the loss function gradients gG and gD (lines 6, 11) to be used in the optimisation employed to train the generator. Following the common hypoth- (lines 7, 12). esis used in regression problems, we assume that the prediction Œe key to this training process is that G and D make progress error ϵ follows a zero-mean Gaussian distribution, with diagonal synchronously. At early training stages, when G is poor, D can CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea C. Zhang et al.

Algorithm 1 Œe GAN training algorithm for MTSR. 1: Inputs: Batch size m, low-/high-res trac measure- ments ML and MH , learning rate λ, genera- tor and discriminator sub-epochs, nG and nD. 2: Initialise: Generative and discriminative models, G and D, parameterised by ΘG and ΘD. Pre-trained G by minimising (10). 3: while ΘG and ΘD not converge do 4: for eD = 1 to nD do 5: Sample low-/high-res trac measurement pairs { S H }m S ∈ ML H ∈ MH Ft , Dt t=1, Ft , Dt . Figure 6: Spatial distribution of mobile data trac consump- ← [ 1 Ím D( H ) tion in Milan during o‚-peak (le ) and peak (right) times. 6: gD ∆ΘD m i=1 log Dt + 1 Ím S Trac consumption per 10-minute interval varies between +m i=1 log(1 − D(G(Ft )))]. 7: ΘD ← ΘD + λ · Adam(ΘD,gD). 20 MB (green) to 5,496 MB (red). Figure best viewed in colour. 8: end for 9: = for eG 1 to nG do Windows 10: Sample low-/high-res trac measurement pairs { S H }m S ∈ ML H ∈ MH ... Ft , Dt t=1, Ft , Dt . 1 m S ← Í ( − D(G( )))· ... 11: gG ∆ΘG m t=1 1 2 log Ft H S 2 ·||Dt − G(Ft )|| . 12: ΘG ← ΘG − λ · Adam(ΘG,gG). Croping 13: end for 14: end while reject samples with high con€dence as they are more likely to Augmented sub-frames di‚er from the real data distribution. An ideal D can always €nd a Traffic measurement across Milano decision boundary that perfectly separates true and generated data points, as long as the overlapping measure support set of these two Figure 7: Illustration of the data augmentation technique distributions is null. Œis is highly likely in the beginning and can we propose to enable precise training in view of performing compromise the training of G. To overcome this issue, we initialise MTSR. Figure best viewed in colour. the generator by minimising the following MSE until convergence T 1 Õ H S 2 MSE(ΘG) = ||D − G(F )|| . (10) T t t with base stations, namely each time a user started/ended an Inter- = t S net connection, or a user consumed more than 5 MB. Œe coverage Note that at this stage the initialised G (line 2) could be directly area is partitioned into 100 × 100 squares of 0.055km2 (i.e. 235m × deployed for the MTSR purpose, since the MSE based initialisation 235m). We show snapshots of the trac’s spatial distribution for is equivalent to solving (1). However, this only minimises the both o‚-peak and peak times in Fig. 6. point-wise Euclidean distance, and the further training steps in Overall the data set includesT = 8, 928 snapshots of €ne-grained Algorithm 1 are still required to ensure the predictor does not lose mobile trac measurements. Œis number appears sucient when accuracy. In our experiments, we set nG and nD to 1, and learning using traditional machine learning tools, though it is grossly in- rate λ to 0.0001. sucient for the purpose of training a complex neural network, Next we discuss the processing and augmentation of the data set as in our case. Œis is due to the highly dimensional parameter we employ to train and evaluate the performance of the proposed set, which can turn the model over-€Šing if improperly con€gured ZipNet(-GAN). through training with small data sets. To overcome this problem, we augment the Milan data set by cropping each snapshot to mul- 4 DATA PROCESSING & AUGMENTATION tiple ‘windows’ of smaller size, each with di‚erent o‚sets (1 cell To train and evaluate the ZipNet(-GAN) architecture, we conduct increment in every dimension). We illustrate this augmentation experiments with a publicly available real-world mobile trac data process in Fig. 7. set released through Telecom Italia’s Big Data Challenge [29]. Œis We work with ‘sub-frames’ of 80×80 sub-cells, thereby producing contains network measurements in terms of total cellular trac 441 new data points from every snapshot. Note that the initial size volume observed over 10-minute intervals in Milan, between 1 Nov of model’s prediction matrix is also 80×80 and we employ a moving 2013 and 1 Jan 2014 (2 months). Œis was obtained by combining average €lter to construct the €nal predictions across the original call detail records (CDR) that were generated upon user interactions grid (i.e. 100 × 100). ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns … CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea

5 EXPERIMENTS Milano coverage map In this section we €rst describe brieƒy the implementation of the proposed ZipNet(-GAN), then evaluate its performance in terms of 2D granularity map of prediction accuracy (measured as Normalised Mean Root Square probes coverage Error – NMRSE), €delity of the inferences made (measured through Peak Signal-to-Noise Ratio – PSNR), and similarity between predic- tions made and ground truth measurements (structural similarity index – SSIM). We compare the performance of ZipNet(-GAN) with that of Uniform and Bicubic interpolation techniques [30], Sparse Coding method (SC) [31], Adjusted Anchored Neighbouring Re- gression (A+) [32], and Super-Resolution Convolutional Neural Network (SRCNN) previously used in imaging applications [14]. 10  10 sub-cells coverage 4  4 sub-cells coverage 5.1 Implementation 2  2 sub-cells coverage We implement the proposed ZipNet(-GAN), the SRCNN, the Uni- form and Bicubic interpolation methods using the open-source Python libraries TensorFlow [12] and TensorLayer [33].1 We train Figure 8: Cellular deployment map for Milan served by a the models using a GPU cluster comprising 19 nodes, each equipped mixture of probes with di‚erent coverage (le ), and the cor- with 1-2 NVIDIA TITAN X and Tesla K40M computing accelerators responding measurement granularity projected onto a 2D (3584 and respectively 2280 cores). To evaluate their prediction map (right). Best viewed in colour. performance, we use one machine with one NVIDIA GeForce GTX 970M GPU for acceleration (1280 cores). SC and A+ super-resolution €ner granularity. Fewer probes are assumed in surrounding areas, techniques are implemented using Matlab. In what follows, we de- covering larger regions (the yellow and green areas). We compute tail four MTSR instances with di‚erent granularity, which we use aggregate measurements collected by such dissimilar probes, each for comparison of the di‚erent SR methods. aggregate corresponding to the sum over the covered cells. We then project these points onto a square that becomes the input to our 5.2 Di‚erent MTSR Granularity model, as shown on the right in Fig. 8. Œerefore, for each trac In practice measurement probes are deployed at di‚erent locations snapshot we can obtain 400 measurement points that can be viewed (e.g. depending on user density) and o‰en have di‚erent coverage as a 2D square (right side of Fig. 8), which we use for training and in terms of number of cells. We approximate the coverage of a performing inferences. Note that although the average nf of the probe by a square area consisting of rf = nf × nf sub-cells, and mixture instance is 4, its structure di‚ers signi€cantly from that of refer to nf as an upscaling factor. Œe smaller the nf , the higher the the ‘up-4’ instance. Our evaluation will reveal that such di‚erences granularity of measurement. Given the heterogeneous nature of will a‚ect the accuracy of the MTSR outcome. cellular deployments, we construct four di‚erent MTSR instances We train all deep learning models with data collected between 1 with di‚erent granularity, as summarised in Table 1. Nov 2013 and 10 Dec 2013 (40 days), validate their con€guration with measurements observed over the following 10 days, and €nally Instance Con€guration nf rf evaluate their performance using the averages aggregated by the (Avg.) (Avg.) di‚erent probes (di‚erent granularity) between 20–30 Dec 2013. Up-2 Probes cover 2 × 2 sub-cells 2 4 Prior to training, all data is normalised by subtraction of the mean Up-4 Probes cover 4 × 4 sub-cells 4 16 and division by the standard deviation. Œe coarse-grained input L Up-10 Probes cover 10 × 10 sub-cells 10 100 measurement data (i.e. M ) is generated by averaging the trac Mixture 7% of probes cover 10 × 10 sub- 4 16 consumption over cells’ coverage. cells, 44% cover 4 × 4, and 49% cover 2 × 2 sub-cells. 5.3 Performance Evaluation Table 1: Con€guration of MTSR instances with di‚erent Evaluation Metrics: We quantify the accuracy of the proposed granularity considered. ZipNet(-GAN), comparing against that of existing super-resolution methods, in terms of Normalised Root Mean Square Error (NRMSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Œe €rst three instances assume that all probes are uniformly (SSIM). distributed in Milan and have the same coverage, i.e. 4, 16, and NRMSE is computed over I sub-cells at time t as follows: 100 sub-cells respectively. Œe fourth instance corresponds to a uv mixture of probes, each with di‚erent coverage, as we illustrate in t I i i 2 1 Õ (het − ht ) × NRMSE = , (11) Fig. 8. Speci€cally, we consider three types of probes covering 2 2, H I 4×4, and respectively 10×10 sub-cells. In this instance, more probes Dt i=1 i serve the city centre (red area), each with smaller coverage and thus where het is the inferred high-resolution trac consumption in a i H 1Œe source-code of our implementation is available at hŠps://github.com/vyokky/ sub-cell i, ht is the corresponding ground truth value, and Dt ZipNet-GAN-trac-inference. is the ground truth mean. Note that NRMSE is frequently used CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea C. Zhang et al. in bioinformatics, experimental psychology, and other €elds, for comparing data sets or models with di‚erent scales [34]. Œis is e‚ectively the mean square deviation (the di‚erence between pre- dictions and ground truth) normalised by the mean value of ground truth samples. Low NRMSE values indicate less residual variance when the data sets subject to comparison may have di‚erent scales. To understand the accuracy of the di‚erent MTSR techniques from di‚erent perspectives, we also compute the PSNR and SSIM, which are typically used to evaluate the similarity between images. PSNR is a common measure of quality of image reconstruction (following compression), and has the following expression: I 1 Õ i i 2 PSNR = 20 log(maxhi ) − 10 log (h − he ) , (12) i I t t i=1 where (maxi hi ) is the highest trac volume observed in one cell (in our case 5,496MB). SSIM is commonly used as an estimate of the perceived quality of images and video [35], and is computed with  H H   H H  2Dt · Det + c1 2 cov(Dt , Det ) + c2 SSIM =     , (13) H 2 H 2 H H (Dt ) ·(Det ) + c1 var(Dt )var(Det ) + c2

H where Det is the average of the predictions, cov is the covariance between predictions and ground-truth, var denotes the variance of each data set, while c1 and c2 are parameters that stabilise the division with weak denominators. Unlike NRMSE, higher PSNR Figure 9: Comparison of inference accuracy of the pro- and SSIM values usually reƒect greater similarity between ground posed ZipNet(-GAN) and existing SR techniques in terms of truth and predictions. NRMSE (above), PSNR (middle), SSIM (below). Four upscal- Techniques for comparison: We summarise the performance of ing instances (di‚erent colour bars). Experimental results the proposed ZipNet(-GAN) against a range of existing interpo- with the Milan data set [29]. lation or image super resolution techniques, including Uniform spatial structure and scale, as compared to imaging data. Œerefore interpolation, Bicubic interpolation, Sparse Coding based methods traditional SR approaches are unable to capture accurately the re- (SC) [31], Adjusted Anchored Neighbouring Regression (A+) [32], lations between low- and high- resolution trac ‘frames’. On the and Super-Resolution Convolutional Neural Network (SRCNN) [14]. other hand, SRCNN works acceptably with low upscaling MTSR Uniform interpolation is routinely used by operators and assumes instances (i.e. ‘up-4’ and ‘up-2’), but performs poorly when pro- that the data trac volume is uniformly distributed across cells cessing the ‘up-10’ instances. Unlike Uniform, A+, and Bicubic, the [8]. Œe Bicubic interpolation is a popular non-parametric tool proposed ZipNet-GAN achieves an up to 65%, 76% and respectively frequently used to enhance the resolution of images [30]. SC and 78% smaller NRMSE. Œe ZipNet-GAN further aŠains the highest A+ are commonly used as benchmarks in image super-resolution PSNR and SSIM among all approaches, namely up to 40% higher evaluation. Finally, SRCNN is a benchmark deep learning architec- PSNR and a remarkable 36.4× higher SSIM. Although perhaps more tures that comprises three convolutional layers. We note that in subtle to observe, the ZipNet-GAN is more accurate than the Zip- our evaluation ZipNet is a simpli€ed version of the ZipNet-GAN, Net (i.e. without employing a discriminator), which con€rms the which is purely trained with the mean square error (10), without e‚ectiveness of the GAN approach. the help of the discriminator. Taking a closer look at Fig. 9, observe that the prediction accuracy Assessing the quality of inferences: We summarise the per- of all approaches drops as the upscaling factor nf grows. Œis formance of the proposed ZipNet(-GAN) and that of existing SR is indeed reasonable, since a larger nf corresponds to a greater techniques in terms of the aforementioned metrics in Fig. 9, where degree of aggregation and thus detail information loss, which will we use di‚erent bars for each of the four MTSR instances consid- pose more uncertainty to the models. Further, although the ‘up- ered for inference (2, 4, and 10 upscaling, and upscaling mixture). 4’ and mixture instances have the same average upscaling factor, Œe bars correspond to averages for inferences made over 10 days the proposed ZipNet(-GAN) operate somewhat beŠer with the (i.e. 1440 snapshots). former. Our intuition is that the mixture instance distorts the Observe that ZipNet-GAN achieves the best performance spatial correlation of the original measurements, as shown in Fig. 8. for all MTSR instances, outperforming traditional super reso- However, given the subtle performance di‚erences, MTSR with lution schemes. In particular, although SC and A+ work well in probes of dissimilar coverage and granularity remains feasible with image SR, their performance is inferior to that of simple Uniform the proposed ZipNet-GAN. and Bicubic interpolation techniques when performing MTSR. Our We now delve deeper into the performance of all methods con- intuition is that the mobile trac data has substantially di‚erent sider for MSTR and examine the behaviour of each with individual ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns … CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea

Fine-grained meas. (Ground truth) Coarse-grained meas. (Input) Uniform

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.49 45.49 45.49 9.07 45.46 9.07 45.46 9.07 45.46 9.12 9.12 9.12 9.16 45.43 9.16 45.43 9.16 45.43 Longitude 9.21 Latitude Longitude 9.21 Latitude Longitude 9.21 Latitude Bicubic SC A+

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.49 45.49 45.49 9.07 9.07 9.07 9.12 45.46 9.12 45.46 9.12 45.46 9.16 45.43 9.16 45.43 9.16 45.43 Longitude 9.21 Latitude Longitude 9.21 Latitude Longitude 9.21 Latitude SRCNN ZipNet ZipNet-GAN

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.49 45.49 45.49 9.07 9.07 9.07 9.12 45.46 9.12 45.46 9.12 45.46 9.16 45.43 9.16 45.43 9.16 45.43 Longitude 9.21 Latitude Longitude 9.21 Latitude Longitude 9.21 Latitude

Figure 10: Snapshots of ground truth, input data, and the ‘up-10’ MTSR instance predicted by the proposed ZipNet (-GAN), existing interpolation methods and image SR techniques, using data collected in Milan on 21st Dec 2013. snapshots. To this end, Figs. 10 and 11 provide additional perspec- 5.4 ‡e Bene€ts of GAN tives on the value of employing the proposed ZipNet(-GAN) to In image SR, GAN architectures improve the €delity of high-res- infer mobile trac consumption with €ne granularity. Each €gure olution output, making the image more photo-realistic. Here we shows snapshots of the predictions made by all approaches, when show the GAN plays a similar role in MTSR. To this end, in Fig. 12 working with the ‘up-10’ (Fig. 10) and the mixture MTSR (Fig. 11) we present zoom snapshots of the predictions made by ZipNet and instances. In particular, in Fig. 10 we observe that ZipNet(-GAN) ZipNet-GAN. Œese o‚er a clear visual perception of the inference deliver remarkably accurate predictions with a 99% reduction in improvements of GAN in central parts of the city. Indeed, observe term of measurement points required, as the texture and details are that including a GAN in the architecture improves the prediction almost perfectly recovered. In contrast, the Uniform, Bicubic, SC, €delity, as it minimises the divergence between real and predicted A+ and SRCNN techniques, although improve the resolution of data data distributions, although this does not necessarily enhance over- trac ‘snapshots’, lose signi€cant details and deviate considerable all accuracy. Note that the additional accuracy does not come at the from the ground truth measurements (upper le‰ corner). cost of increased complexity, since the adversarial training is fast Turning aŠention to Fig. 11, observe that the input exhibits some in terms of convergence and the discriminator will be abandoned spatial distortion (top centre plot), as the probes aggregating mea- in the inference phase. surements have di‚erent coverage and locations. Despite this, the ZipNet(-GAN) still capture well spatial correlations and continues 5.5 Robustness to Abnormal Trac to perform very accurately (two plots in the boŠom right corner). To evaluate the robustness of our solution in the presence of trac In contrast, the Uniform and Bicubic interpolations, although cap- anomalies, we arti€cially add such trac paŠerns to the test data ture some spatial distribution mobile trac pro€les, signi€cantly set and investigate the behaviour of our proposal. Speci€cally, we under-estimate the trac volume in the city centre. In addition, introduce abrupt trac demands in suburban areas, which can the predictions made by SC and A+ exhibit signi€cant distortions be regarded as occurrences of social events (e.g. concert, football and yield inferior performance, demonstrating their image SR ca- match, etc.), as seen in the boŠom le‰ corner of the second sub-plot pabilities cannot be mapped directly to MTSR tasks. Lastly, the in Fig. 13. Although such anomalies do not occur in the training set, SRCNN approach, which employs deep learning, works acceptably the proposed ZipNet-GAN still successfully identi€es the locations in areas with low trac intensity, but largely underestimates the of abnormal trac, given averaged and smoothed inputs (€rst sub- trac volume in the city centre. plot). Œis implies that, to some extent, our proposal can perform as an anomaly detector operating only with coarse measurements. CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea C. Zhang et al.

Fine-grained meas. (Ground truth) Coarse-grained meas. (Input) Uniform

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.49 45.49 45.49 9.07 9.07 45.46 9.07 9.12 45.46 9.12 9.12 45.46 9.16 45.43 9.16 45.43 9.16 45.43 Longitude 9.21 Latitude Longitude 9.21 Latitude Longitude 9.21 Latitude Bicubic SC A+

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.49 45.49 45.49 9.07 9.07 9.07 9.12 45.46 9.12 45.46 9.12 45.46 9.16 45.43 9.16 45.43 9.16 45.43 Longitude 9.21 Latitude Longitude 9.21 Latitude Longitude 9.21 Latitude SRCNN ZipNet ZipNet-GAN

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.53 Data traffic [MB] 45.49 45.49 45.49 9.07 9.07 9.07 9.12 45.46 9.12 45.46 9.12 45.46 9.16 45.43 9.16 45.43 9.16 45.43 Longitude 9.21 Latitude Longitude 9.21 Latitude Longitude 9.21 Latitude

Figure 11: Snapshots of ground truth, input data, and the mixture MTSR instance predicted by the proposed ZipNet(-GAN), existing interpolation and image SR techniques, using data collected in Milan on 21st Dec 2013.

Fine-grained meas. (Ground truth) ZipNet-GAN ZipNet

3500 3500 3500 3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500 0 0 0 45.6 Data traffic [MB] 45.6 Data traffic [MB] 45.6 Data traffic [MB] 45.58 45.58 45.58 9.16 45.57 9.16 45.57 9.16 45.57 9.18 9.18 9.18 Longitude9.19 45.55 Longitude9.19 45.55 Longitude9.19 45.55 9.21 Latitude 9.21 Latitude 9.21 Latitude

Figure 12: Zoom snapshots of ground truth and €ne-grained trac consumption predicted by the proposed ZipNet(-GAN) in an up-10 instance in central Milan.

5.6 Impact of Cross-Temporal Relations and illustrate in Fig. 14 the NRMSE aŠained with the three homo- We conclude our experiments by examining the impact cross-temporal geneous MTSR instances considered. Observe that the prediction correlations between trac measurements provided as input have error drops with the increase of the number of snapshot we provide on the ZipNet-GAN architecture we propose. To this end, we to our model in all instances, which indicates that earlier observa- (i) compare the MTSR accuracy with input of di‚erent temporal tions provide valuable insights toward inferring real €ne-grained lengths S, and (ii) compute the absolute value of €rst-order deriva- data trac consumption. Additionally, the historical measurements tives of the loss function employed over input. Œe magnitude of play a more signi€cant role with the increase of nf – in the ‘up-10’ these gradients are a good approximation of the sensitiveness of instance the error between predictions made with S = 1 and S = 6 €nal prediction decisions to changes of the input [36]. sequence lengths is much larger than in the same case for the other NRMSE with di‚erent length inputs: We feed the proposed instances. Œis brings important implications to operators assessing ZipNet-GAN with input of temporal length S ∈ {1, 3, 6} snapshots, the trade-o‚s between the length of inputs (which a‚ects model complexity) and prediction accuracy. ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns … CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea

Coarse-grained meas. (Input) Fine-grained meas. (Ground truth) ZipNet-GAN

3000 3000 3000 2500 2500 2500 2000 2000 2000 1500 1500 1500 1000 1000 1000 500 500 500

0 Data traffic [MB] 0 Data traffic [MB] 0 Data traffic [MB] 45.53 45.53 45.53 45.49 45.49 45.49 9.07 9.07 9.07 9.12 45.46 9.12 45.46 9.12 45.46 Longitude9.16 45.43 Longitude9.16 45.43 Longitude9.16 45.43 9.21 Latitude 9.21 Latitude 9.21 Latitude

Figure 13: Behaviour in the presence of anomalies: snapshots of coarse-grained measurements fed as input (le ), ground truth (middle) and €ne-grained trac patterns predicted by the proposed ZipNet-GAN (right) in a mixture instance.

0.8 Up-2 0.8 Up-4 0.8 Up-10 Up-2 Up-4 Up-10 6 25 5 0.6 0.6 0.6 5 20 4 4 0.4 0.4 0.4 15 3 3 NRMSE NRMSE NRMSE 10 2 0.2 0.2 0.2 2 1 5 1 0.0 0.0 0.0 1 3 6 1 3 6 1 3 6 Gradient values 0 Gradient values 0 Gradient values 0 Temporal length S Temporal length S Temporal length S ×10 6 1 2 3 4 5 6 ×10 6 1 2 3 4 5 6 ×10 4 1 2 3 4 5 6 Frame Frame Frame Figure 14: NRMSE comparison for three MTSR instances, Figure 15: Mean magnitude of the gradient of the loss func- with di‚erent temporal length S of the input. S S tion L(Ft ) over inputs Ft , with di‚erent MTSR instances. Av- erages computed over the entire network across all test data. Magnitudes of gradients: Œe impact of the input’s temporal dimension can also be evaluated from the gradients’ perspective. trac paŠerns, and substantially outperforms traditional interpola- Œe loss function L is essentially a complex non-linear of the input tion methods and image SR techniques, as it aŠains up to 78% lower F S . However, this can be approximated by the €rst-order term of t prediction errors (NRMSE) and 36.4× higher structural similarity its Taylor expansion, i.e. (SSIM). By employing a high performance GPU, training ZipNet- S S T S GAN takes 2-3 days, which we believe is negligible, considering the L(Ft ) ≈ w(Ft ) · Ft + b, (14) long-term costs of post-processing alternatives. Given the reason- S S able training time, ZipNet-GAN can be easily ported to di‚erent where w(Ft ) is the gradient of the loss function L(Ft ) over input S urban layouts a‰er retraining for the target regions. Ft and b is a bias. Computing the absolute value of the gradient S T w(Ft ) , i.e. 6 DISCUSSION Obtaining city-scale €ne-grained mobile trac maps from coarse ∂L(F S ) ∂ h i t = ||DH − G(F S )||2( − D(G(F S ))) , aggregates collected by probes brings important bene€ts to a range S S t t 1 2 log t ∂Ft ∂Ft of applications critical to the urban realm. Namely, should give insights into the number of temporal steps required • Agile network resource management: our approach en- for accurate prediction with di‚erent MTSR instances. Œerefore ables infrastructure simpli€cation and reduces OpEx. Œis is in Fig. 15 we plot the average magnitude of the gradient of the because base station controllers (BSC) are shared by increasing S loss function over all inputs Ft , for all three homogeneous MTSR numbers of BSs, and already handle radio quality measure- instances. Observe that the most recent ‘frame’ (i.e. frame 6) yields ments and handovers. Adding further measurement capabili- the largest gradient for all instances, as we expect. Œis means that ties, co-ordination of triangulation, and real-time processing of the current measurement ‘snapshot’ provides the most valuable in- multi-modal measurements, poses scalability concerns, which formation for the model to reconstruct the €ne-grained counterpart. may eventually impact on end user QoE. Œe proposed ZipNet- Further, the contribution of historical measurements (i.e. frames GAN shi‰s the burden to a single dedicated cluster that only 1 to 5) increases with the upscaling factor (from 2 to 10), which requires to be trained infrequently and can perform coarse- to suggests that historical information becomes more signi€cant when €ne- grained measurement transformations, fed by summary less spatial information is available, which is consistent with the statistics already available at selected probes. insight gained earlier. • Events localisation & response: popular social events (e.g. We conclude that, by exploiting the potential of deep neural net- concerts, fairs, sports, etc.) or emergency situations (€res, works and generative adversarial networks, the proposed ZipNet- riots, or terrorist acts) exhibit high spatial similarity in terms GAN scheme can infer with high accuracy €ne-grained mobile of mobile data trac consumption, which can serve as an CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea C. Zhang et al.

indicator of density/size of the crowds involved. Accurate and A crowdsourced based approach to urban sensing is introduced timely localisation of such instances through inference of data in [9], where city-scale sensor data is regarded as the output of trac can help overcome network congestion in hot spots, but a lossy ‘urban camera’, and temporal-correlation and collective also appropriately provision policing and emergency services. decomposition are employed to infer air quality. et al. de€ne • Context-based business: accurate MTSR can also aid small a metric to measure the quality of urban sensing, subsequently businesses that can o‚er promotional products depending on studying the relationship between sensing resolution and the num- spontaneous circumstances, e.g. families gathering in the park ber of smartphones/vehicles participating in such applications [39]. on a sunny day. Likewise university PR services can dissemi- Compressive sensing and crowdsourcing measurement problems nate relevant information during highly-aŠended open/gradua- are di‚erent to the MTSR task we tackle in this work, since in our tion days. case measurements are not subject to loss, noise, or incompleteness, but instead are complete aggregates that lack granularity. Œerefore, we argue ZipNet-GAN is not limited to network infras- tructure related tasks, e.g. precision trac engineering or anoma- Image Super-Resolution: Œe super-resolution is becoming in- lous trac detection [37], but could also serve civil applications creasingly popular in the image processing €eld to reconstruct such as events planning or even transportation. Importantly, once high-resolution images from compressed versions. SR techniques trained the proposed technique can continuously perform infer- include early non-parametric methods such as Nearest, Bicubic ences on live streams, unlike post-processing approaches that only and Lanczos interpolation [40], which unfortunately return output work o‚-line and thus have limited utility. with overly smooth texture and distorted edges. Sparse coding [31] and neighbourhood regression methods [32, 41] overcome this 7 RELATED WORK issues as they achieve beŠer representation and can correctly re- Although precise understanding of mobile data trac distributions cover images from downsampled counterparts. More recently deep is essential for ISPs, research into accurate inference of €ne-grained learning advances led to new major improvements in image SR paŠerns is sparse, while the mobile trac super-resolution problem performance. In particular, Chao et al. design a Super-Resolution we tackle in this work is unexplored. Convolutional Neural Network (SRCNN) to learn the correlation between the low-/high-resolution images [14], while Generative Trac/Environment Measurement & Analysis: Naboulsi et Adversarial Network (GAN) architectures for accurate image SR are al. survey existing mobile trac analysis practices, emphasising introduced in [15, 25]. Experiments demonstrate these can aŠain that cellular trac monitoring relies on various probe types de- high mean opinion scores (MOS) [25] or high SSIM with large up- ployed across cities [5]. Œis includes RNC probes that record scaling factor [15], when performing image SR. To our knowledge, €ne-grained state changes at each mobile device, but introduce the ZipNet-GAN introduced in this paper is the €rst technique that substantial computational and storage costs; and PGW probes with tackles mobile trac SR to make accurate inferences of €ne-grained larger geographical coverage that monitor mobile trac aggre- trac consumption from coarse measurements. gates with less overhead, but only o‚er approximate positioning information. None of these are well suited to accurate inference 8 CONCLUSIONS of €ne-grained trac paŠerns. Mobile trac collected by such Precision large-scale mobile trac analytics is increasingly vital equipment has been analysed for di‚erent purposes. Furno et al. for operators, to provision network resources agilely and keep up jointly classify the network-wide mobile trac demand by spatial with end-user application requirements. Extracting €ne-grained and temporal dimensions, to serve network activity pro€ and mobile trac paŠerns is however complex and expensive, as this land usage detection tasks [4]. Wang et al. extract and model trac relies on specialised equipment, substantial storage capabilities, paŠerns in Shanghai to deliver insights into mobile trac demand and intensive post-processing. In this paper we proposed a mobile in large cities [3]. Œeir analysis is extended to forecast the trac trac super resolution (MTSR) technique that overcomes these consumed across a city, using Autoregressive Integrated Moving issues and infers mobile trac consumption with €ne granularity, Average (ARIMA) [2]. given only limited coarse-grained measurement collected by probes. Roughan et al. infer Internet trac matrices from data sets sub- Our solution consists of a Generative Adversarial Network (GAN) ject to severe measurement loss, using spatio-temporal compressive architecture combined with an original deep Zipper Network (Zip- sensing (CS), and demonstrate superior performance over other Net) inspired by image processing tools. Œe proposed ZipNet-GAN interpolation methods [37]. Compressive sensing has also been is tailored to the speci€cs of the mobile networking domain and employed to reconstruct sensory data collected in Wireless Sensor achieves reliable network-wide MTSR. Speci€cally, experiments we Networks, while exploiting spatial similarity and temporal stability conduct with a real-world mobile trac data set collected in Mi- of lossy/noisy measurements [38]. However, we recognise sev- , demonstrate the proposed schemes predict narrowly localised eral reasons for which CS cannot be directly adopted for MTSR trac consumption, while improving measurement granularity purposes. In particular, CS algorithms (i) require measurement by up to 100×, irrespective of the position and coverage of the matrices satisfying the Restricted Isometry Property, which may probes. Œe ZipNet(-GAN) reduce the prediction error (NRMSE) not be feasible with standard deterministic sampling constrained of existing interpolation and super-resolution approaches by 78%, by the types and geographical distribution of probes; (ii) will not and achieve up to 40% higher €delity (PSNR) and 36.4× greater work with the highly non-linear relationships between coarse- and structural similarity (SSIM). €ne-grained samples, as they expect linear relationships between sparse trac and inference matrices. ZipNet-GAN: Inferring Fine-grained Mobile Tra€ic Pa‚erns … CoNEXT ’17, December 12–15, 2017, Incheon, Republic of Korea

Acknowledgements using a generative adversarial network,” To appear Proceedings of the IEEE Con- ference on Computer Vision and Paˆern Recognition, 2017. We thank Marco Fiore for his valuable suggestions that helped [26] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, improving the quality of this paper. We also thank the anonymous hŠp://www.deeplearningbook.org. reviewers for their constructive feedback and we are grateful for [27] M. Arjovsky and L. BoŠ, “Towards principled methods for training generative adversarial networks,” in NIPS 2016 Workshop on Adversarial Training, vol. 2016. the guidance of our shepherd, Aruna Balasubramanian. [28] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations (ICLR), 2015. [29] G. Barlacchi, M. De Nadai, R. Larcher, A. Casella, C. Chitic, G. Torrisi, F. Antonelli, REFERENCES A. Vespignani, A. Pentland, and B. Lepri, “A multi-source dataset of urban life in [1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Trac Forecast the city of Milan and the province of Trentino,” Scienti€c data, vol. 2, 2015. Update, 2016-2021 White Paper,” Februrary 2017. [30] R. Carlson and F. Fritsch, “Monotone piecewise bicubic interpolation,” SIAM [2] F. , Y. , J. , D. , H. Shi, J. Song, and Y. , “Big data driven mobile journal on numerical analysis, vol. 22, no. 2, pp. 386–400, 1985. trac understanding and forecasting: A time series approach,” IEEE Transactions [31] J. , J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse on Services Computing, vol. 9, no. 5, pp. 796–805, Sept 2016. representation,” IEEE transactions on image processing, vol. 19, no. 11, pp. 2861– [3] H. Wang, F. Xu, Y. Li, P. Zhang, and D. , “Understanding mobile trac paŠerns 2873, 2010. of large scale cellular towers in urban environment,” in Proc. ACM IMC, 2015, pp. [32] R. Timo‰e, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood 225–238. regression for fast super-resolution,” in Asian Conference on Computer Vision. [4] A. Furno, M. Fiore, and R. Stanica, “Joint spatial and temporal classi€cation of Springer, 2014, pp. 111–126. mobile trac demands,” in Proc. IEEE INFOCOM, 2017. [33] “TensorLayer library,” accessed Jan 2017. [Online]. Available: hŠps://tensorlayer. [5] D. Naboulsi, M. Fiore, S. Ribot, and R. Stanica, “Large-scale mobile trac analysis: readthedocs.io/en/latest/ a survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 124–161, [34] T. and R. R. Draxler, “Root mean square error (RMSE) or mean absolute 2016. error (MAE)?–Arguments against avoiding RMSE in the literature,” Geoscienti€c [6] Q. Xu, A. Gerber, Z. M. Mao, and J. , “Acculoc: Practical localization of Model Development, vol. 7, no. 3, pp. 1247–1250, 2014. performance measurements in 3G networks,” in Proc. ACM MobiSys, Bethesda, [35] A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in Proc. IEEE Maryland, USA, 2011. International Conference on Paˆern Recognition (ICPR), 2010, pp. 2366–2369. [7] I. M. Averin, V. T. Ermolayev, and A. G. Flaksman, “Locating mobile users using [36] J. Li, X. Chen, E. Hovy, and D. Jurafsky, “Visualizing and understanding neural base stations of cellular networks,” Communications and Network, vol. 2, no. 04, models in nlp,” in Proceedings of NAACL-HLT, 2016, pp. 681–691. p. 216, 2010. [37] M. Roughan, Y. Zhang, W. Willinger, and L. Qiu, “Spatio-temporal compressive [8] D. Lee, S. Zhou, X. Zhong, Z. , X. Zhou, and H. Zhang, “Spatial modeling of sensing and internet trac matrices,” IEEE/ACM Transactions on Networking the trac density in cellular networks,” IEEE Wireless Communications, vol. 21, (ToN), vol. 20, no. 3, pp. 662–676, 2012. no. 1, pp. 80–88, 2014. [38] L. Kong, M. Xia, X.-Y. Liu, G. Chen, Y. Gu, M.-Y. Wu, and X. Liu, “Data loss and [9] X. , L. Liu, and H. Ma, “Enhance the quality of crowdsensing for €ne-grained reconstruction in wireless sensor networks,” IEEE Transactions on Parallel and urban environment monitoring via data correlation,” Sensors, vol. 17, no. 1, p. 88, Distributed Systems, vol. 25, no. 11, pp. 2818–2828, 2014. 2017. [39] L. Liu, W. Wei, D. Zhao, and H. Ma, “Urban resolution: New metric for measuring [10] M. Gramaglia and M. Fiore, “Hiding mobile trac €ngerprints with glove,” in the quality of urban sensing,” IEEE Transactions on Mobile Computing, vol. 14, Proc. ACM CoNEXT, Heidelberg, Germany, 2015. no. 12, pp. 2560–2575, 2015. [11] J. D. Simpkins and R. L. Stevenson, “An introduction to super-resolution imaging,” [40] C. E. Duchon, “Lanczos €ltering in one and two dimensions,” Journal of Applied Mathematical Optics, vol. 16, pp. 555–578, 2012. Meteorology, vol. 18, no. 8, pp. 1016–1022, 1979. [12] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous [41] R. Timo‰e, V. De Smet, and L. Van Gool, “Anchored neighborhood regression systems,” 2015, so‰ware available from tensorƒow.org. [Online]. Available: for fast example-based super-resolution,” in Proc. IEEE International Conference hŠp://tensorƒow.org/ on Computer Vision, 2013, pp. 1920–1927. [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classi€cation with deep convolutional neural networks,” in Proc. NIPS, 2012. [14] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on paˆern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2016. [15] X. and F. Porikli, “Ultra-resolving face images by discriminative generative networks,” in European Conference on Computer Vision. Springer, 2016, pp. 318–333. [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Paˆern Recognition, 2016, pp. 770–778. [17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680. [18] H. Noh, S. Hong, and B. , “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520–1528. [19] S. , W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Trans. Paˆern Analysis and Machine Intel., vol. 35, no. 1, pp. 221–231, 2013. [20] S. Io‚e and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shi‰,” in International Conference on Machine Learning, 2015, pp. 448–456. [21] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Recti€er nonlinearities improve neural network acoustic models,” in Proc. ICML, vol. 30, no. 1, 2013. [22] A. Veit, M. J. Wilber, and S. Belongie, “Residual networks behave like ensembles of relatively shallow networks,” in Advances in Neural Information Processing Systems, 2016, pp. 550–558. [23] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” To appear Proceedings of the IEEE Conference on Com- puter Vision and Paˆern Recognition, 2017. [24] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large- scale image recognition,” International Conference on Learning Representations (ICLR), 2015. [25] C. Ledig, L. Œeis, F. Huszar,´ J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution