Optimization of Proactive Services with Uncertain Predictions

A Dissertation Presented

by

Ran Liu

to

The Department of Electrical and Computer Engineering

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in

Electrical and Computer Engineering

Northeastern University

Boston, Massachusetts

October 2020

Contents

List of Figures

List of Tables

List of Acronyms

Acknowledgments

Abstract of the Dissertation

1 Introduction
   1.1 Background
   1.2 Proactive Caching and Proactive Computing Technologies
       1.2.1 Proactive Caching
       1.2.2 Proactive Computing
   1.3 Related Work

2 Proactive Caching under Uncertain Predictions
   2.1 Introduction
   2.2 System Model
       2.2.1 Network Model
       2.2.2 Service Model
       2.2.3 Problem Formulation
   2.3 Relation between Reactive Scheme and Proactive Schemes
   2.4 Threshold-based Proactive Strategy and Markov Chain
       2.4.1 Threshold-Based Proactive Strategies
       2.4.2 Markov Chain of System under Ψ_P^φ
   2.5 Delay Comparison between UNIFORM and EDF Strategies
   2.6 Numerical Evaluation
       2.6.1 Infinite Prediction Window Scenarios
       2.6.2 Finite Prediction Window Scenarios
   2.7 Summary

3 Delay-Optimal Proactive Strategy under Uncertain Predictions
   3.1 Introduction
   3.2 System Model
       3.2.1 Network Model
   3.3 System Model
       3.3.1 Server Model
       3.3.2 Service Model
       3.3.3 Problem Formulation
   3.4 Relation between Reactive Scheme and Proactive Schemes
   3.5 Fixed-Probability (FIXP) Strategy in the Genie-Aided Proactive System
       3.5.1 Markov Process of the Genie-Aided System under FIXP(φ)
       3.5.2 Recurrence Analysis of the Embedded Markov Chain
       3.5.3 Relationship between FIXP(φ) and the Two Properties
       3.5.4 Delay Analysis of FIXP(φ) Strategies under the Genie-Aided System
   3.6 Fixed-Probability (FIXP) Strategy in the Realistic Proactive System
       3.6.1 The Stochastic Process in the Realistic Proactive System under FIXP(φ)
       3.6.2 Analysis on the Approximated Time-invariant Markov Chain
       3.6.3 Analysis on the Embedded Chain of the Realistic Proactive System
       3.6.4 FIXP Strategies and the Two Properties in the Realistic Proactive System
       3.6.5 Delay of FIXP Strategies in the Realistic Proactive System
   3.7 Numerical Evaluation
   3.8 Summary

4 SANDIE: SDN-Assisted NDN for Data-Intensive Experiments
   4.1 Introduction
       4.1.1 Large-Scale Data-Intensive Applications
       4.1.2 Named-Data Networking (NDN) Architecture
       4.1.3 VIP: Optimized NDN Caching, Forwarding, and Congestion Control
   4.2 Data Analysis: LHC Network and CMS Workflow
       4.2.1 US CMS Network and Workflow
       4.2.2 Analysis on CMS Data and Workflow
   4.3 Experimental Evaluations
       4.3.1 Simulations of the VIP Framework in the Internet2 Topology
       4.3.2 Simulations of the VIP Framework with Off-site CMS Workflow
   4.4 Modification of VIP Algorithms for SANDIE
       4.4.1 Modified Virtual Interest Packet
       4.4.2 Modifications on VIP Caching Strategy
   4.5 SANDIE Testbed and Implementations
       4.5.1 SANDIE Testbed Configuration
       4.5.2 Implementation of VIP Framework in Existing NDN Forwarders
       4.5.3 Demonstration at SuperComputing '19
       4.5.4 An Overview of the Latest Progress on the SANDIE Project
   4.6 Summary

Bibliography

A Appendix of Chapter 2
   A.1 Proof of Proposition 1
   A.2 Proof of Theorem 1
   A.3 Proof of Corollary 1
   A.4 Proof of Proposition 2
   A.5 Proof of Proposition 3
   A.6 Proof of Proposition 4
   A.7 Proof of Lemma 1
   A.8 Proof of Theorem 3
   A.9 Proof of Theorem 4
   A.10 Proof of Corollary 3
   A.11 Proof of Corollary 4
   A.12 Proof of Corollary 5

B Appendix of Chapter 3
   B.1 Relationship between the Genie-Aided System and the Realistic Proactive System
   B.2 Proof of Theorem 5
   B.3 Proof of Corollary 6
   B.4 Proof of Lemma 2
   B.5 Proof of Proposition 8
   B.6 Verification of Theorem 5 under FIXP Strategies in the Genie-Aided System
   B.7 Proof of Proposition 7
   B.8 Proof of Theorem 7
   B.9 Proof of Theorem 8
   B.10 Proof of Proposition 10
   B.11 Verification of Theorem 5 under FIXP Strategies in the Realistic Proactive System
   B.12 Proof of Theorem 10

List of Figures

2.1 Network Model
2.2 Arrival Processes
2.3 Comparison between the Reactive Scheme and the Proactive Scheme
2.4 Example: Transitions in the proactive system with Ψ_P^φ, with φ = s/2
2.5 Comparisons among threshold-based methods: λp = 6
2.6 Comparisons among threshold-based methods: λp = 9.6
2.7 Comparisons among EDF, UNIFORM and Reactive Schemes: λp = 6
2.8 Comparisons among EDF, UNIFORM and Reactive Schemes: λp = 9.6
2.9 Queue Size Evolution Comparisons among Reactive Scheme, EDF Strategy and UNIFORM Strategy: λp = 6
2.10 Queue Size Evolution Comparisons among Reactive Scheme, EDF Strategy and UNIFORM Strategy: λp = 9.6
2.11 Theoretical Delay Comparison between UNIFORM Strategy and Reactive Scheme
2.12 Comparisons among EDF, UNIFORM and Modified-UNIFORM: λp = 6
2.13 Comparisons among EDF, UNIFORM and Modified-UNIFORM: λp = 9.6

3.1 Server Model
3.2 Arrival Processes
3.3 Comparisons between reactive and proactive systems
3.4 Embedded Markov Chain of the Markov Process under FIXP Strategies in the Genie-Aided System
3.5 Corresponding 1D Markov Chain {X_k}
3.6 The transitions of the Realistic Proactive System. The corresponding time-invariant transition rates are shown in brackets.
3.7 Limiting Fraction of Proactive Services of FIXP Strategies: λp = 6
3.8 Limiting Fraction of Proactive Services of FIXP Strategies: λp = 9.6
3.9 Average Delay of FIXP Strategies: λp = 6
3.10 Average Delay of FIXP Strategies: λp = 9.6

4.1 On-Site Jobs and Off-Site Jobs in CMS Network
4.2 Distribution of Datablock Size in CMS Workflow of US Sites
4.3 Distribution of Datablock Size in CMS Workflow of Caltech Site

4.4 Percentage of Requests Covered by Popular Datablocks: US Sites
4.5 Percentage of Requests Covered by Popular Datablocks: Caltech Site
4.6 Request Counts: Total vs. Popular Datablocks at US Tier-2 Sites
4.7 Zipf Distribution Approximations for Popularity Distributions at US Tier-2 Sites
4.8 File-level Popularity of an Example Datablock
4.9 The Early Internet2 Topology
4.10 Internet2 CMS Topology for Simulation Study
4.11 Delay Performance of VIP Framework in Internet2 Topology
4.12 Average Connection Speed from MIT CMS Site
4.13 Delay Performance of Off-site Workflow in CMS Network
4.14 Modified VIP Caching Structure
4.15 VLAN Configurations of SANDIE Testbed

A.1 Comparison of System 1 and System 2 in the Proof of Proposition 4
A.2 Comparison of the Proactive System and the Virtual System in the Proof of Proposition 4

List of Tables

2.1 Table of Notation

3.1 Table of Notation

List of Acronyms

NDN Named Data Networking. An information-centric network architecture where content names, instead of traditional addresses as in the IP protocol, are utilized for network functionality.

EDF Earliest Deadline First. A proactive strategy where the server always works on future requests with the earliest arrival time predicted.

FIXP Fixed Probability strategies. A family of proactive strategies where each predicted request is selected to receive proactive service with a fixed probability independently.

SANDIE SDN-Assisted NDN for Data-Intensive Experiments.

CERN The European Organization for Nuclear Research.

LHC Large Hadron Collider. The world’s largest and most powerful particle accelerator at CERN.

CMS Compact Muon Solenoid. One of the general-purpose particle physics detectors built on the LHC at CERN.

VIP 1) The VIP framework for joint caching, forwarding, and congestion control algorithms in NDN; 2) Virtual Interest Packets.

HEP High Energy Physics.

NFD NDN Forwarding Daemon.

Acknowledgments

Towards the completion of my Ph.D. career, I would like to express my sincere gratitude to the people who guided, motivated, and supported me throughout this exciting journey. First, I want to thank my advisor, Prof. Edmund Yeh. He has provided generous patience and tireless guidance throughout my years at Northeastern University. I feel fortunate and proud to be a student of Prof. Edmund Yeh. His great personality, profound knowledge, and meticulous attitude will always make him a role model for me as a scholar. I would like to thank my committee members, Prof. Stratis Ioannidis and Prof. Tommaso Melodia, for their guidance and encouragement during my Ph.D. career. I also greatly appreciate the efforts and contributions of Prof. Atilla Eryilmaz and the SANDIE team to specific chapters of my dissertation. I want to thank my parents, who have never stopped believing in me. Whatever difficulties I encounter, they are always there for me. Last but not least, great thanks to Milad Mahdian, Ying Cui, Khashayar Kamran, Jinkun Zhang, Derya Malak, Qian Ma, Yuanhao Wu, Yuezhou Liu, and Faruk Volkan Mutlu, for being my best colleagues and friends.

Abstract of the Dissertation

Optimization of Proactive Services with Uncertain Predictions

by

Ran Liu

Doctor of Philosophy in Electrical and Computer Engineering

Northeastern University, October 2020

Prof. Edmund Yeh, Advisor

The Internet faces significant challenges from the dramatic growth in traffic and computation workload from highly diverse applications. With the evolution of technologies such as machine learning and data science, proactive services with the aid of predictive information have been recognized as a promising method to exploit network bandwidth, storage, and computation resources to achieve improved user experiences. To better understand the challenges facing the Internet, we introduce the background in Chapter 1, followed by a thorough survey of the existing literature on prediction algorithms, proactive caching, and proactive computing. Specifically, we discuss the analytical work on optimization problems for proactive algorithms, which is closely related to our work. Our primary goal is to investigate the fundamental performance improvement that can be achieved from proactive services under uncertain predictions. We aim to analyze the queueing behavior of a proactive system and design proactive strategies to optimize system performance in terms of the limiting fraction of proactive work and the average delay. In Chapter 2, we study a proactive caching system where files can be served partially under uncertain predictions. We first propose a potential request process where each request is realized with a certain probability, characterizing the uncertainties in predictions. In a general proactive system, we derive an upper bound for the average amount of proactive service per request that the system can support. Then we analyze the behavior of a family of threshold-based proactive strategies and show that the average amount of proactive service per request can be maximized by properly selecting the threshold. Finally, we propose the UNIFORM strategy, which is the optimal threshold-based strategy, and show that it outperforms the most commonly used Earliest-Deadline-First (EDF) type proactive strategies in terms of delay. We perform extensive numerical experiments to demonstrate the influence of thresholds on delay performance under the threshold-based strategies and specifically compare the EDF strategy and the UNIFORM strategy to verify our results.

In Chapter 3, we study a more general proactive service problem under uncertain predictions. We propose a more general service model in which service times follow an exponential distribution and services cannot be partially completed. Similarly, we derive an upper bound for the fraction of services that can be finished proactively under uncertain predictions in a general proactive service system. Specifically, we analyze a family of fixed-probability (FIXP) proactive strategies in two proactive systems: the Genie-Aided system and the Realistic Proactive system. We obtain optimal FIXP strategies in both systems and prove that the optimal FIXP strategies maximize the limiting fraction of proactive service among all proactive strategies and minimize average delay among FIXP strategies. Extensive numerical experiments demonstrate the influence of the FIXP parameter on the limiting fraction of proactive work and the average delay in both proactive systems and verify our theoretical results in multiple scenarios. As a complementary chapter, we introduce our work on the SDN-Assisted NDN for Data-Intensive Experiments (SANDIE) project in Chapter 4. This project aims to apply the novel Named-Data Networking architecture as a networking solution for data-intensive scientific applications. We first introduce the data-intensive high-energy physics applications of CMS at CERN, the Named-Data Networking architecture, the VIP joint caching and forwarding framework, and an overview of the SANDIE project as background. We then elaborate on our designs and achievements in this project in chronological order. We first study the CMS experimental data formats and workflow from several data analysis systems at CERN to understand CMS traffic patterns. We then demonstrate results of our packet-level network simulations of the VIP framework on the CMS network topology: one at an early stage of this project to explore potentially beneficial caching locations, and an updated one based on our analysis of CMS datasets and workflows to verify the potential system performance improvement from the VIP algorithms. To accommodate the specific patterns in the CMS network, we modify our VIP framework to address the challenges of CMS applications. In the last section, we introduce the deployment of the continental SANDIE testbed, implementations of the VIP framework in two NDN forwarders, a milestone demonstration at SuperComputing '19, and an overview of the latest progress and future directions of the SANDIE project.

Chapter 1

Introduction

1.1 Background

Driven by the evolution of IT technologies, networks achieve better coverage, higher bandwidth, and more powerful computation capabilities. Meanwhile, resource allocation problems become more significant due to the dramatic growth in traffic load, number of users, number of devices, and types of applications. On the one hand, bandwidth is still one of the most crucial resources in networks, especially for real-time interactive applications. Applications such as video, gaming, virtual reality (VR), and augmented reality (AR) have become a dominant source of network traffic. Based on reports from Cisco [1, 2], video traffic accounts for 75% of all IP traffic and is predicted to reach 82% by 2022; gaming traffic is forecasted to grow 10-fold from 2016 to 2021; VR and AR traffic will increase 12-fold from 2017 to 2022; and Internet surveillance traffic will increase sevenfold by 2022. Higher-resolution video is also becoming more and more popular: 80% of video traffic will be UHD or HD video by 2022, and 62% of TV panels will support 4K by 2022.

On the other hand, there is a dramatic increase in the number of applications with strict performance requirements in a few fields which have become popular in recent years. First, Internet-of-Things (IoT) applications such as healthcare monitoring, smart cars, smart homes, and surveillance have become a principal factor driving the evolution of M2M communications. According to the Cisco annual report [3], M2M devices will account for 31% of total global mobile devices, and M2M connections will be 50% of all mobile connections by 2023. Second, fog computing, or mobile edge computing, has recently become a popular research topic. One of its most crucial driving factors is that servers with abundant computational and storage resources are placed closer to service requests, thus achieving lower delivery latency. However, with more and more computation-intensive applications, the computation and storage resources at distributed servers become limited, as indicated in [4]. Last but not least, network automation is forecasted to be the most influential IT technology in the next five years, according to Cisco [5], where pervasive computation algorithms are one of the major deciding factors for the performance of network automation in business. All the applications mentioned above have an exceptionally high demand for low latency, which significantly impacts the revenue of companies such as Amazon and Google [6]. As a result, improving the delay performance of such applications is a crucial challenge in the networking area.

Proactive algorithms have been proposed as a promising solution to meet these challenges. The feasibility of proactive algorithms stems from 1) the fact that the request patterns of the applications mentioned above have a certain predictability, 2) the evolution of AI techniques, which significantly improves the reliability of prediction algorithms, and 3) the fact that the Internet experiences daily workload fluctuation patterns, which motivates proactive algorithms to utilize off-peak resources for offloading peak-hour tasks. Proactive caching and proactive computing are two major domains for proactive algorithms, which will be discussed in the following.


1.2 Proactive Caching and Proactive Computing Technologies

1.2.1 Proactive Caching

Distributed caching techniques have been a mature and widely applied technology in networks. Caching devices can reduce a considerable amount of traffic by caching data objects locally, meanwhile reducing latency and network resource consumption. There has been extensive work in this area, such as [7], [8], [9]. On top of traditional reactive caching, proactive caching technologies emerge as a promising method to improve system performance, utilizing predictive information to move potential target contents closer to the consumers, using underutilized bandwidth resources, before they are requested. There has been some pioneering work envisioning proactive caching as a solution for enhancing network performance. In [10] and [11], the authors investigate the benefits of proactive caching in 5G wireless networking scenarios. They point out the challenges facing 5G wireless networks, including a high volume of data traffic and the demand for low latency, which motivates the deployment of proactive caching algorithms in edge networks. Two large-scale experimental cases are carried out to show the promising improvement in network performance achieved by proactive caching in 5G wireless networks. Famaey et al. [12] address the importance of applying proactive caching in Video-on-Demand networks by proposing a predictive method based on user trace files, and study the theoretical gain achieved from proactive caching compared with a globally optimal caching algorithm which utilizes only history information. These early works demonstrate the great potential of performance enhancement from applying proactive caching in specific delay-sensitive applications.

Prediction Algorithms: In the general context of proactive caching, prediction algorithms on future request patterns can be considered as a foundation for proactive caching algorithms. The fundamental problems of prediction algorithms include 1) what types of requests can be predicted, i.e., the predictability of requests, 2) which patterns can be treated as inputs for prediction algorithms, 3) what the temporal evolution pattern of popularity distributions is, and 4) how to generate predictions based on the information above. A considerable amount of effort has been invested in this topic. For example, Pinto et al. [13] design two models to utilize early popularity patterns after the publication of data content for predicting future demand. The authors carry out experiments with a widely-used YouTube video portal and show that the prediction precision is greatly improved, especially during peak hours, compared with a state-of-the-art baseline algorithm. Ahmed et al. [14] address the importance of applying prediction models to different types of traffic and design a prediction algorithm that classifies distinct types of data based on patterns of temporal evolution and applies different prediction models to them. They show that they can accurately classify content and precisely predict such content's future popularity with experiments on data from several popular data platforms. Li et al. [15] explore the relationship between Video-on-Demand systems and social networks and point out, based on an extensive measurement study in China's largest social network, that traditional view-based popularity prediction algorithms are not efficient. The authors design a propagation-based prediction algorithm that considers propagation patterns in social networks as the metrics to determine popularity trends. The improvement in predicting bursts and peaks is verified through a trace-driven experiment. Zhang et al. [16] propose a chunk-level prediction model that depends on user behaviors and relationships among chunks in a video and design a corresponding algorithm in Information-Centric Networks. Based on predictive information produced by prediction algorithms, caching strategies can be designed to utilize such information to benefit future requests efficiently.

Based on the aforementioned work, the most common inputs to the prediction algorithms are usually 1) early patterns of different contents, 2) types of data whose popularity has different temporal evolution patterns, 3) mobility, 4) spreading patterns in networks, especially social networks, and other factors which generally characterize future demand patterns.

Proactive Caching: Depending on the predictive information, proactive caching algorithms are designed and applied for different types of applications and in different networks. First of all, proactive caching based on mobility predictions is one primary direction. Siris et al. [17] propose a proactive content fetching algorithm that incorporates proactive caching with mobility prediction information. The algorithm predicts users' mobility and proactively caches the requested content at the target location by minimizing a congestion price. Similarly, Abani et al. [18] also study the problem of proactive caching with mobility predictions. The main contribution is to reduce redundant copies by characterizing mobility uncertainties with entropy to locate the best prefetching node. Lan et al. [19] formulate an optimal data caching problem with constraints on node mobility, node capacities, and file sizes, which is proven to be NP-hard. Two algorithms are designed, including one with the aid of base stations and the other fully distributed. Grewe et al. [20] introduce a proactive caching strategy for vehicular ad-hoc networks (VANETs) in Named Data Networking (NDN), which jointly considers the mobility of vehicles and proactive content placement.

Second, wireless networks are also an area where proactive caching can achieve significant improvement due to the demand for reliable and low-latency services. Bastug et al. [21] apply proactive caching in two scenarios. First, they consider a wireless access network scenario, where contents are proactively cached during off-peak demand periods based on file popularity and correlations among user-file patterns. Second, they predict a set of influential users and make them proactively cache and disseminate content. Muller et al. [22] propose an online learning algorithm that observes the content popularity of connected users, updates the cached content, and subsequently observes cache hits for services with different priorities in wireless networks. The authors prove that their algorithm converges to maximizing cache hits. Zhou et al. [23] focus on energy efficiency and design an efficient proactive caching algorithm for multimedia contents at energy-harvesting-based small cells. The authors develop a novel joint model that takes content popularity and energy into consideration in both temporal and spatial senses, with which they design an algorithm that achieves fewer base station activations. Yi et al. [24] formulate a welfare maximization problem for proactive caching in a social-aware D2D network. They apply the basis transformation method to solve the joint power control, channel allocation, link scheduling, and reward design problem. Xu et al. [25] apply proactive caching concepts to solve the endurance problems faced by unmanned aerial vehicles. The algorithm first selects a set of ground nodes as the caching locations for a set of predicted contents, and then data transmissions are carried out between the requesting ground nodes and the caching nodes when the actual requests are generated.

With the evolution of machine learning techniques, numerous AI-based proactive caching algorithms have been designed. Ale et al. [26] design a proactive caching algorithm for time-series content requests in edge caching using a bidirectional deep recurrent neural network. Zhang et al. [27] propose a proactive caching algorithm for multi-view 3D video streaming in smart cars to enhance delay performance under the circumstances of small cellular coverage and large data volume. Jing et al. [28] propose a novel low-rank multi-view embedding learning framework that improves the performance of popularity prediction for micro-video applications. Liu et al. [29] design a distributed deep learning network by utilizing distributed computation resources in an Information-Centric Network. Machines with deep-learning-based algorithms are deployed to collect location-based information as input to the prediction algorithm, and a lightweight caching algorithm uses the predictions for proactive caching decisions. Chen et al. [30] propose distributed lightweight echo state networks (ESNs) with sublinear algorithms for proactive caching in cloud radio access networks. The proposed ESNs take advantage of periodic input with limited information on system and user states to predict content requests and mobility patterns. Hou et al. [31] propose to apply proactive caching and deep learning techniques in roadside units for road security. They design a heuristic Q-learning solution that solves the problems of limited cache resources and high latency from learning processes. Hou et al. [32] propose a learning-based cooperative caching scheme with the objectives of minimizing transmission costs and improving caching performance in mobile edge computing environments. To solve the NP-hard problem, the authors propose a transfer learning approach with a K-means clustering algorithm based on access patterns. Somuyiwa et al. [33] consider a scenario where multiple users with limited caching capabilities access dynamic contents in wireless networks. They model the problem as a Markov decision process and formulate an energy-cost minimization problem. A threshold-based proactive caching scheme is proved to be optimal, where the optimal threshold is obtained by applying reinforcement learning techniques.

1.2.2 Proactive Computing

Another promising solution is proactive computing [34], where, instead of relying on traditional interaction with users, computation tasks are mostly automated based on anticipatory demand. Engel et al. [35] also address the demand for proactive computing and propose a high-level system architecture for event-driven proactive computing, including a prediction agent and an automated decision-making agent. The idea of proactive computing was originally proposed with more concentration on automation, but it has recently also been applied to the problem of how to proactively complete heavy computation tasks to enhance service performance. Elbamby et al. [36] design a VR application in an mmWave wireless environment, where servers proactively compute HD video frames to decrease computing latency and proactively distribute them to edge caches to decrease delivery delay. Bousdekis et al. [37] propose to apply proactive event-driven computing for enabling proactive maintenance decisions in industrial business. Zhou et al. [38] address the problem of limited computation resources with the emergence of more computation-demanding applications. A proactive computation offloading algorithm is designed for mobile edge computing servers to pre-cache some tasks, minimizing delay while also taking energy consumption into consideration. Cui et al. [39] design an application of proactive computing in a data-intensive application: the analysis and visualization of genomic data. The computational tasks in the application include visualization of large datasets and analysis over various subsets of a dataset. A proactive computation algorithm, Epiviz, is designed, which improves the system performance. Geerhart et al. [40] compare the latency performance of proactive and reactive offloading decisions and are able to observe considerably reduced latency in a realistic Wi-Fi environment. There are other efforts in the domain of proactive techniques, which we do not list due to space limitations.

Both proactive caching and proactive computing algorithms take advantage of predictive information to meet user demand proactively by exploiting temporally or spatially unbalanced network resources. All the works mentioned above achieve significant performance improvement by applying proactive techniques to accommodate requests. However, these works do not reveal fundamental insights into how much improvement in system performance we can expect by utilizing prediction information.


1.3 Related Work

Several works explore the fundamental improvement of network performance achievable by applying proactive scheduling in specific networking problems. Hu et al. [41] apply game theory to the conflict of interests among different parties, including the service providers, the mobile network operators, and mobile users. The authors look into both centralized and distributed network scenarios and analyze two models in each scenario. Zheng et al. [42] study proactive edge caching problems in 5G networks by looking at the game between service providers and end-users. A Stackelberg game is designed with two sub-games, including a storage allocation game (SAG) and a user allocation game (UAG), and the optimum is achieved. Rao et al. [43] propose a proactive caching problem in a Peer-to-Peer network to solve the problem caused by heavy-tailed popularity distributions.

They design an algorithm that decides the caching placement and the number of replicas, which jointly optimizes the number of look-up hops and the number of replicas. Hou et al. [44] study a QoE maximization problem in a proactive two-phase wireless VR application and demonstrate its effectiveness in a gaming system. Elbamby et al. [45] analyze the problem of placement of proactive computing and caching tasks in fog networks with reliability and delay constraints. Beyond a general prediction model, the uncertainties in predictions are explicitly characterized in the optimization problems of the following works. In [46], Tadrous et al. characterize the diversity and multicasting gains of proactive caching using large-deviations theory under the assumption of perfect predictions. Muppirisetty et al. [47] and Tadrous et al. [48] study a cost optimization problem in a multi-user single-server system with proactive scheduling. The authors propose a model with uncertainties in user demands and channel states and design a proactive scheduling algorithm, which is proved to be asymptotically optimal in cost. Alotaibi et al. [49] consider a profit maximization problem for a carrier and a cost minimization problem for users with predictive information of user demands.

Another series of works concentrates on the fundamental queueing analysis of proactive systems, which is closely related to our work. Huang et al. [50] study the delay performance of a backpressure algorithm in a downlink system with perfect predictions, where the requested objects and corresponding request epochs are accurately predicted. The authors prove that the average queueing delay asymptotically goes to 0 as the prediction window size goes to infinity. They also analyze the impact of the prediction window size on the delay performance. Following this work, Zhang et al. [51] study the fundamental queueing performance of a single-queue proactive system. They analyze a variety of scenarios with different arrival and service processes, different prediction window sizes, and different types of imperfect predictions. They show that proactive service exponentially reduces delay, especially in lightly-loaded cases. A related work by Chen et al. [52] designs and analyzes a predictive scheduling algorithm that maximizes the timely-throughput, which is the total traffic received before the deadlines.

The work of [47, 48, 50, 51] has a substantial impact on our method of modeling uncertainties of predictions and proactive services. The works above reveal that the utilization of predictive information for proactive scheduling techniques can fundamentally improve specific network performance metrics. However, the problems of the maximal fraction of work that can be completed proactively and the minimal average delay that can be achieved under uncertain predictions are still unsolved. We carry out a series of analytical studies on these fundamental problems in proactive systems under uncertain predictions.

The work in Chapter 2 analyzes the queueing dynamics of a proactive caching system under uncertain predictions. A tight upper bound is derived for the maximum average amount of proactive work completed per request by the system. A threshold-based strategy is proposed and proved to achieve the upper bound. The optimal threshold-based strategy is proved to outperform the most commonly used Earliest-Deadline-First proactive strategy in terms of delay. In contrast to the work in Chapter 2, the work in Chapter 3 differs as follows: 1) instead of the proactive caching model with identical content item sizes and a constant link rate, we propose a more general proactive service model where service times follow an IID exponential distribution; 2) instead of allowing partial proactive service, we propose a more realistic fixed-probability (FIXP) strategy where each service is either proactively served or not; 3) we perform a comprehensive analysis of the FIXP strategy and obtain the limiting distributions of the proactive service system in closed-form expressions of the system parameters; and 4) we completely characterize the impact of the FIXP strategy's parameter on the average delay in closed-form expressions and achieve the optimal delay. We present our analysis in detail in Chapters 2 and 3.

Chapter 2

Proactive Caching under Uncertain

Predictions

2.1 Introduction

In this chapter, we aim to study the characteristics of proactive caching based on uncertain predictive information from a fundamental perspective. Different from the work of [51], we not only look at the basic queueing dynamics of the proactive system but also further explore how to strategically utilize uncertain predictions to enhance delay performance. In terms of delay performance, we take the Earliest-Deadline-First (EDF) type strategy, which has been widely used in network scheduling problems, as a competitive baseline in our analysis. There have been many works (e.g., [53], [54], and [55]) which study the delay performance of the EDF strategy. In the proactive caching context, we consider the 'deadlines' to be the predicted arrival epochs. The authors of [50] have proved that the EDF strategy achieves optimal delay performance under perfect predictions.

The main contributions and the structure of this chapter are listed as follows:

• We propose a request model which characterizes the request uncertainty by introducing a potential request process. We aim to maximize the average amount of proactive service for each request. We introduce our system model and problem formulation in Section 2.2.

• Based on the request model with uncertainty, we reveal the iterative nature of bandwidth resource assignment between reactive service and proactive service, by comparing the EDF strategy with a First-Come-First-Serve reactive strategy as an example. As a result, we derive an upper bound on how much proactive service per request the system can support. We discuss the comparisons and derive the bounds in Section 2.3.

• For the purpose of analysis, we define a family of threshold-based proactive strategies, where the threshold determines the maximal amount of proactive service to be done for each future potential request. We construct a Markov chain to analyze the asymptotic behavior of the proactive system under the threshold-based strategies. We prove that the UNIFORM strategy, which is the threshold-based strategy with the optimal threshold, is the solution to the optimization problem we propose. We obtain an important insight on how to design an optimal proactive strategy: the strategy should balance proactive service between predictions in the nearer future and the farther future, based on prediction uncertainties. We present the threshold-based strategies, the corresponding Markov chain, and the corresponding analysis in Section 2.4.

• We analytically compare the delay performance of the EDF type strategy with the UNIFORM strategy. Although one would intuitively expect the EDF strategy to achieve desirable delay performance based on its performance in previous network scenarios, we prove that the delay performance of the EDF strategy is always worse than that of the UNIFORM strategy in all non-trivial cases. We show the analysis in Section 2.5.

• We conduct an extensive range of experiments to show the delay performance of the threshold-based strategies with different thresholds. Specifically, we compare the delay performance of the UNIFORM strategy with the EDF strategy in multiple network scenarios, with the reactive scheme as a baseline. The results show that proactive caching not only greatly improves delay performance in lightly-loaded cases, as concluded in [51], but also works exceedingly well in heavily-loaded scenarios with the UNIFORM strategy. We also carry out experiments to show the impact of prediction window size on the delay performance for practicality concerns. The UNIFORM strategy still shows excellent delay performance with simple modifications. We show the numerical results in Section 2.6.

2.2 System Model

2.2.1 Network Model

We consider a system with one server providing delay-sensitive services to the user, as shown in Figure 2.1. The system operates in continuous time from time 0. The user receives service from the server at a constant rate of µ bits/sec.

Figure 2.1: Network Model

Figure 2.2: Arrival Processes

Request Processes: Requests arrive at the server according to the processes shown in Figure 2.2. Each request is for a same-sized data object of s bits. The Potential Request Process is a Poisson process {P(t); t > 0} with an overall arrival rate of λ, where the ith arrival, i.e., Potential Request i, requests object r_i ∈ Z^+ ¹ at time t_i ∈ R^+, where 0 < t_1 < t_2 < ···. The Actual Request Process {A(t); t > 0} is a thinned version of P(t), where each arrival on P(t) is an arrival on A(t) with probability p, independently of all other arrivals. Let {R_i; i = 1, 2, ...} be IID Bernoulli(p) indicator random variables, where R_i = 1 if the ith arrival on P(t) is an arrival on A(t). Thus, A(t) is a Poisson process with average arrival rate λp. For convenience, we denote an actual request by its index in P(t) instead of A(t).
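To make the request model concrete, the potential and actual processes can be generated by straightforward Poisson thinning. The following Python sketch is our own illustration (the function name and parameter values are ours, not part of the original model); it produces one sample path of (t_i, R_i) pairs:

```python
import random

def sample_requests(lam, p, horizon, seed=0):
    """Generate one sample path of the potential/actual request processes.

    Returns a list of (t_i, R_i) pairs: t_i are the arrival epochs of the
    Poisson(lam) potential process P(t), and R_i ~ Bernoulli(p) indicates
    whether potential request i is realized as an arrival of A(t).
    """
    rng = random.Random(seed)
    t, path = 0.0, []
    while True:
        t += rng.expovariate(lam)          # IID Exp(lam) inter-arrival times
        if t > horizon:
            return path
        path.append((t, 1 if rng.random() < p else 0))

# Example: lam = 12 potential requests/sec, each realized w.p. p = 0.8, so
# the actual request rate is lam * p = 9.6 (as in the numerical evaluations).
path = sample_requests(lam=12.0, p=0.8, horizon=1000.0)
print(len(path), sum(r for _, r in path))  # arrival counts of P(t) and A(t)
```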

An important assumption we make is that every potential request requests a different object, i.e., r_i ≠ r_j, ∀i ≠ j; that is, the catalog size is assumed to be infinite. This assumption is motivated by many practical problems, e.g., 1) prefetching problems, where each prefetched object is usually considered to be specific to one user request, 2) applications where data objects are highly dynamic, such as live streaming, online gaming, sensing data, and cloud computing, and 3) the small likelihood that a user would request objects that were recently requested. For more general applications, our aim is to (for simplicity) exclude the impact of popularity distributions and focus on the potential gains of proactive caching in the presence of uncertain predictive information.

Predictions: At time 0, the server knows the sequence of objects (r_1, r_2, r_3, ...) to be requested by the arrivals in {P(t)}, and the probability p. It has no prior knowledge of the precise arrival epochs {t_i} or the realizations of the indicator random variables {R_i}. The server observes A(t) but not P(t). At time t > 0, the sequence of indices for future potential requests from the server's viewpoint, or the prediction window², is:

Π(t) = (I(t) + 1, I(t) + 2, I(t) + 3, ...)    (2.1)

where I(t) is defined as:

I(t) ≜ max{i | t_i < t, R_i = 1}    (2.2)

i.e., the index of the most recent actual request before time t. The server proactively works on request i only if i ∈ Π(t) at time t.

The idea of this prediction model originates from the perfect prediction models used in the work of [48], [50]. With our prediction model, we are able to tractably model uncertainties in whether potential requests are realized, as well as uncertainties in the request arrival epochs.

¹ Z^+ = {1, 2, 3, ...} in this chapter.
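As a small companion sketch (ours, reusing the sample path from the snippet above), I(t) and the head of the prediction window Π(t) can be read directly off a sample path; the truncation to a finite number of indices is only for display, since Π(t) is infinite in the model:

```python
def index_of_latest_actual(path, t):
    """I(t): the index in P(t) of the most recent actual request before t."""
    latest = 0
    for i, (t_i, r_i) in enumerate(path, start=1):
        if t_i >= t:
            break
        if r_i == 1:
            latest = i
    return latest

def prediction_window(path, t, length=5):
    """The first `length` indices of Pi(t) = (I(t)+1, I(t)+2, ...)."""
    i_t = index_of_latest_actual(path, t)
    return [i_t + k for k in range(1, length + 1)]

# The server may proactively work on request i at time t only if i is in Pi(t).
print(index_of_latest_actual(path, 10.0), prediction_window(path, 10.0))
```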

2.2.2 Service Model

In this section, we first describe the reactive scheme, where the server works only on requests made by actual request arrivals. We then introduce the proactive schemes where the server works on future potential requests when not serving requests made by actual requests.

Reactive Scheme: The server serves only arrivals in the actual request process A(t), based on the strategy Ψ_R described below. Upon observing an actual request i at time t_i, the object r_i is placed into the tail of a FIFO queue containing V(t_i) bits of unfinished work, which is transmitted back to the user at rate µ, where V(t) is defined as the total number of bits waiting to be transmitted in the queue at time t in the reactive scheme. If V(t) = 0, the system is idle at t.

² We assume the prediction window size to be infinite for simplicity of analysis. This assumption guarantees that the server always has predicted requests on which to do proactive work.

Proactive Schemes: The server can proactively send a data object, partially or in its entirety, to the user, which can store the data object in a local cache. Since our focus is on the effects of uncertain predictions, we assume for simplicity that the cache size is infinite.

Let U_i(t) ≤ s be the proactive work done for request i by time t, i.e., the number of bits of object r_i sent to the user and stored in the cache by time t. Notice that for a request i, there is no reason to continue to proactively serve it after t_H(i), where H(i) ≜ min{j ≥ i | R_j = 1} represents the first potential request j ≥ i which is realized. Let

U_i ≜ min{s, U_i(t_H(i))}

be the total proactive work done for request i. For an actual request i (R_i = 1), U_i = min{s, U_i(t_i)}. Define S_i = s − U_i = max{0, s − U_i(t_H(i))} as the reactive part of object r_i which remains to be transmitted after the server stops proactively serving request i. For an actual request i, S_i = s − U_i = max{0, s − U_i(t_i)} bits need to be transmitted reactively at t_i. Let U(t) ≜ (U_{I(t)+1}(t), U_{I(t)+2}(t), U_{I(t)+3}(t), ...) be the set of U_i(t)'s where i ∈ Π(t). At time t, based on U(t), the prediction window Π(t), and the queue size V(t), a stationary proactive rate allocation strategy Ψ_P at the server is defined as:

Ψ(V(t), Π(t), U(t)) = (ρ_V(t), ρ_{I(t)+1}(t), ρ_{I(t)+2}(t), ρ_{I(t)+3}(t), ...)

where ρ_V(t) is the rate allocated to serve the queue of V(t), and ρ_{I(t)+i}(t), i ≥ 1, is the rate allocated to fetch object r_{I(t)+i} at time t. We assume that the data in V(t) has higher priority than proactive traffic. That is, if V(t) > 0, then Σ_{i∈Z^+} ρ_{I(t)+i}(t) = 0. Thus, we consider the set Γ_P of proactive strategies Ψ_P satisfying:


• (Reactive State) If V(t) > 0:

ρ_V(t) = µ,    Σ_{i=1}^{∞} ρ_{I(t)+i} = 0    (2.3)

• (Proactive State) If V(t) = 0:

ρ_V(t) = 0,    Σ_{i=1}^{∞} ρ_{I(t)+i} = µ    (2.4)

• The limiting average amount of proactive work received per potential request

U ≜ lim_{t→∞} ( Σ_{i=1}^{I(t)} U_i ) / I(t)    (2.5)

exists for Ψ_P;

• The limiting average amount of proactive work received per actual request

U_A ≜ lim_{t→∞} ( Σ_{i∈Z^+: i≤I(t), R_i=1} U_i ) / A(t)    (2.6)

exists for Ψ_P.

An example of a strategy in Γ_P is the Earliest-Deadline-First (EDF) strategy. In the EDF strategy, if V(t) = 0 at time t, then ρ_{J(t)}(t) = µ, where J(t) = min{i ∈ Π(t) | U_i(t) < s}. We use the EDF strategy as an important baseline policy throughout the chapter for the purpose of analysis and comparisons. Given a sample path of arrival epochs and {R_i} realizations, the evolutions of unfinished work V(t) under the EDF strategy and under the reactive scheme Ψ_R are compared, as shown in Figure 2.3.
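Before turning to the figure, note that the reactive baseline's workload trajectory (the blue curve in Figure 2.3) follows a standard Lindley-type recursion: the queue drains at rate µ between arrivals and jumps by s at each realized request. The sketch below is our own illustration of that baseline, reusing the sample path generated earlier; simulating the EDF curve additionally requires tracking proactive work, which the threshold-strategy sketch given with Algorithm 1 in Section 2.4 covers as the special case φ = s.

```python
def reactive_delays(path, s, mu):
    """Per-request delays under the reactive scheme: each realized request
    waits for the backlog V(t_i) to drain, then for its own transmission."""
    v, prev_t, delays = 0.0, 0.0, []
    for t_i, r_i in path:
        v = max(0.0, v - mu * (t_i - prev_t))   # queue drains at rate mu
        prev_t = t_i
        if r_i == 1:
            delays.append(v / mu + s / mu)      # waiting time + service time
            v += s                              # the whole object is reactive
    return delays

# lam * p * s = 9.6 < mu = 10 keeps the reactive queue stable (rho = 0.96).
d = reactive_delays(path, s=1.0, mu=10.0)
print(sum(d) / len(d))
```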

Figure 2.3: Comparison between the Reactive Scheme and the Proactive Scheme

In the figure, the system runs from time 0 to t. Potential requests 1 to 8 arrive at t_1 to t_8, respectively, during this period of time, with all potential requests realized except for request 6. The evolution of unfinished work V(t) under the reactive scheme is plotted in blue, and the evolution in the proactive scheme with the EDF strategy is plotted in red, with the corresponding states marked on the time axis. We also show the rate allocation in the proactive scheme.

2.2.3 Problem Formulation

As shown in Figure 2.3, there is less traffic served reactively in the proactive EDF scheme as compared with the reactive scheme. Reducing reactive traffic is doubly desirable since (1) the delay is reduced, and (2) there is more time for the server to do proactive work. Motivated by this, we study an optimization problem where the objective is to maximize the average amount of proactive work done for each request. Given λ, µ, p and s, our optimization problem then can be formulated as:

maximize_{Ψ_P}  U(Ψ_P)    (2.7)

subject to  Ψ_P ∈ Γ_P


where Γ_P is defined in (2.3)-(2.6). Let Ψ* be an optimal solution to problem (2.7) and let U_max ≜ U(Ψ*) denote the U achieved by Ψ*. The solution to (2.7) is discussed and presented in Sections 2.3 and 2.4.

Operating Regimes: In fact, there is a limited region of λ in which we are interested. In the region 0 ≤ λ < µ/s, U_max = s, w.p.1, by Corollary 2 in [50] and Theorem 2 in [51]. With knowledge of Π(t), the server is able to proactively serve every request before its arrival epoch with probability 1, even if every request is realized. In the region λ ≥ µ/(ps), the arrival rate of the actual request process is beyond the stability region of the network. According to [56], full knowledge of the future does not enlarge the stability region of the system. Thus, the queue V(t) cannot be stabilized in this region. This implies that the server almost always works reactively, sparing no bandwidth for proactive service. In the region µ/s ≤ λ < µ/(ps), an optimal solution Ψ* to problem (2.7) is proposed and analyzed in Section 2.4. Thus, we have the following fact:

U_max =
    s, w.p.1,    if 0 ≤ λ < µ/s
    U(Ψ*),       if µ/s ≤ λ < µ/(ps)    (2.8)
    0, w.p.1,    if µ/(ps) ≤ λ

Delay Performance: The corresponding delay of Ψ* is analyzed in Section 2.5. For a given Ψ_P ∈ Γ_P, we define the delay of an actual request i as

D_i =
    V(t_i)/µ + S_i/µ,    if S_i > 0 and R_i = 1
    0,                   otherwise

where V(t_i)/µ is the waiting time of object r_i in the queue at the server, and S_i/µ is the transmission time of the reactive part of object r_i. Define the limiting average delay per actual request as:

D ≜ lim_{t→∞} ( Σ_{i∈Z^+: t_i<t, R_i=1} D_i ) / A(t)

Denote the average delay per actual request under Ψ* by D_{Ψ*}. We will derive the closed-form expression of D_{Ψ*}, and analytically demonstrate its advantage relative to the average delay of the EDF proactive strategy.

2.3 Relation between Reactive Scheme and Proactive Schemes

Proactive caching makes use of available link capacity when the system is idle (under the reactive scheme). A natural question to ask is how much proactive work can be done for each request on average. We can gain intuition from the example in Figure 2.3. First, the idle period in the reactive scheme can be utilized for proactive service. Then, by proactively serving actual requests

(i.e., requests 1, 2, 3, 4, 5, 7, 8), reactive traffic is reduced so that available link capacity can be utilized more frequently for proactive service. This is indicated in Figure 2.3 by the intervals marked in solid red, labeled "Proactive Served". In the following, we study the characteristics of proactive service and derive an upper bound on U.

Consider a set of sample paths corresponding to arrival epochs {t_i: i = 1, 2, ...} and realizations {R_i = z_i: i = 1, 2, ...} (z_i ∈ {0, 1}) under both the reactive scheme and a proactive scheme Ψ_P. We make the following definitions. The amount of time that Ψ_P ∈ Γ_P works in the proactive state (namely Proactive Proactive) from 0 to t is:

T_PP(t) ≜ |{τ ∈ (0, t]: V(τ) = 0}|    (2.9)


Table of Notations

V (t) Unfinished reactive work at server node

s Object size

µ Constant service rate of the system

P (t) Potential request process

A (t) Actual request process

λ Average arrival rate of P (t)

p Probability that each potential request is realized

ti Arrival epoch of potential request i

Ri Indicator random variable for whether request i is realized

Ui Total amount of proactive service for request i

Ui (t) Amount of proactive service for request i by time t

Si Amount of reactive work for request i

Π (t) Prediction window

I (t) Index of the latest actual request before time t

J (t) Index of the request to proactively serve at t

U Limiting time average proactive service per potential request

UA Limiting time average proactive service per actual request

U* Maximum limiting average proactive service per potential request

Ψ_P^φ Threshold-based strategy with threshold φ

Xn Markov chain

τn Epoch of the nth transition

Table 2.1: Table of Notation


The amount of time that Ψ_P ∈ Γ_P works in the reactive state (namely Proactive Reactive) from 0 to t is:

T_PR(t) ≜ |{τ ∈ (0, t]: V(τ) > 0}|    (2.10)

The limiting fractions of time that Ψ_P ∈ Γ_P works in the reactive state and in the proactive state, respectively, are:

α_PR ≜ lim_{t→∞} T_PR(t)/t,    α_PP ≜ lim_{t→∞} T_PP(t)/t    (2.11)
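The quantities T_PP(t), T_PR(t), and the fractions in (2.11) are easy to estimate from a simulated trace. The helper below is our own sketch (names are ours): it measures the busy fraction of a work-conserving server fed by the realized requests, i.e., an empirical α_PR. For the purely reactive scheme this tends to the utilization λps/µ; Theorem 1 below shows how much lower a good proactive strategy can drive it.

```python
def busy_fraction(path, s, mu, horizon):
    """Estimate alpha_PR: the fraction of (0, horizon] with V(t) > 0."""
    v, prev_t, busy = 0.0, 0.0, 0.0
    for t_i, r_i in path:
        busy += min(v / mu, t_i - prev_t)       # time to empty, capped by gap
        v = max(0.0, v - mu * (t_i - prev_t))
        prev_t = t_i
        if r_i == 1:
            v += s                              # reactive scheme: full object
    busy += min(v / mu, horizon - prev_t)       # tail after the last arrival
    return busy / horizon

# For lam = 12, p = 0.8, s = 1, mu = 10 this approaches lam*p*s/mu = 0.96.
print(busy_fraction(path, s=1.0, mu=10.0, horizon=1000.0))
```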

Before we continue to study the relation between the reactive scheme and the proactive scheme, we first define two important properties of proactive strategies.

Definition 1 (Property 1 of Proactive Strategies). A proactive strategy Ψ_P ∈ Γ_P satisfies Property 1 if the following condition is satisfied:

lim_{t→∞} ( Σ_{i=I(t)+1}^{∞} U_i(t) ) / t = 0, w.p.1    (2.12)

The term Σ_{i=I(t)+1}^{∞} U_i(t) represents the total amount of proactive work done for potential requests in the prediction window Π(t) up to t. Although this part of proactive work may be requested eventually in the future, it does not contribute to the reduction of reactive work by time t. If Σ_{i=I(t)+1}^{∞} U_i(t) scales with t, it is likely that the corresponding U can be further improved by a strategy which invests more proactive service into requests in the nearer future. We will later formally analyze the influence of this property on our objective in Theorem 1.

Proposition 1. For all Ψ_P ∈ Γ_P, we have

U ≥ U_A, w.p.1

Proof. Please refer to Appendix A.1 for the proof.


We then have the following definition of the second property:

Definition 2 (Property 2 of Proactive Strategies). A proactive strategy satisfies Property 2 if the following condition is satisfied:

U_A = U, w.p.1    (2.13)

Proposition 1 implies that in our setting, the average amount of proactive work per actual request is no more than the average amount of proactive work per potential request. On the other hand, it is more desirable that more proactive service is done for actual requests. With Properties 1 and 2, we have the following theorem on proactive strategies.

Theorem 1. Given µ, λ, s, and p as system parameters, the limiting fractions of time that the server works in the proactive state and the reactive state, respectively, under Ψ_P ∈ Γ_P satisfy

α_PP ≤ (µ − λps) / (µ(1 − p)), w.p.1,    α_PR ≥ (λs − µ)p / (µ(1 − p)), w.p.1

Equality holds in both inequalities if and only if the proactive strategy satisfies both Property 1 and Property 2.

Proof. Please refer to Appendix A.2 for the proof.

Theorem 1 implies that in order to maximize the fraction of time that the system works proactively, or equivalently to minimize the fraction of time that the system works reactively, the proactive strategy Ψ_P must satisfy both Property 1 and Property 2. On the other hand, recall that we are interested in the operating regime µ/s ≤ λ < µ/(ps). If λ = µ/s, we have α_PR = 0, w.p.1, if and only if Ψ_P satisfies both Property 1 and Property 2. If λ = µ/(ps), we have α_PR ≥ 1, w.p.1, which implies that the system almost always works reactively under any proactive strategy Ψ_P ∈ Γ_P. These results are consistent with the previous discussion before (2.8).
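As a quick numerical illustration of Theorem 1 (our own sketch; the parameter values are arbitrary but chosen to satisfy µ/s ≤ λ < µ/(ps)), the two bounds can be evaluated directly and checked to sum to 1:

```python
def time_fraction_bounds(mu, lam, s, p):
    """Theorem 1: upper bound on alpha_PP and lower bound on alpha_PR,
    attained iff the strategy satisfies both Property 1 and Property 2."""
    assert mu / s <= lam < mu / (p * s), "outside the operating regime"
    a_pp = (mu - lam * p * s) / (mu * (1 - p))
    a_pr = (lam * s - mu) * p / (mu * (1 - p))
    return a_pp, a_pr

a_pp, a_pr = time_fraction_bounds(mu=10.0, lam=12.0, s=1.0, p=0.8)
print(a_pp, a_pr, a_pp + a_pr)   # 0.2, 0.8, and the fractions sum to 1
```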


Define

S ≜ lim_{t→∞} ( Σ_{i∈Z^+: R_i=1, i≤I(t)} S_i ) / A(t)    (2.14)

as the limiting average amount of reactive work of each actual request. Then, based on Theorem 1, we have the following corollary on U and S.

Corollary 1. Given µ, λ, s, and p satisfying µ/s ≤ λ < µ/(ps), the limiting average amount of proactive work per potential request under strategy Ψ_P ∈ Γ_P satisfies

U ≤ (µ − pλs) / (λ(1 − p)) ≜ U*, w.p.1    (2.15)

The limiting average amount of reactive work per actual request under strategy Ψ_P ∈ Γ_P satisfies

S ≥ (λs − µ) / (λ(1 − p)) ≜ S*, w.p.1    (2.16)

where equality holds in both inequalities if and only if strategy Ψ_P satisfies both Property 1 and Property 2.

Proof. Please refer to Appendix A.3 for the proof.

Corollary 1 shows that the limiting average amount of proactive work done per potential request is maximized if and only if a proactive strategy Ψ_P satisfies both Property 1 and Property 2. By Property 2, U_A is maximized under the same condition. Therefore, the optimal solution to the objective in (2.7) should be a proactive strategy which satisfies both Property 1 and Property 2. We will construct such a proactive strategy, and also explain why the EDF strategy is not an optimal solution, in the next section.
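Putting (2.8) and Corollary 1 together gives U_max in closed form for every operating regime. The helper below is our own numerical illustration, continuing the λ = 12, µ = 10, s = 1, p = 0.8 example used earlier:

```python
def max_proactive_per_request(mu, lam, s, p):
    """U_max of (2.8), with the middle regime given by U* from Corollary 1."""
    if lam < mu / s:
        return s                                     # fully prefetched, w.p.1
    if lam < mu / (p * s):
        return (mu - p * lam * s) / (lam * (1 - p))  # U* of (2.15)
    return 0.0                                       # queue cannot be stabilized

u_star = max_proactive_per_request(mu=10.0, lam=12.0, s=1.0, p=0.8)
s_star = 1.0 - u_star    # S* = s - U*, since (2.15) and (2.16) sum to s
print(u_star, s_star)    # 0.1666..., 0.8333...
```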


2.4 Threshold-based Proactive Strategy and Markov Chain

In order to construct an optimal proactive strategy to solve (2.7), we first define a family of threshold-based strategies in ΓP . We then analyze the asymptotic behaviors of the threshold-based proactive strategies by constructing and analyzing a corresponding Markov chain. Using this analysis, we relate the threshold-based strategies to Property 1 and Property 2, and construct an optimal solution to the problem (2.7) by choosing a specific threshold for the threshold-based strategies.

2.4.1 Threshold-Based Proactive Strategies

We describe the threshold-based strategies Ψ_P^φ in Algorithm 1. Specifically, we define φ ∈ (0, s] as the threshold parameter. When working proactively, the threshold-based strategy Ψ_P^φ works on request J(t) at time t, where J(t) = min{i ∈ Π(t) | U_i(t) < φ} is the first request in the prediction window Π(t) which has not received φ bits of proactive service. By the definition of Ψ_P^φ, the process {J(t); t > 0} is non-decreasing. In order to study the impact of φ on the threshold-based proactive strategies, we construct and analyze a corresponding Markov chain for a given φ.

2.4.2 Markov Chain of the System under Ψ_P^φ

We construct a Markov chain corresponding to the system under Ψ_P^φ, using methods applied in the analysis of M/G/1 queues and G/M/1 queues [57].

Definition 3 (Markov Chain of the Proactive System under Ψ_P^φ). Let T^φ ≜ (τ_0, τ_1, τ_2, ..., τ_n, ...) be the sequence of transition epochs, where each τ_n, n = 0, 1, ... satisfies 1) V(τ_n^+) = 0; 2) U_{J(τ_n^+)}(τ_n^+) = 0; and 3) J(τ_n^+) > P(τ_n^+). The discrete-time process {X_n : n = 0, 1, ...} with state space {1, 2, 3, ...} is defined as:

\[
X_n = J(\tau_n^+) - P(\tau_n^+), \quad n = 0, 1, 2, \ldots \tag{2.17}
\]


Algorithm 1 Threshold-based Strategies Ψ_P^φ

1: Main Procedure SYSTEM RUN(φ)

2: Choose the threshold as φ;

3: Initialize V (t), Π (t)

4: while t > 0 do

5: if Request i arrives at t then

6: Put reactive part S_i of request i into the tail of the queue V(t).

7: Update prediction window Π (t)

8: end if

9: % Reactive work

10: if V (t) > 0 then

11: Transmit data from the head of the queue V (t) with full rate µ.

12: end if

13: % Proactive work

14: if V (t) = 0 then

15: Set J(t) = min{i ∈ Π(t) | U_i(t) < φ}

16: % J (t) is the earliest potential request in Π(t) which has received less than φ bits of proactive service

17: if U_{J(t)}(t) < φ then

18: Transmit data of r_{J(t)} at full rate µ

19: end if

20: end if

21: end while

22: End Procedure
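For concreteness, the following is a minimal, fluid-style Python sketch of the threshold-based strategy in Algorithm 1 (not the dissertation's simulator; the helper name and event-loop bookkeeping are illustrative). It draws Poisson potential arrivals of rate lam, realizes each with probability p, drains the reactive queue at rate mu, and uses any leftover capacity to fill requests in an infinite prediction window up to phi bits each.

```python
import random

def simulate_threshold(lam, p, s, mu, phi, T=1e5, seed=1):
    """Illustrative sketch of Algorithm 1: reactive work has priority,
    spare capacity serves future requests proactively up to phi bits each."""
    rng = random.Random(seed)
    t, V = 0.0, 0.0                    # current time, unfinished reactive work (bits)
    U = {}                             # proactive bits received, keyed by request index
    n_pot, last_act, J = 0, 0, 1       # P(t), I(t), and the pointer J(t)
    delays = []
    while t < T:
        dt = rng.expovariate(lam)      # time until the next potential arrival
        drain = min(V, mu * dt)        # reactive queue is always served first
        V -= drain
        spare = dt - drain / mu        # time left over for proactive service
        while spare > 1e-12:
            J = max(J, last_act + 1)   # Pi(t) starts right after I(t)
            got = max(0.0, min(mu * spare, phi - U.get(J, 0.0)))
            U[J] = U.get(J, 0.0) + got
            spare -= got / mu
            if U[J] >= phi - 1e-12:
                J += 1                 # J(t) is non-decreasing
        t += dt
        n_pot += 1                     # potential request n_pot arrives ...
        if rng.random() < p:           # ... and is realized with probability p
            last_act = n_pot
            rem = max(0.0, s - U.get(n_pot, 0.0))
            if rem == 0.0:
                delays.append(0.0)     # fully prefetched: no queueing at all
            else:
                delays.append((V + rem) / mu)  # FIFO queue served at rate mu
                V += rem
    return sum(delays) / len(delays)

# e.g., heavily-loaded scenario of Section 2.6: UNIFORM (phi = U* = 1/6)
# should yield lower delay than EDF (phi = s = 1)
print(simulate_threshold(12, 0.8, 1.0, 10.0, 1 / 6))
print(simulate_threshold(12, 0.8, 1.0, 10.0, 1.0))
```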


Figure 2.4: Example: Transitions in the proactive system with Ψ_P^φ, with φ = s/2


In Proposition 2, we will show that {X_n} is a Markov chain. We interpret the three conditions in Definition 3 as follows. Condition 1) means that there is no reactive traffic to serve right after τ_n, so the server can proactively serve requests in Π(τ_n^+). Condition 2) means that at τ_n, the server starts to proactively work on request J(τ_n^+), which has not received proactive service before τ_n^+. The last condition means that the potential request to be proactively served at τ_n^+ should be a potential request which has not arrived in {P(t)} by τ_n^+. To summarize, the discrete-time process {X_n : n = 0, 1, ...} is constructed by sampling the system at the epochs {τ_n : n = 0, 1, ...} when the server starts to proactively work on a future potential request.

At each epoch τ_n, n ∈ Z^+, the nth transition in the Markov chain occurs. X_n = J(τ_n^+) − P(τ_n^+), n = 0, 1, 2, ..., represents how far the proactive service process {J(t); t ≥ 0} is ahead of the potential arrival process {P(t); t ≥ 0} at epoch τ_n^+. Figure 2.4 shows an example of how the transition epochs {τ_n : n = 0, 1, ...} are chosen.

Example: In the example shown in Figure 2.4, we choose φ = s/2 in the threshold-based strategy. We make the following observations on the evolution of the process.

1. No arrival occurs in (τ_0, τ_1). The server finishes proactively serving request 1 at τ_1, and starts to proactively serve request 2. The process in (τ_1, τ_2) evolves in the same way.

2. In (τ_2, τ_3), the server proactively serves request 3; requests 1 and 2 arrive, with request 1 realized and request 2 not realized. At τ_3, the server starts to proactively serve request 4.

3. In (τ_3, τ_4), requests 3, 4 and 5 arrive, with only request 3 realized, before the server can finish proactively serving φ bits of request 4. Because the server cannot observe the arrival of request 4 or 5, it keeps proactively serving request 4 until τ′. At τ′, the server starts to proactively serve request 5 ∈ Π(τ′) = (4, 5, ...). Nevertheless, condition 3) in Definition 3 is not satisfied at τ′ (J(τ′^+) − P(τ′^+) = 0 < 1). Thus, τ′ is not a transition epoch. At τ_4, the server starts to proactively serve request 6. Since conditions 1)-3) in Definition 3 are all satisfied, τ_4 is a transition epoch.

4. In (τ_4, τ_5), request 6 arrives and is realized before it receives φ bits proactively. Thus, it is served reactively until all bits are received. Since there is no arrival before it finishes, we have I(τ_5) = P(τ_5) = 6, so the server starts proactively serving request 7, and τ_5 is a transition epoch.

We define A_n ≜ P(τ_{n+1}^+) − P(τ_n^+) as the number of potential arrivals in (τ_n, τ_{n+1}], and T_n ≜ τ_{n+1} − τ_n as the nth inter-transition time. Starting from X_n = x_n, x_n ∈ Z^+, the first x_n − 1 requests in Π(τ_n^+), i.e., P(τ_n^+)+1, ..., P(τ_n^+)+x_n−1 (no requests if x_n = 1), have already received φ bits of proactive service by τ_n, and request P(τ_n^+)+x_n just starts to be proactively served from τ_n^+. If A_n ≥ x_n, we have X_{n+1} = 1. If A_n < x_n, X_{n+1} depends on A_n. In the following proposition, we formally describe the evolution of {X_n; n ≥ 0} and show its Markovian property.

Proposition 2. The discrete-time process {X_n; n ≥ 0} defined in Definition 3 for the proactive system under Ψ_P^φ is Markovian, with the evolution

\[
X_{n+1} = \max\{X_n + 1 - A_n, \, 1\}, \quad n = 0, 1, \ldots \tag{2.18}
\]

Proof. Please refer to Appendix A.4 for the proof.
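To see the recursion (2.18) in action, here is a tiny illustrative snippet (the uniform stand-in distribution for A_n is hypothetical; the true distribution {p_k^φ} is derived below):

```python
import random

def evolve_chain(n_steps, sample_A, x0=1):
    """Iterate (2.18): X_{n+1} = max(X_n + 1 - A_n, 1)."""
    path = [x0]
    for _ in range(n_steps):
        path.append(max(path[-1] + 1 - sample_A(), 1))
    return path

rng = random.Random(0)
# each transition adds one proactively served request and removes the
# A_n requests that arrived in the meantime
print(evolve_chain(12, lambda: rng.choice([0, 1, 2])))
```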

We now consider the transition probabilities {Pr{X_{n+1} = x_{n+1} | X_n = x_n} : x_n ∈ Z^+, x_{n+1} ∈ Z^+}.

1. If x_{n+1} > x_n + 1, such transitions cannot happen by Definition 3.

2. If 1 < x_{n+1} ≤ x_n + 1, we have the following fact:

\[
\Pr\{X_{n+1} = x_{n+1} \mid X_n = x_n\} = \Pr\{A_n = x_n + 1 - x_{n+1} \mid X_n = x_n\}
\]

which follows from Proposition 2. If we let A_n = k, then x_n > k ≥ 0.

An interesting fact is that:

\[
\Pr\{A_n = k \mid X_n = k+1\} = \Pr\{A_n = k \mid X_n = k+2\} = \ldots \tag{2.19}
\]

This is because τ_{n+1} is determined by 1) the arrival epochs {t_i : i > P(τ_n^+), i ∈ Z^+}, 2) the realizations {R_i : i > P(τ_n^+), i ∈ Z^+}, and 3) the reactive work to be done for each request {S_i : i > P(τ_n^+), i ∈ Z^+}. If A_n < x_n, we have:

\[
\text{If } x_n > 1: \quad U_i(\tau_n^+) = \phi, \;\; i = P(\tau_n^+)+1, \ldots, P(\tau_n^+)+x_n-1; \qquad U_i(\tau_n^+) = 0, \;\; i = P(\tau_n^+)+x_n, P(\tau_n^+)+x_n+1, \ldots
\]
\[
\text{If } x_n = 1: \quad U_i(\tau_n^+) = 0, \;\; i = P(\tau_n^+)+1, P(\tau_n^+)+2, \ldots
\]


by the definition of threshold-based strategies. Then for these A_n = k < x_n arrivals, we have:

\[
S_i = s - \phi, \quad i = P(\tau_n^+)+1, P(\tau_n^+)+2, \ldots, P(\tau_n^+)+k, \quad \text{if } k = 1, 2, \ldots
\]

and there is no reactive work to be done if there is no arrival before the next transition, i.e., A_n = 0. So Pr{A_n = k | X_n = x_n} only depends on conditions 1) and 2) if k < x_n, which implies (2.19).

Here we define:

\[
p_k^\phi \triangleq \Pr\{A_n = k \mid X_n = x_n\}, \quad \forall x_n > k \tag{2.20}
\]

and we have

\[
\Pr\{X_{n+1} = x_{n+1} \mid X_n = x_n\} = p_{x_n+1-x_{n+1}}^\phi, \quad \text{if } 1 < x_{n+1} \le x_n + 1 \tag{2.21}
\]

3. If x_{n+1} = 1, we have:

\[
\begin{aligned}
\Pr\{X_{n+1} = x_{n+1} \mid X_n = x_n\} &= 1 - \sum_{i=2}^{\infty} \Pr\{X_{n+1} = i \mid X_n = x_n\} \\
&= 1 - \sum_{i=2}^{x_n+1} p_{x_n+1-i}^\phi \\
&= 1 - \sum_{k=0}^{x_n-1} p_k^\phi \\
&= \sum_{k=x_n}^{\infty} p_k^\phi
\end{aligned} \tag{2.22}
\]


Then the transition probabilities can be written as:

\[
p_{x_n x_{n+1}}^\phi \triangleq \Pr\{X_{n+1} = x_{n+1} \mid X_n = x_n\} =
\begin{cases}
0, & x_{n+1} \ge x_n + 2 \\
p_{x_n+1-x_{n+1}}^\phi, & 1 < x_{n+1} \le x_n + 1 \\
\sum_{k=x_n}^{\infty} p_k^\phi, & x_{n+1} = 1
\end{cases}
\quad \forall x_n \in \mathbb{Z}^+, \; \forall x_{n+1} \in \mathbb{Z}^+ \tag{2.23}
\]

Or equivalently, we can write the transition probabilities in matrix form:

\[
P^\phi =
\begin{pmatrix}
\sum_{k=1}^{\infty} p_k^\phi & p_0^\phi & & \\
\sum_{k=2}^{\infty} p_k^\phi & p_1^\phi & p_0^\phi & \\
\sum_{k=3}^{\infty} p_k^\phi & p_2^\phi & p_1^\phi & p_0^\phi \\
\vdots & & & & \ddots
\end{pmatrix} \tag{2.24}
\]

where the empty entries are 0. Notice that it is structurally similar to the transition probability matrix of the Markov chain of the G/M/1 queue in [57].
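To make the structure of (2.24) concrete, the helper below builds a K-state truncation of P^φ from a given probability vector (the {p_k^φ} values here are made up purely for illustration; the last row loses the mass that escapes the truncation):

```python
def transition_matrix(pk, K):
    """K x K truncation of (2.24). For 1-indexed states x, y:
    column 1 holds the return probability sum_{k >= x} p_k (equation (2.22)),
    and columns 1 < y <= x+1 hold p_{x+1-y} (equation (2.21))."""
    P = [[0.0] * K for _ in range(K)]
    for x in range(1, K + 1):
        P[x - 1][0] = sum(pk[x:])              # jump back toward state 1
        for y in range(2, min(x + 1, K) + 1):
            P[x - 1][y - 1] = pk[x + 1 - y]
    return P

pk = [0.5, 0.3, 0.15, 0.05]                    # made-up p_k^phi, sums to 1
for row in transition_matrix(pk, 4):
    print([round(v, 2) for v in row])
```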

Although we have developed the structure of the transition probability matrix, the probabilities {p_k^φ : k = 0, 1, 2, ...} are still unknown. In the following proposition, we prove an important result about these probabilities.

Proposition 3. The probabilities {p_k^φ : k = 0, 1, 2, ...} satisfy the following relationships:

\[
\sum_{k=0}^{\infty} p_k^\phi \,(1-k)
\begin{cases}
> 0, & \text{if } \phi < U^* \\
= 0, & \text{if } \phi = U^* \\
< 0, & \text{if } \phi > U^*
\end{cases} \tag{2.25}
\]


where U* = (µ − pλs)/(λ(1−p)), as defined in (2.15).

Proof. Please refer to Appendix A.5 for the proof.

Although Proposition 3 gives us some knowledge about the transition probabilities of the Markov chain, a remaining problem is the distribution of T_n and A_n. If T_n = ∞ with positive probability, the next transition may never happen. Therefore, we have the following proposition on the expectations of T_n and A_n.

Proposition 4. In the Markov chain of the proactive system with Ψ_P^φ as defined in Definition 3, we have:

\[
E[T_n \mid X_n = x_n] < \infty, \quad E[A_n \mid X_n = x_n] < \infty, \quad \forall x_n \in \mathbb{Z}^+, \; \forall \phi \in (0, s] \tag{2.26}
\]

Proof. Please refer to Appendix A.6 for the proof.

Proposition 4 implies that Pr{T_n < ∞} = 1 and Pr{A_n < ∞} = 1, ∀ n ∈ Z^+. Therefore, transitions in the corresponding Markov chain almost surely happen in finite time.

To investigate the asymptotic behavior of the system, we need to characterize the recurrence of the Markov chain of the system. Based on Proposition 3 and Proposition 4, we have the following theorem on the recurrence of the Markov chain of the proactive system under Ψ_P^φ.

Theorem 2. The Markov chain of the proactive system with Ψ_P^φ is 1) transient if φ < U*, 2) positive recurrent if φ > U*, and 3) null recurrent if φ = U*.


Proof. From Proposition 3, we can easily prove that:

\[
\sum_{k=0}^{\infty} p_k^\phi \, k
\begin{cases}
< 1, & \text{if } \phi < U^* \\
= 1, & \text{if } \phi = U^* \\
> 1, & \text{if } \phi > U^*
\end{cases} \tag{2.27}
\]

In Section 10.3.3 of [58], the relation between Σ_{k=0}^∞ p_k^φ k and the recurrence of the corresponding Markov chain is discussed. Specifically, the conclusion is that the Markov chain is 1) positive recurrent if Σ_{k=0}^∞ p_k^φ k > 1, 2) null recurrent if Σ_{k=0}^∞ p_k^φ k = 1, and 3) transient if Σ_{k=0}^∞ p_k^φ k < 1. Our conclusion directly follows.
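The criterion in (2.27) is a mean-drift test: from a state x_n > 1 the chain moves by 1 − A_n on average, so the sign of E[A_n] − 1 = Σ_k k p_k^φ − 1 decides recurrence. A small illustrative check, with made-up numbers:

```python
def classify(pk):
    """Classify recurrence via (2.27): compare sum_k k * p_k^phi with 1."""
    mean_A = sum(k * q for k, q in enumerate(pk))
    if mean_A > 1:
        return "positive recurrent (phi > U*)"
    if mean_A == 1:
        return "null recurrent (phi = U*)"
    return "transient (phi < U*)"

print(classify([0.5, 0.3, 0.15, 0.05]))   # mean 0.75 < 1 -> transient
```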

Theorem 2 characterizes the relationship between φ and the recurrence of the Markov chain under Ψ_P^φ. The recurrence of the chain under different values of φ is the key to relating the threshold-based strategies to Property 1 and Property 2. We discuss this relationship in the following.

Property 1 of the Threshold-based Strategies: First, we focus on Property 1 and the threshold-based strategies in the following lemma.

Lemma 1. A threshold-based strategy satisfies Property 1 if and only if the corresponding Markov chain satisfies:

\[
\lim_{n \to \infty} \frac{X_n}{n} = 0, \quad \text{w.p.1}
\]

Proof. Please refer to Appendix A.7 for the proof.

Lemma 1 transforms the condition for Property 1 from the continuous-time sense in Definition 1 to a discrete condition based on transitions in the Markov chain. The term lim_{n→∞} X_n/n is closely related to the recurrence of the Markov chain, which has been characterized in Theorem 2. Then we have the following theorem on Property 1 of the threshold-based strategies.

Theorem 3. A threshold-based strategy satisfies Property 1 if and only if φ ≥ U*.

Proof. Please refer to Appendix A.8 for the proof.

Another way of stating Theorem 3 is that a threshold-based strategy satisfies Property 1 if and only if the corresponding Markov chain is recurrent. Recall that the states X_n represent the gap between the proactive service process {J(t); t > 0} and the potential arrival process {P(t); t > 0}. If the corresponding Markov chain is recurrent, the state X_n = 1 is visited infinitely often. This implies that the proactive service done effectively reduces the reactive traffic of the requests which have arrived, which is also the insight behind Property 1.

Property 2 of the Threshold-based Strategies: Next, we discuss Property 2 of the threshold-based strategies. As we discussed in Proposition 1, U ≥ U_A holds due to our service model. Predictions are likely to receive more proactive service if they are unrealized. Because of our assumptions on the orderliness of predictions in Π(t), the predictions which have arrived but are not realized are always the earliest predictions in Π(t). Intuitively, a threshold-based strategy with a larger φ, which prefers to serve the earliest predictions in Π(t), is more likely to result in U > U_A.

We rigorously characterize the relationship between the threshold-based strategies and Property 2 in the following theorem.

Theorem 4. The threshold-based strategy Ψ_P^φ satisfies Property 2 if and only if φ ≤ U*.

Proof. Please refer to Appendix A.9 for the proof.


Theorem 4 verifies our previous intuition. Similar to Theorem 3, Theorem 4 has an equivalent statement: the threshold-based strategy Ψ_P^φ satisfies Property 2 if and only if the corresponding Markov chain is NOT positive recurrent. As we discussed, U > U_A is more likely to occur when the strategy proactively works on requests which have arrived but are not realized, which only happens when the system transitions to state X_n = 1. In the transient and null recurrent cases, the number of visits to state X_n = 1 does not grow comparably with n. As a result, Property 2 is satisfied in these cases.

Based on Property 1 and Property 2 of the threshold-based strategies as characterized in Theorems 3 and 4, we have the following corollary, which solves the optimization problem (2.7).

Corollary 2. U in (2.7) is maximized by a threshold-based proactive strategy Ψ_P^φ if and only if φ = U*.

Proof. By combining Theorem 3, Theorem 4 and Corollary 1, the corollary directly follows.

Based on the corollary, Ψ_P^{U*} is a solution to the optimization problem (2.7). Notice that this is the only threshold-based strategy which maximizes U, and it is the only case where the corresponding Markov chain is null recurrent.
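As a quick worked example (a sanity check using the parameter values that appear later in Section 2.6, not new analysis):

```python
# Optimal threshold U* = (mu - p*lambda*s) / (lambda*(1 - p)), equation (2.15)
mu, s, lam, p = 10.0, 1.0, 12.0, 0.8
assert mu / s <= lam < mu / (p * s)           # operating regime of Corollary 1
U_star = (mu - p * lam * s) / (lam * (1 - p))
print(U_star)  # 0.1667: each potential request should get about s/6 proactively
```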

We obtained the following valuable insights about the characteristics of an optimal proactive strategy under prediction uncertainties. First, the strategy should not overemphasize predictions which are near in the future, as the EDF strategy does, in order to account for the fact that the potential requests may not be realized. Second, it should not overemphasize predictions which are far in the future, in order to provide sufficient proactive service for the requests which may arrive in the near future. Balancing these two effects as a function of the prediction uncertainties is the key to designing a desirable proactive strategy.


2.5 Delay Comparison between UNIFORM and EDF strategies

In this section, we focus on two special proactive strategies: the EDF (Earliest-Deadline-First) type strategy and the UNIFORM strategy. The EDF strategy can be seen as the threshold-based strategy with φ = s, which means the server always proactively works first on the first request in Π(t) which has not been completely proactively served. The EDF strategy has been widely used in many scheduling problems in queueing systems. Intuitively, reducing traffic at the beginning of a congested period might be the most efficient way to reduce delay. In our case, where all objects have a uniform size, the EDF strategy works the same as the shortest-remaining-time-first (SRTF) strategy, which achieves the optimal delay in a reactive queueing system. In a proactive system, the authors of [50] have proved that the EDF strategy can achieve asymptotic optimality in terms of delay when the size of the prediction window goes to infinity with full knowledge of future requests and their arrival epochs. However, we will show that the UNIFORM strategy outperforms the EDF strategy in terms of delay in the case with uncertain predictions.

First, we derive an important property of the UNIFORM strategy in the following corollary.

Corollary 3. Given µ, λ, s and p as system parameters which satisfy µ/s ≤ λ < µ/(ps), suppose the system operates under the UNIFORM strategy Ψ_P^{U*}. Then the limiting empirical distribution of U_i satisfies

\[
\lim_{t \to \infty} \frac{1}{I(t)} \sum_{i=1}^{I(t)} \mathbb{1}\left(U_i = U^*\right) = 1, \quad \text{w.p.1} \tag{2.28}
\]

Proof. Please refer to Appendix A.10 for the proof.

Corollary 3 shows that almost all requests under the UNIFORM strategy receive U* bits of proactive service with probability 1. Consequently, the reactive work of each actual request is S* with probability 1. Since almost all actual requests receive the same amount of proactive service, we call this strategy UNIFORM. In the following, we derive the closed-form expression for the average delay per actual request under the UNIFORM strategy.

Corollary 4. Given µ, λ, s and p as system parameters which satisfy µ/s ≤ λ < µ/(ps), the average delay D_U(NIFORM) per actual request under the UNIFORM strategy Ψ_P^{U*} can be expressed as:

\[
D_U = \frac{(\lambda s - \mu)(2\mu - \mu p - \lambda p s)}{2\mu\lambda(1-p)(\mu - \lambda p s)}, \quad \text{w.p.1} \tag{2.29}
\]

If we define D_R(eactive) as the average delay of each actual request under the reactive scheme, the ratio D_U/D_R can be expressed as:

\[
\frac{D_U}{D_R} = \frac{(\lambda s - \mu)(2\mu - \mu p - \lambda p s)}{\lambda s (1-p)(2\mu - s\lambda p)}, \quad \text{w.p.1} \tag{2.30}
\]

Proof. Please refer to Appendix A.11 for the proof.
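The closed forms (2.29) and (2.30) are straightforward to evaluate numerically; the sketch below (illustrative helper names) also checks the heavily-congested data point quoted for Figure 2.11, where D_U and D_R both diverge as λps → µ but their ratio stays finite at 0.8, i.e., roughly a 20% gain:

```python
def delay_uniform(mu, lam, s, p):
    """Average delay per actual request under UNIFORM, equation (2.29)."""
    return ((lam * s - mu) * (2 * mu - mu * p - lam * p * s)
            / (2 * mu * lam * (1 - p) * (mu - lam * p * s)))

def delay_ratio(mu, lam, s, p):
    """D_U / D_R, equation (2.30)."""
    return ((lam * s - mu) * (2 * mu - mu * p - lam * p * s)
            / (lam * s * (1 - p) * (2 * mu - s * lam * p)))

print(delay_uniform(10, 12, 1, 0.8))   # 0.25 in the lambda*p = 9.6 scenario
print(delay_ratio(10, 50, 1, 0.2))     # 0.8: ~20% better than reactive
```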

The ratio in (2.30) directly compares the delay of the UNIFORM strategy against the reactive scheme, and we will plot it in Section 2.6. Next, we compare the average delay of the UNIFORM strategy against that of the EDF strategy.

Corollary 5. Given µ, λ, s and p as system parameters which satisfy µ/s ≤ λ < µ/(ps), the average delay D_U(NIFORM) of the UNIFORM strategy is no greater than the average delay D_E(DF) of the EDF strategy with probability 1:

\[
D_U \le D_E, \quad \text{w.p.1} \tag{2.31}
\]

The equality holds if and only if p = 0. Notice that 0 ≤ p < µ/(λs) < 1, so p = 0 is the only value where the equality holds.

Proof. Please refer to Appendix A.12 for the proof.


The proof of Corollary 5 reveals the insights on why the UNIFORM strategy outperforms the EDF strategy. First, the EDF strategy satisfies Property 1 but not Property 2. As a result, the average reactive work per actual request S is larger under the EDF strategy by Corollary 1, which means the server needs to handle more reactive work on average. Second, the unbalanced allocation of proactive rates in the EDF strategy impacts the delay performance. As shown in Figure 2.3, the EDF strategy works well when requests are realized, like the first 5 requests. However, when the first future potential request seen by the server is not realized, the EDF strategy typically suffers poor delay performance. Take Figure 2.3 as an example again: request 6 receives a large amount of proactive service but is not realized, which causes request 7 to be served almost completely reactively. Consequently, request 8 suffers a large queueing delay.

2.6 Numerical Evaluation

We perform extensive experiments to study the delay performance of threshold-based strategies. Specifically, we compare the UNIFORM strategy with the EDF strategy, with the reactive scheme as a baseline. In our simulations, we consider the network in Figure 2.1. We set µ = 10 and s = 1 in all of our experiments.

In our simulations, we gradually increase the threshold φ from 0 to s and compare the average delay per actual request in each case. Specifically, when φ = s, the strategy becomes the EDF strategy; when φ = U*, the strategy becomes the UNIFORM strategy; and when φ = 0, the system operates in the reactive scheme. The product λp determines how heavily the network is loaded, and we choose λp = 6 as the lightly-loaded network scenario and λp = 9.6 as the heavily-loaded network scenario. For each fixed value of λp, we gradually increase λ from 10 to 20 and choose p correspondingly, to evaluate the effects of prediction uncertainties on the delay performance. We set the simulation time to 10^7 seconds.

2.6.1 Infinite Prediction Window Scenarios

We first demonstrate the delay performance of threshold-based strategies under an infinite prediction window.

Figures 2.5 and 2.6 show the delay performance of threshold-based strategies with different thresholds and different prediction uncertainties. The x-axis represents the threshold φ, which gradually increases from 0 to s. Each curve corresponds to a (λ, p) combination with the same product λp. Each vertical dotted line marks the threshold U* of the UNIFORM strategy for the corresponding (λ, p) combination, and shares the same color as the corresponding curve. For each curve, the delay of the EDF strategy is shown at x = s = 1, and the delay of the reactive scheme is shown at x = 0.

Here are some interesting observations on the plots:

• The vertical lines precisely mark the minimum point on each curve. (Note that the vertical lines indicate φ = 1 on the curves for λ = 10 in both figures, meaning that φ = 1 is the optimal threshold, i.e., the UNIFORM strategy coincides with the EDF strategy in this case.) This implies that the UNIFORM strategy always achieves the best delay performance among all the threshold-based strategies.

• If we compare two curves corresponding to different (λ, p) combinations (but with the same product λp), we can see that the delay performance of the curve with larger p and smaller λ outperforms the one with smaller p and larger λ, until they overlap. This is because larger p and smaller λ imply higher predictability, so the proactive strategy has the potential to achieve a more desirable delay performance. The overlapping part is due to the choice of an overly small threshold φ: in this case, almost every request receives φ bits of proactive service, even in the case with higher predictability. This points to the significance of Property 1.


Figure 2.5: Comparisons among threshold-based methods: λp = 6

Figure 2.6: Comparisons among threshold-based methods: λp = 9.6


• If we compare Figure 2.5 and Figure 2.6, we observe that the curves between φ = s = 1 and φ = U* are flatter in the lightly-loaded scenario (λp = 6). This implies that delay performance is less sensitive to the threshold φ when the network is less congested. In the heavily-loaded case, the choice of threshold φ is more crucial for achieving desirable delay performance.

In order to make more straightforward comparisons among the EDF strategy, the UNIFORM strategy and the reactive scheme, we plot the average delay achieved by these strategies in Figures 2.7 and 2.8. We observe that, with the delay performance of the reactive scheme as the baseline, the delay performance of the EDF strategy becomes much worse in the heavily-loaded scenario compared with the lightly-loaded scenario, whereas the delay performance of the UNIFORM strategy is relatively stable in both scenarios.

In Figures 2.9 and 2.10, we compare queue size evolutions under the EDF strategy, the UNIFORM strategy and the reactive scheme.


Figure 2.7: Comparisons among EDF, UNIFORM and Reactive Schemes: λp = 6

Figure 2.8: Comparisons among EDF, UNIFORM and Reactive Schemes: λp = 9.6

In Figure 2.9, we observe that the queue size under the EDF strategy is very similar to that under the reactive scheme, implying that in this case many requests do not receive proactive service under the EDF strategy. On the other hand, the UNIFORM strategy is able to keep the queue size at a low level. This is because the EDF strategy assigns proactive service in a very unbalanced manner, while the UNIFORM strategy assigns proactive resources almost uniformly among all requests. In Figure 2.10, the differences are magnified. When the network is heavily loaded, the EDF strategy fails to effectively control congestion, but the UNIFORM strategy is able to steadily keep the queue size at a very low level. This difference directly leads to the gap between the delay performance of the EDF strategy and the UNIFORM strategy in the heavily-loaded scenario.

In Figure 2.11, we plot the ratio of the average delay under the UNIFORM strategy to that of the reactive scheme, as calculated in Corollary 4. In this plot, λ is chosen from 10 to 50, and p is chosen from 0 to 10/λ (so that λps < µ). For a fixed λ, the system is more congested with a larger p. As can be observed, the UNIFORM strategy achieves a consistent advantage over the reactive scheme for each fixed λ.


Figure 2.9: Queue Size Evolution Comparisons among Reactive Scheme, EDF strategy and UNIFORM strategy: λp = 6

Figure 2.10: Queue Size Evolution Comparisons among Reactive Scheme, EDF strategy and UNIFORM strategy: λp = 9.6

Figure 2.11: Theoretical Delay Comparison between UNIFORM Strategy and Reactive Scheme

Even in a very congested case with bad predictions (λ = 50 and p approaching 0.2), the UNIFORM strategy can still achieve approximately a 20% advantage over the reactive scheme.

2.6.2 Finite Prediction Window Scenarios

In practice, prediction algorithms can only predict user requests in a finite future. In this section, we experimentally study the impact of the prediction window size on delay performance. A finite prediction window Π(t) = (I(t)+1, I(t)+2, ..., I(t)+W) is considered, where only W predictions are available for any t > 0. In this case, there is a possibility that all the potential requests in Π(t) have been proactively served with φ bits. When this happens, the system must remain idle until new predictions become available.


Figure 2.12: Comparisons among EDF, UNIFORM and Modified-UNIFORM: λp = 6

Figure 2.13: Comparisons among EDF, UNIFORM and Modified-UNIFORM: λp = 9.6

We carried out a series of experiments to assess the impact of the prediction window size W on the delay performance of the EDF and UNIFORM strategies. We also consider a Modified-UNIFORM (M-UNIFORM) strategy, as described in Algorithm 2. After every available prediction in Π(t) has received φ bits of proactive service, the M-UNIFORM strategy starts to proactively serve the earliest request in the prediction window which has not received s bits of proactive service.

Figures 2.12 and 2.13 show the delay performance of these strategies. The delay performance of the EDF strategy converges faster with respect to W; thus the EDF strategy does not require a large prediction window to achieve its best delay performance. On the other hand, the UNIFORM strategy converges much more slowly, especially in the heavily-loaded case. It also requires a moderately large prediction window for the UNIFORM strategy to outperform the EDF strategy, especially in the heavily-loaded case. However, we can greatly improve the delay performance of the UNIFORM strategy with a few simple modifications.


Algorithm 2 Modified UNIFORM Strategy

1: Main Procedure SYSTEM RUN(U*)

2: Choose the threshold as U*;

3: Initialize V (t) ,I (t) , Π (t)

4: while t > 0 do

5: if Request i arrives at t then

6: Put reactive part S_i of request i into the tail of the queue V(t).

7: Update prediction window Π (t)

8: end if

9: if V (t) > 0 then

10: % Reactive work

11: Transmit data from the head of the queue V (t) with full rate µ.

12: end if

13: if V (t) = 0 then

14: % Proactive work

15: Set i = min{i | I(t) < i ≤ I(t) + W, U_i(t) < U*}

16: % i is the earliest potential request in Π(t) which has received less than U* bits of proactive service

17: if i == null then

18: % All potential requests in Π(t) have received U* bits of proactive work

19: Set j = min{i | I(t) < i ≤ I(t) + W, U_i(t) < s}

20: if U_j(t) < s then

21: Transmit data of rj with full rate µ

22: end if

23: if j == null then

24: % Every request in Π(t) has received s units of proactive work

25: Stay idle

26: end if

27: else

28: if U_i(t) < U* then

29: Transmit data of ri with full rate µ

30: end if

31: end if

32: end if

33: end while

34: End Procedure

We can observe that the performance of the M-UNIFORM strategy in the small-window region is greatly improved over that of the UNIFORM strategy. For instance, in Figure 2.13, the UNIFORM strategy requires a window size W greater than 34 to outperform the EDF strategy for the case of λ = 12, p = 0.8. However, the M-UNIFORM strategy outperforms the EDF strategy even when W = 1.

2.7 Summary

In this chapter, we looked into the fundamental queueing dynamics of proactive caching strategies under uncertain predictions and developed insights on how to design a proactive strategy that achieves desirable delay performance in a single-queue system. We solved an optimization problem of maximizing the limiting average amount of proactive service per request. By comparing the queueing dynamics of the proactive scheme and the reactive scheme on the same sample path, we derived a tight upper bound on the objective with uncertain predictive information of future requests. We proposed a family of threshold-based strategies, and constructed the Markov chain of the system to analyze the asymptotic behavior of the proactive system. Consequently, we found the optimal strategy, i.e., the UNIFORM strategy, by properly choosing the threshold in the threshold-based strategies, which corresponds to a null recurrent Markov chain. We obtained important insights about the characteristics of an optimal proactive strategy: the strategy should balance the amount of proactive work between the potential requests which are arriving sooner and the ones arriving later, based on the uncertainties in the predictions. We derived the closed-form expression for the average delay per actual request under the UNIFORM strategy, and analytically compared it with the commonly used EDF-type strategy. We showed that the UNIFORM strategy outperforms the EDF strategy in all the non-trivial scenarios, which is verified by extensive numerical experiments under differently congested network scenarios. Experimental results also showed that delay can be dramatically decreased by proactive caching techniques, not only in the lightly-loaded region as claimed in [51], but also in the heavily-loaded case if properly designed. Our work provides valuable insights on how to optimally design a proactive strategy to improve the delay performance of the system.

Chapter 3

Delay-Optimal Proactive Strategy under

Uncertain Predictions

3.1 Introduction

Following Chapter 2, in this chapter we study the fundamental queueing characteristics of a general proactive service system under uncertain predictions, and design an optimal fixed-probability (FIXP) proactive strategy which maximizes the fraction of time that the system works proactively and minimizes the average delay. Despite the similar logical structure to the previous chapter, we consider a more general service model, which enables us to carry out a more comprehensive analysis of proactive systems and obtain more precise results, e.g., the limiting distribution of proactive systems and closed-form expressions for particular performance metrics, which we did not accomplish in Chapter 2. The main contributions and the structure of this chapter are as follows:


• We propose a general potential request process, which characterizes both the randomness in the arrival and service processes and the uncertainty in service requests. We aim to maximize the limiting fraction of services completed proactively while minimizing the average delay attained under a family of FIXP proactive strategies. We introduce our system model and problem formulation in Section 3.3.

• Based on the system model, we characterize general proactive strategies by defining two properties that significantly affect their performance. We then derive bounds on the limiting fraction of proactive work that a proactive system can support, by comparing the same sample path of arrivals and realizations in the proactive system and the corresponding reactive system. We discuss the properties and perform the analysis in Section 3.4.

• To build a gradual understanding of proactive systems, we start by investigating a Genie-Aided system where the server can observe the arrival of all potential requests, regardless of whether they are eventually realized. We characterize the proactive system by a Markov process and the corresponding embedded Markov chain under FIXP strategies, then derive the optimal parameter for FIXP strategies that maximizes the limiting fraction of proactive service. Meanwhile, we derive closed-form expressions for the limiting average delay as a function of the FIXP parameter and find the optimal FIXP strategy that minimizes delay. We discuss the Genie-Aided system in Section 3.5.

• Based on the results for the Genie-Aided system, we analyze the Realistic Proactive system, where the server can only observe the arrivals of actual service requests. As in the previous section, we derive the optimal parameter for FIXP strategies, which maximizes the limiting fraction of proactive service and minimizes the average delay. We also discuss its relationship with the Genie-Aided system based on these results. We show the analysis of the Realistic Proactive system in Section 3.6.

• We conduct extensive numerical experiments to show the limiting fraction of proactive work and the average delay of FIXP strategies in both the Genie-Aided system and the Realistic Proactive system with different parameters. We also compare our theoretical results with the numerical experiments as a verification of our derivations. We show the numerical results and comparisons in Section 3.7.


3.3 System Model

3.3.1 Server Model

We consider a single-server system, as shown in Figure 3.1. The server has a single-thread processor, providing reactive or proactive service, which operates in continuous time from time 0.

Figure 3.1: Server Model

Figure 3.2: Arrival Processes

Request Processes: The Potential Request Process is a Poisson process {P(t); t ≥ 0} with an overall arrival rate of λ, where the ith arrival, i.e., Potential Request i, requests service s_i ∈ Z^+ (throughout, Z^+ = {1, 2, 3, ...}) at time t_i ∈ R^+, where 0 < t_1 < t_2 < .... The service time d_i ∈ R^+ of service s_i is stochastic and IID exponential with parameter µ > 0. The arrivals in the Actual Request Process {A(t); t ≥ 0} are derived from P(t), where each arrival on P(t) is an arrival on A(t) with probability p independently. {R_i; i = 1, 2, ...} are IID Bernoulli(p) indicator random variables, where R_i = 1 if the ith arrival on P(t) is an arrival on A(t). As a result, A(t) is a Poisson process with average arrival rate λp. To avoid ambiguity, we denote an actual request by its index in P(t) instead of A(t).

We assume that every request is for a different service, i.e., s_i ≠ s_j, ∀ i ≠ j. This assumption is motivated by many practical problems, e.g., 1) prefetching problems, where each prefetched object is usually considered to be specific to one user request, and 2) applications where data objects are highly dynamic, like live streaming, online gaming, sensing data, cloud computing, etc. With this model, we exclude the impact of popularity distributions of services and concentrate purely on the fundamental improvement achieved from proactive service with uncertain predictive information.

Predictions: At time 0, the server knows the sequence of services (s_1, s_2, s_3, ...) corresponding to the requests in {P(t)}, as well as the probability p. The server has no knowledge of the exact arrival epochs {t_i}, the realization indicators {R_i}, or the service times {d_i}. We consider two proactive systems, as described below.

1) Genie-Aided System: The server observes potential arrival i and learns the value of R_i at t_i, ∀ i = 1, 2, .... Notice that in practice the server cannot observe the potential arrival at t_i if R_i = 0, which is why we name this system Genie-Aided. In this scenario, we consider the following prediction window⁴:

\[
\Pi_G(t) = (P(t)+1, P(t)+2, P(t)+3, \ldots) \tag{3.4}
\]

which contains the indices of future requests.

2) Realistic Proactive System: The server can observe arrivals in A(t) but not P(t), i.e., the server only observes an arrival at t_i if R_i = 1. At time t > 0, the sequence of indices for future potential requests from the server's viewpoint, or the prediction window, is:

\[
\Pi_R(t) = (I(t)+1, I(t)+2, I(t)+3, \ldots) \tag{3.5}
\]

where I(t) is defined as:

\[
I(t) \triangleq \max\{i \mid t_i < t, R_i = 1\} \tag{3.6}
\]

so that Π_R(t) contains all the requests after the latest actual request by t.

The prediction windows include all the requests that may possibly arrive based on the server's observation of the arrival processes. As a result, the server should only proactively work on request i if i ∈ Π_G(t) in the Genie-Aided system, and if i ∈ Π_R(t) in the Realistic Proactive system. The fundamental difference between the Genie-Aided system and the Realistic Proactive system centers on the latest request they can observe, i.e., P(t) in the Genie-Aided system and I(t) in the Realistic Proactive system. Our models have the advantage that we can tractably characterize uncertainties in arrival epochs, service times, and realizations of potential requests.

3.3.2 Service Model

We start by defining two service schemes in the system.

⁴We assume the prediction window size to be infinite for simplicity of analysis. This assumption ensures that the server always has proactive service to process.


Reactive Scheme: The server provides service only to the actual requests in A(t), in a reactive manner based on a strategy Ψ_R as described below. Upon observing an actual request i at time t_i, the service s_i is placed into the tail of a FIFO queue waiting to receive service. Let N(t_i) ∈ {0, 1, 2, ...} denote the number of unfinished services in the system. The service time of each service in N(t_i) follows IID Exp(µ), as we previously assumed. If N(t) = 0, the system is idle at t.

Proactive Scheme: The server can proactively perform the service of a request and store the corresponding results in a cache. Since our focus is on the effects of uncertain predictions, we ignore the limit on caching space for simplicity.

We assume reactive services have higher priority than proactive services, so a proactive service can be interrupted by reactive services. We assume the remaining proactive service time d′_i ∈ R^+ to finish service s_i is stochastic and follows IID Exp(µ). Let U_i(t) ∈ {0, 1} be an indicator such that U_i(t) = 1 if request i has been completely served proactively by time t and U_i(t) = 0 otherwise. Let U_i ∈ {0, 1} be an indicator such that U_i = 1 if request i is eventually completely served proactively and U_i = 0 otherwise. If U_i(t_i) = 1, request i is fulfilled immediately upon its arrival. Otherwise, request i needs to be served reactively with service time d_i if realized.

Here we define general proactive strategies. For convenience, we use Π(t) to denote the prediction window for both the Genie-Aided system and the Realistic Proactive system. Define U(t) ≜ {U_i(t); i ∈ Π(t)}. At time t, based on U(t), the prediction window Π(t) and the queue size N(t), a proactive strategy Ψ_P at the server, for both the Genie-Aided system and the Realistic Proactive system, is defined as:

\[
\Psi(N(t), \Pi(t), \mathbf{U}(t)) = \{\rho_N(t), \, \rho_i(t) : i \in \Pi(t)\}
\]


where ρ_N(t) is an indicator with ρ_N(t) = 1 if the services in N(t) are being served and ρ_N(t) = 0 otherwise. {ρ_i(t); i ∈ Π(t)} are indicators with ρ_i(t) = 1 if the server is proactively working on service s_i at time t and ρ_i(t) = 0 otherwise. As assumed before, the reactive queue N(t) has higher priority than proactive traffic. That is, if N(t) > 0, then Σ_{i∈Z^+: i∈Π(t)} ρ_i = 0. Thus, we consider the set Γ_P of proactive strategies Ψ_P satisfying:

• (Reactive State) If N(t) > 0:

\[
\rho_N(t) = 1, \quad \sum_{i \in \mathbb{Z}^+ : i \in \Pi(t)} \rho_i = 0 \tag{3.7}
\]

• (Proactive State) If N(t) = 0:

\[
\rho_N(t) = 0, \quad \sum_{i \in \mathbb{Z}^+ : i \in \Pi(t)} \rho_i = 1 \tag{3.8}
\]

• The limiting fraction of potential requests that are finished proactively,

\[
U \triangleq \lim_{t \to \infty} \frac{\sum_{i=1}^{P(t)} U_i}{P(t)} \tag{3.9}
\]

exists for Ψ_P;

• The limiting fraction of actual requests that are finished proactively,

\[
U_A \triangleq \lim_{t \to \infty} \frac{\sum_{i \in \mathbb{Z}^+ : i \le P(t), R_i = 1} U_i}{A(t)} \tag{3.10}
\]

exists for Ψ_P.

One may notice that equations (3.9) and (3.10) are defined based on P(t), which more naturally relates to the prediction window Π_G(t) in the Genie-Aided system. However, we can obtain an equivalent definition by replacing all P(t) terms with I(t), corresponding to the Realistic Proactive system. Due to space limitations, please refer to Appendix B.1 for the proof.


3.3.3 Problem Formulation

3.3.3.1 Maximization of the Limiting Fraction of Proactive Work

Given λ, µ and p, our first optimization problem can be formulated as:

\[
\underset{\Psi_P}{\text{maximize}} \;\; U(\Psi_P) \quad \text{subject to} \;\; \Psi_P \in \Gamma_P \tag{3.11}
\]

where Γ_P is defined in (3.7)-(3.10). We are going to show that a solution to the maximization problem exists, and we will design the optimal strategy that achieves the maximum U. Let Ψ* be an optimal solution to problem (3.11), and let U_max ≜ U(Ψ*) denote the U achieved by Ψ*.

Operating Regimes: For particular ranges of values of λ, µ, and p, the solution to (3.11) is known. First, when 0 ≤ λ < µ, we know U_max = 1, w.p.1 by Corollary 2 in [50] and Theorem 2 in [51]. Briefly speaking, the server can proactively serve every request before its arrival epoch with probability 1 in our context. Second, when λ ≥ µ/p, the overall arrival rate of the actual services is beyond the stability region of the system. Moreover, according to [56], the stability region of the system is not enlarged by full knowledge of the future. Thus, the server cannot stabilize the queue N(t) in this region, which means that the server is always working reactively with probability 1. Thus, we have the following:

\[
U_{max} =
\begin{cases}
1, \; \text{w.p.1}, & \text{if } 0 \le \lambda < \mu \\
U(\Psi^*), & \text{if } \mu \le \lambda < \mu/p \\
0, \; \text{w.p.1}, & \text{if } \mu/p \le \lambda
\end{cases} \tag{3.12}
\]
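A tiny illustrative helper encoding the three regimes of (3.12):

```python
def u_max_regime(lam, mu, p):
    """Which case of (3.12) applies for given (lambda, mu, p)."""
    if lam < mu:
        return "U_max = 1 w.p.1: the server can stay ahead of every request"
    if lam < mu / p:
        return "U_max = U(Psi*): the non-trivial regime studied below"
    return "U_max = 0 w.p.1: actual load lambda*p >= mu, always reactive"

print(u_max_regime(12.0, 10.0, 0.8))   # the non-trivial regime (12 < 12.5)
```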


Algorithm 3 FIXP(φ) Strategies

1: Main Procedure SYSTEM RUN(φ)
2: Choose the threshold as φ;
3: Initialize N(t), Π(t)
4: while t > 0 do
5: if Genie-Aided and potential request i happens at t then
6: Update prediction window Π(t)
7: end if
8: if Actual request i arrives at t then
9: Put s_i into the tail of the service queue N(t).
10: Update prediction window Π(t)
11: end if
12: % Reactive work
13: if N(t) > 0 then
14: Work on the service at the head of N(t)
15: end if
16: % Proactive work
17: if N(t) = 0 then
18: J(t) ← min{i ∈ Π(t) | U_i(t) = 0 & C_i = 1}
19: % J(t) is the earliest unfinished potential request in Π(t) decided to be proactively served
20: Proactively work on request J(t)
21: end if
22: end while
23: End Procedure

N(t)    Unfinished reactive service at the server node
µ       Average service rate
P(t)    Potential request process
A(t)    Actual request process
λ       Average arrival rate of P(t)
p       Probability that each potential request is realized
t_i     Arrival epoch of potential request i
R_i     Indicator random variable for whether request i is realized
d_i     Service time of request i
U_i     Indicator random variable for whether request i is eventually served proactively
U_i(t)  Indicator random variable for whether request i is proactively served by t
Π(t)    Prediction window (general)
Π_G(t)  Prediction window in the Genie-Aided system
Π_R(t)  Prediction window in the Realistic Proactive system
I(t)    Index of the latest actual request before time t
J(t)    Index of the request to proactively serve at t
U       Limiting fraction of services completed proactively
U_A     Limiting fraction of actual services completed proactively

Table 3.1: Table of Notation


3.3.3.2 Minimization of Delay in Fixed-Probability Strategies

In order to study the maximization problem in (3.11), we study a family of probabilistic proactive strategies, namely the fixed-probability (FIXP) strategies. Under a FIXP strategy, the server may not proactively work on all requests in the prediction window; instead, each prediction receives proactive service with probability φ. Define φ ∈ (0, 1] as the probability parameter, and denote the corresponding strategy as FIXP(φ). Define a decision sequence C = (C_1, C_2, C_3, ...) of IID Bernoulli(φ) indicator random variables, where C_i = 1 means the server will proactively work on request i if possible. When working proactively, the server works on request J(t) at time t, where J(t) = min{i ∈ Π(t) | U_i(t) = 0 and C_i = 1} is the first unfinished request in the prediction window Π(t) selected by C. FIXP(φ) strategies are described in detail in Algorithm 3. In Sections 3.5 and 3.6, we will show that the FIXP strategy with a specific φ value is optimal for both (3.11) and (3.14).
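A minimal sketch of the FIXP(φ) selection rule (hypothetical class and method names; the decision sequence C can be sampled lazily, which is all the state the rule needs):

```python
import random

class FixpPicker:
    """Lazily sample C = (C_1, C_2, ...) ~ IID Bernoulli(phi) and return
    J(t) = min{i in Pi(t) | U_i(t) = 0 and C_i = 1}."""
    def __init__(self, phi, seed=0):
        self.phi, self.C, self.rng = phi, {}, random.Random(seed)

    def c(self, i):                      # memoize C_i so decisions stay fixed
        if i not in self.C:
            self.C[i] = self.rng.random() < self.phi
        return self.C[i]

    def next_to_serve(self, window_start, U):
        i = window_start                 # Pi(t) = (window_start, window_start+1, ...)
        while U.get(i, 0) == 1 or not self.c(i):
            i += 1
        return i

picker = FixpPicker(phi=0.5)
print(picker.next_to_serve(1, {1: 1}))   # skips request 1, already completed
```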

Define the delay D_i, i = 1, 2, ..., of an actual request i as the time between its arrival epoch t_i and the epoch at which the server finishes service s_i. Define the limiting average delay D as:

\[
D = \lim_{t \to \infty} \frac{\sum_{i \in \mathbb{Z}^+ : i \le P(t), R_i = 1} D_i}{A(t)} \tag{3.13}
\]

The delay minimization problem of the FIXP strategies is formulated as:

\[
\underset{\phi \in (0,1]}{\text{minimize}} \;\; D_{FIXP}(\phi) \tag{3.14}
\]

where D_FIXP(φ) is the limiting average delay under the FIXP(φ) strategy.


3.4 Relation between Reactive Scheme and Proactive Schemes

Proactive service makes use of resources that are available when the system is idle under the reactive scheme. A natural question is, at most what fraction of the requests can be completed proactively? We can gain some insight from an example, as shown in Figure 3.3, where the reactive scheme, the Genie-Aided system, and the Realistic Proactive system are compared. The system runs from time 0 to t. Potential requests 1 to 7 arrive at t_1 to t_7, respectively, during this period of time, with all potential requests realized except for request 4. The evolution of the unfinished work N(t) under the reactive scheme is in blue. The evolution in the Realistic Proactive Scheme with the EDF strategy is in red. The evolution in the Genie-Aided Proactive Scheme is in green. We also plot the corresponding rate allocation and service times.

The idle periods in the reactive scheme are utilized for proactive services in the proactive schemes. As a result, some requests are completely proactively served before their arrivals (i.e., request 5 in both proactive schemes), thus reducing the actual workload of these requests when they actually arrive. In return, more proactive service can be done, and more actual workload is reduced (i.e., request 6 in the Genie-Aided system). We can also see the difference between the Genie-Aided system and the Realistic Proactive system (i.e., the proactive service for unrealized request 4), caused by the different observation capabilities of the server.

To study how much proactive work can be done in a general proactive service system, we compare sample paths in the reactive scheme and the proactive systems. Consider a set of sample paths corresponding to arrival epochs {t_i : i ∈ Z^+}, realizations {R_i = z_i : i ∈ Z^+} (z_i ∈ {0, 1}), reactive service times {d_i : i ∈ Z^+} and proactive service times {d′_i : i ∈ Z^+} under the reactive scheme, the Genie-Aided system, and the Realistic Proactive system. Notice that based on our


Figure 3.3: Comparisons between reactive and proactive systems

assumptions on the memoryless service process, the service time of an actual request that is not completely served proactively is still distributed according to Exp(µ), independent of all other service times. For convenience of comparison, we choose the reactive service times {d_i : i ∈ Z^+} to be the same for the same service across the different schemes.

We make the following definitions. The amount of time that Ψ_P ∈ Γ_P works in the proactive state (namely, Proactive-Proactive) from 0 to t is:

\[
T_{PP}(t) \triangleq |\{\tau \in (0, t] : N(\tau) = 0\}| \tag{3.15}
\]

The amount of time that Ψ_P ∈ Γ_P works in the reactive state (namely, Proactive-Reactive) from 0 to t is:

\[
T_{PR}(t) \triangleq |\{\tau \in (0, t] : N(\tau) > 0\}| \tag{3.16}
\]

The limiting fractions of time that Ψ_P ∈ Γ_P works in the reactive state and in the proactive state, respectively, are:

\[
\alpha_{PR} \triangleq \lim_{t \to \infty} \frac{T_{PR}(t)}{t}, \qquad \alpha_{PP} \triangleq \lim_{t \to \infty} \frac{T_{PP}(t)}{t} \tag{3.17}
\]

Before we continue to study the relation between the reactive scheme and the proactive scheme, we first define two important properties of proactive strategies.

Definition 4 (Property 1 of Proactive Strategies). A proactive strategy Ψ_P ∈ Γ_P satisfies Property 1 if the following condition is satisfied:

\[
\lim_{t \to \infty} \frac{\sum_{i \in \mathbb{Z}^+ : i > P(t)} U_i(t)}{t} = 0, \quad \text{w.p.1} \tag{3.18}
\]

The term Σ_{i∈Z^+: i>P(t)} U_i(t) represents the total amount of proactive work done for future requests by time t. Although this part of the proactive work may be requested eventually, it does not contribute to reducing reactive work by time t. Therefore, an intuitive interpretation of Property 1 is that the amount of proactive work for future requests should not scale with t.

Proposition 5. For all Ψ_P ∈ Γ_P, we have

\[
U \ge U_A, \quad \text{w.p.1}
\]

We omit the proof due to its similarity to the proof of Proposition 1. We then have the following definition of the second property:

Definition 5 (Property 2 of Proactive Strategies). A proactive strategy satisfies Property 2 if the following condition is satisfied:

\[
U_A = U, \quad \text{w.p.1} \tag{3.19}
\]


Proposition 5 implies that in our setting, the fraction of actual requests that are finished proactively is no greater than the fraction of potential requests that are finished proactively. In the Genie-Aided system, Property 2 is always satisfied because J(t) > P(t) by definition. In other words, the request being proactively worked on has not yet arrived in the potential request process, so it will be realized with probability p. A rigorous proof can be obtained by checking the equality condition in the proof of Proposition 5. However, Property 2 is not necessarily satisfied in the Realistic Proactive system. This is because the server can only observe {A(t)}, so it may proactively serve requests which have arrived in {P(t)} but are not realized. With Properties 1 and 2 (Definitions 4 and 5), we are able to derive the following theorem.

Theorem 5. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the limiting fractions of time that the server works in the proactive state and the reactive state, respectively, under Ψ_P ∈ Γ_P satisfy

\[
\alpha_{PP} \le \frac{\mu - \lambda p}{\mu(1-p)}, \;\; \text{w.p.1}, \qquad \alpha_{PR} \ge \frac{(\lambda - \mu)\,p}{\mu(1-p)}, \;\; \text{w.p.1}
\]

Equality holds in both inequalities if and only if the proactive strategy Ψ_P satisfies both Property 1 and Property 2.

Proof. Please see proof in Appendix B.2.

Theorem 5 reveals an important fact about our model: the limiting fraction of time that the system works proactively has an upper bound (µ − λp)/(µ(1−p)) ∈ (0, 1] if λ ≥ µ > λp and p < 1. Based on Theorem 5, we can derive the upper bound on the limiting fraction of requests that are served proactively in the following corollary.


Corollary 6. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the limiting fraction of potential requests that are finished proactively under strategy Ψ_P ∈ Γ_P satisfies

\[
U \le \frac{\mu - p\lambda}{\lambda(1-p)}, \quad \text{w.p.1} \tag{3.20}
\]

Equality holds if and only if the proactive strategy Ψ_P satisfies both Property 1 and Property 2.

Proof. Please see proof in Appendix B.3.

Recall that our objective in (3.11) is to find a proactive strategy that maximizes U. Based on Corollary 6, we will find a proactive strategy that achieves the upper bound in (3.20), which will be a solution to (3.11). The upper bound (µ−λp)/(λ(1−p)) is a crucial value that will appear in several essential conclusions in the analysis below.
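To make the bounds concrete, the following minimal Python sketch evaluates the limits in Theorem 5 and Corollary 6 for one illustrative parameter choice (the values of µ, λ, and p below are assumptions for illustration, not taken from the experiments in this chapter):

```python
# Numeric illustration of Theorem 5 and Corollary 6; parameter values are
# hypothetical and chosen to satisfy mu <= lambda < mu/p and p < 1.
mu, lam, p = 10.0, 12.0, 0.5      # service rate, potential arrival rate, realization prob.
assert mu <= lam < mu / p and p < 1

alpha_PP_max = (mu - lam * p) / (mu * (1 - p))    # Theorem 5: upper bound on alpha_PP
alpha_PR_min = (lam - mu) * p / (mu * (1 - p))    # Theorem 5: lower bound on alpha_PR
U_max = (mu - p * lam) / (lam * (1 - p))          # Corollary 6: upper bound on U

print(f"alpha_PP <= {alpha_PP_max:.3f}")          # 0.800
print(f"alpha_PR >= {alpha_PR_min:.3f}")          # 0.200
print(f"U        <= {U_max:.3f}")                 # 0.667
```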

3.5 Fixed-Probability (FIXP) Strategy in the Genie-Aided Proactive System

In order to construct an optimal proactive strategy to solve the optimization problem of

In order to construct an optimal proactive strategy that solves the optimization problem in (3.11), we study a family of fixed-probability (FIXP) strategies, as defined in Section 3.3.3.2. We first look into a Genie-Aided system, where the server can observe all the potential arrivals in {P(t)}. We study the FIXP proactive strategies' asymptotic behaviors by formulating a Markov process and the corresponding embedded Markov Chain. We then derive the relationship between the FIXP strategies and the two properties, and construct an optimal solution to problems (3.11) and (3.14) by choosing a specific parameter φ for the FIXP proactive strategies.


3.5.1 Markov Process of the Genie-Aided System under FIXP(φ)

We first discuss the Markov process for the Genie-Aided System under FIXP(φ) strategy.

Consider the Markov process {(M(t), N(t)); t ≥ 0}, where M(t) ≜ Σ_{i∈Π_G(t)} U_i(t) denotes the number of services completed proactively for future requests at t, and N(t) is the number of unfinished actual requests in the system at t, as defined before. We analyze the Markov process by constructing an embedded Markov Chain, where the states and transition epochs are defined as follows:

Definition 6 (The Markov Process under FIXP(φ) in the Genie-Aided System). Let (τ_0 = 0, τ_1, τ_2, ..., τ_k, ...) be the sequence of transition epochs, where each τ_k, k = 1, 2, ... satisfies one of the following conditions: 1) ∃i ∈ Z^+ such that τ_k = t_i, 2) N(τ_k^−) − N(τ_k^+) = 1, or 3) J(τ_k^−) ≠ J(τ_k^+). The embedded Markov Chain {(M_k, N_k): k = 0, 1, ...} with state space S_G ≜ {(m, n): m = 0, 1, ..., n = 0, 1, ...} of the Markov process {(M(t), N(t)): t ≥ 0} is defined as:

$$ M_k = M(\tau_k) = \sum_{i\in\Pi_G(\tau_k)} U_i(\tau_k), \qquad N_k = N(\tau_k), \quad \forall k = 0,1,\ldots \tag{3.21} $$

Holding Time between Transitions: Based on the conditions for choosing transition epochs in Definition 6, a transition in the embedded Markov Chain happens when a potential arrival occurs or when the server completes a reactive or proactive service. Notice that a transition in the embedded Markov Chain does not necessarily mean M_k or N_k changes, i.e., self-loops exist. The overall arrival rate of potential requests is λ, and the overall service rate is µ, independent of the system state. Therefore, the overall transition rate v_(m,n) of state (m,n) is:

$$ v_{(m,n)} = \lambda + \mu, \quad \forall m = 0,1,\ldots, \; \forall n = 0,1,\ldots \tag{3.22} $$


Define the holding time between transitions as ∆_k ≜ τ_{k+1} − τ_k, k = 0, 1, ..., and we have:

$$ \Delta_k \sim \mathrm{Exp}(\lambda+\mu), \quad \forall k = 0,1,\ldots \tag{3.23} $$

with:

$$ \mathbb{E}[\Delta_k] = \frac{1}{\lambda+\mu}, \quad \forall k = 0,1,\ldots \tag{3.24} $$

Therefore, the holding time between transitions is positive with a finite expectation. As one may notice, the benefit of choosing such transition epochs is that all states have the same transition rate of λ + µ, which saves the effort of uniformization. As a result, the holding times between consecutive transitions are IID Exp(λ + µ). Because the distribution of the holding time is exactly the same for all states, the limiting distribution of the Markov process numerically coincides with that of the embedded Markov Chain.

Transition Rates of the Markov Process: First of all, one crucial fact on the transition probability is described in the following lemma.

Lemma 2. Under the FIXP(φ) strategy in the Genie-Aided system, the probability that the next future potential request has been fulfilled, given that some future services have been fulfilled, satisfies:

$$ P\{U_{P(\tau_k)+1}(\tau_k) = 1 \mid M_k = m\} = P\{C_{P(\tau_k)+1} = 1\} = \phi, \quad \forall m \in \mathbb{Z}^+ \tag{3.25} $$

Proof. Please see proof in Appendix B.4.

Lemma 2 means that if there is a positive number of proactive services completed for future requests at t, the probability that the next potential arrival has been proactively served is φ.


Now we describe the transition rates in the Markov Process of the Genie-Aided system.

Let R_{(m_0,n_0),(m_1,n_1)} denote the one-step transition rates, where:

$$ R_{(m_0,n_0),(m_1,n_1)} = v_{(m_0,n_0)}\, P\{M_{k+1} = m_1, N_{k+1} = n_1 \mid M_k = m_0, N_k = n_0\}, \quad \forall m_0, m_1, n_0, n_1 = 0,1,\ldots \tag{3.26} $$

Given system parameters µ, λ, p, and φ, we derive the transition rates of the embedded Markov Chain as follows:

• Horizontal Directions:

$$ R_{(0,n),(0,n+1)} = \lambda p, \quad \forall n \geq 0 \tag{3.27} $$

$$ R_{(m,n),(m,n+1)} = \lambda p(1-\phi), \quad \forall m \geq 1, \; n \geq 0 \tag{3.28} $$

$$ R_{(m,n),(m,n-1)} = \mu, \quad \forall m \geq 0, \; n \geq 1 \tag{3.29} $$

(3.27) is the scenario where no proactive work has been completed in the system at t, so the arrival rate is λp. (3.28) is the scenario where M_k ∈ Z^+ future requests have been proactively served, so the arrival rate is λp(1−φ) according to (3.25). (3.29) corresponds to the service process, which serves actual requests at rate µ.

• Vertical Directions:

$$ R_{(m,0),(m+1,0)} = \mu, \quad \forall m \geq 0 \tag{3.30} $$

$$ R_{(m,n),(m+1,n)} = 0, \quad \forall m \geq 0, \; n \geq 1 \tag{3.31} $$

$$ R_{(m,n),(m-1,n)} = \lambda\phi, \quad \forall m \geq 1, \; n \geq 0 \tag{3.32} $$

(3.30) corresponds to the proactive service rate µ when the system has no reactive work, i.e., N_k = 0. (3.31) holds because the server does not work proactively when there is reactive work to be finished, i.e., N_k > 0. (3.32) corresponds to the case when a potential arrival has been proactively served, causing a decrement in M_k, the number of future requests proactively served. The rate of this occurrence is λφ, where λ is the arrival rate of potential requests, and φ is the probability that this arrival has been proactively served according to (3.25).

• Self-loops:

$$ R_{(0,n),(0,n)} = \lambda(1-p), \quad \forall n \geq 0 \tag{3.33} $$

$$ R_{(m,n),(m,n)} = \lambda(1-p)(1-\phi), \quad \forall m \geq 1, \; n \geq 0 \tag{3.34} $$

(3.33) corresponds to the rate of potential arrivals which are not realized. (3.34) corresponds to the rate of potential arrivals which are neither realized nor proactively served. Together with (3.27)-(3.32), the outgoing rates of every state sum to λ + µ, consistent with (3.22).

• Other Transitions:

$$ R_{(m_0,n_0),(m_1,n_1)} = 0, \quad \text{otherwise} \tag{3.35} $$

For simplicity, we define λ_1 ≜ λp, λ_2 ≜ λp(1−φ), and λ_3 ≜ λφ. The Markov process and its transition rates are shown in Figure 3.4, where we omit the self-loops for simplicity. Based on (3.26), the transition probabilities of the embedded Markov Chain can be obtained from the corresponding transition rates. Because v_(m_0,n_0) is the same for all states according to (3.22), the transition probabilities are proportional to the transition rates.


[Figure 3.4: Embedded Markov Chain of the Markov Process under FIXP Strategies in the Genie-Aided System. Rightward transitions occur at rate λ_1 in row 0 and λ_2 in rows m ≥ 1, leftward transitions at rate µ, downward transitions (m to m+1) at rate µ from column 0, and upward transitions (m to m−1) at rate λ_3.]
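As an illustration of (3.27)-(3.35), the following sketch encodes the one-step transition rates as a function and checks that the outgoing rates of every state (self-loop included) sum to λ + µ, consistent with (3.22). Parameter values are assumptions for illustration only.

```python
# A sketch of the one-step transition rates (3.27)-(3.35) of the embedded chain,
# using the shorthand lambda1 = lam*p, lambda2 = lam*p*(1-phi), lambda3 = lam*phi.
def rate(s, t, lam=12.0, mu=10.0, p=0.5, phi=0.7):
    """Transition rate R from state s = (m, n) to state t = (m1, n1)."""
    (m, n), (m1, n1) = s, t
    l1, l2, l3 = lam * p, lam * p * (1 - phi), lam * phi
    if (m1, n1) == (m, n + 1):                 # potential arrival joins the queue: (3.27)/(3.28)
        return l1 if m == 0 else l2
    if n >= 1 and (m1, n1) == (m, n - 1):      # reactive service completion: (3.29)
        return mu
    if n == 0 and (m1, n1) == (m + 1, 0):      # proactive completion when no reactive work: (3.30)
        return mu
    if m >= 1 and (m1, n1) == (m - 1, n):      # arrival that was pre-served proactively: (3.32)
        return l3
    if (m1, n1) == (m, n):                     # self-loops: (3.33)/(3.34)
        return lam * (1 - p) if m == 0 else lam * (1 - p) * (1 - phi)
    return 0.0                                  # all other transitions: (3.31)/(3.35)

# Sanity check: rates out of every state (self-loop included) sum to lam + mu = 22.
for s in [(0, 0), (0, 3), (2, 0), (2, 3)]:
    total = sum(rate(s, (m1, n1)) for m1 in range(5) for n1 in range(6))
    assert abs(total - 22.0) < 1e-9
```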

3.5.2 Recurrence Analysis of the Embedded Markov Chain

It is straightforward to see that this Markov Chain is an aperiodic, irreducible Markov Chain with countably many states. To derive the recurrence conditions, we solve the following balance equations:

$$ (\lambda_1+\mu)\,\pi_{(0,0)} = \mu\pi_{(0,1)} + \lambda_3\pi_{(1,0)} \tag{3.36} $$

$$ (\lambda_1+\mu)\,\pi_{(0,n)} = \lambda_1\pi_{(0,n-1)} + \mu\pi_{(0,n+1)} + \lambda_3\pi_{(1,n)}, \quad \forall n \in \mathbb{Z}^+ \tag{3.37} $$

$$ (\lambda_2+\lambda_3+\mu)\,\pi_{(m,0)} = \mu\pi_{(m-1,0)} + \mu\pi_{(m,1)} + \lambda_3\pi_{(m+1,0)}, \quad \forall m \in \mathbb{Z}^+ \tag{3.38} $$

$$ (\lambda_2+\lambda_3+\mu)\,\pi_{(m,n)} = \lambda_2\pi_{(m,n-1)} + \mu\pi_{(m,n+1)} + \lambda_3\pi_{(m+1,n)}, \quad \forall m \in \mathbb{Z}^+, \; n \in \mathbb{Z}^+ \tag{3.39} $$


with the unity constraint $\sum_{m=0}^\infty \sum_{n=0}^\infty \pi_{(m,n)} = 1$. Our solutions are shown in the following proposition:

Proposition 6. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the embedded Markov Chain in Definition 6 is positive recurrent if and only if φ > (µ−λp)/(λ(1−p)).

When φ > (µ−λp)/(λ(1−p)), the steady-state distribution of the embedded Markov Chain can be expressed as:

$$ \pi_{(0,0)} = \frac{(\mu-\lambda p)(\mu-\lambda p-\lambda\phi+\lambda p\phi)}{\lambda\mu(p-1)(p+\phi-p\phi)} \tag{3.40} $$

$$ \pi_{(0,n)} = \left[\frac{\lambda_1\lambda_2+\lambda_1\lambda_3}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\left(\frac{\lambda_1}{\mu}\right)^{\!n} - \frac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{\!n}\right]\pi_{(0,0)}, \quad n = 0,1,\ldots \tag{3.41} $$

$$ \pi_{(m,n)} = \left(\frac{\mu}{\lambda_2+\lambda_3}\right)^{\!m}\left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{\!n} \pi_{(0,0)}, \quad \forall m \in \mathbb{Z}^+, \; n = 0,1,\ldots \tag{3.42} $$

Proof. According to Theorem 6.22 in [59], the Markov Chain is positive recurrent if and only if there exists {π_(m,n) ≥ 0; m = 0,1,..., n = 0,1,...} which satisfies (3.36)-(3.39) and Σ_{m=0}^∞ Σ_{n=0}^∞ π_(m,n) = 1.

When φ > (µ−λp)/(λ(1−p)), the solutions {π_(m,n); m = 0,1,..., n = 0,1,...} are expressed in (3.40)-(3.42). When φ ≤ (µ−λp)/(λ(1−p)), all the non-trivial solutions⁵ of the form (3.40)-(3.42) lead to an infinite sum, i.e., Σ_{m=0}^∞ Σ_{n=0}^∞ π_(m,n) = ∞, which cannot be normalized. Therefore, the Markov Chain is non-positive, i.e., null recurrent or transient.

Due to the complexity of proving null recurrence of the Markov Chain in certain scenarios, we do not rigorously distinguish the null-recurrent case from the transient case. Notice that the arguments in the rest of this chapter will not depend on null recurrence or transience of the Markov Chain.

⁵ Non-trivial solutions in our context are sets of solutions which satisfy ∃m = 0,1,..., ∃n = 0,1,...: π_(m,n) > 0.
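As a sanity check on the closed form above, the following sketch (with assumed parameters satisfying φ > (µ−λp)/(λ(1−p))) verifies numerically that (3.40)-(3.42) satisfy the balance equations (3.37) and (3.39):

```python
# Numerical check that (3.40)-(3.42) satisfy the balance equations; the
# parameter values are illustrative assumptions.
lam, mu, p, phi = 12.0, 10.0, 0.5, 0.8
l1, l2, l3 = lam * p, lam * p * (1 - phi), lam * phi

def pi(m, n):
    p00 = ((mu - lam*p) * (mu - lam*p - lam*phi + lam*p*phi)
           / (lam * mu * (p - 1) * (p + phi - p*phi)))          # (3.40)
    if m == 0:                                                  # (3.41)
        c = l1*l2 + l1*l3 - mu*l2
        return ((l1*l2 + l1*l3)/c * (l1/mu)**n - mu*l2/c * (l2/(l2+l3))**n) * p00
    return (mu/(l2+l3))**m * (l2/(l2+l3))**n * p00              # (3.42)

# Balance at a boundary state (0, n), n >= 1: equation (3.37).
n = 4
assert abs((l1 + mu)*pi(0, n) - (l1*pi(0, n-1) + mu*pi(0, n+1) + l3*pi(1, n))) < 1e-9

# Balance at an interior state (m, n), m, n >= 1: equation (3.39).
m, n = 2, 3
assert abs((l2 + l3 + mu)*pi(m, n)
           - (l2*pi(m, n-1) + mu*pi(m, n+1) + l3*pi(m+1, n))) < 1e-9
```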

We now have the steady-state distribution of the Markov Chain from Proposition 6, which holds only for φ > (µ−λp)/(λ(1−p)). In order to gain a more thorough understanding of the asymptotic behavior of the Genie-Aided system, specifically the limiting fraction of time the Markov Chain spends in each row or column, we analyze the Markov Chain from the point of view of rows and columns in the following.

3.5.2.1 Analysis of the Rows of the 2D Markov Chain

We revisit equation (3.41) and define π_m^r = Σ_{n=0}^∞ π_(m,n) as the sum of row m of {π_(m,n)}, m = 0,1,.... The exponential terms λ_2/(λ_2+λ_3) and λ_1/µ are both in (0,1); therefore we have the following equations:

$$ \pi_0^r = \left[\frac{\lambda_1\lambda_2+\lambda_1\lambda_3}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\cdot\frac{\mu}{\mu-\lambda_1} - \frac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\cdot\frac{\lambda_2+\lambda_3}{\lambda_3}\right]\pi_{(0,0)} \tag{3.43} $$

$$ \pi_m^r = \left(\frac{\mu}{\lambda_2+\lambda_3}\right)^{\!m}\frac{\lambda_2+\lambda_3}{\lambda_3}\,\pi_{(0,0)}, \quad m = 1,2,\ldots \tag{3.44} $$

and we are trying to find a solution that satisfies the unity sum Σ_{m=0}^∞ π_m^r = 1.

We have the following analysis:

1. When φ > (µ−λp)/(λ(1−p)), a solution satisfying Σ_{m=0}^∞ π_m^r = 1 can be directly obtained from the solutions to the 2D Markov Chain, as shown in (3.40)-(3.42).

2. When φ ≤ (µ−λp)/(λ(1−p)), we have µ/(λ_2+λ_3) ≥ 1, which makes π_{m+1}^r ≥ π_m^r, ∀m = 1,2,.... As a result, there is no non-trivial solution satisfying Σ_{m=0}^∞ π_m^r = 1. This implies that lim_{K→∞} (Σ_{k=0}^K M_k)/K = ∞, w.p.1, which is the reason for the non-positivity of the Markov Chain.


3.5.2.2 Analysis of the Columns of the 2D Markov Chain

Similarly, we define π_n^c = Σ_{m=0}^∞ π_(m,n) as the sum of column n of {π_(m,n)}, n = 0,1,.... As in the previous section, the distribution over the columns can be obtained from the steady-state distribution in Proposition 6 when φ > (µ−λp)/(λ(1−p)).

However, µ/(λ_2+λ_3) ≥ 1 if φ ≤ (µ−λp)/(λ(1−p)). As a result, π_n^c/π_(0,0) = ∞, which makes it difficult to solve the problem based on the solutions in (3.40)-(3.42). Therefore, we use a new method to study the limiting fraction of time the system spends in each column when φ ≤ (µ−λp)/(λ(1−p)).

Based on Proposition 6, the Markov Chain is non-positive when φ ≤ (µ−λp)/(λ(1−p)). Consider the 1D Markov Chain {X_k; k = 0,1,...}, as defined and shown in Figure 3.5.

[Figure 3.5: Corresponding 1D Markov Chain {X_k}: a birth-death chain on states 0, 1, 2, ... with upward rate λ_2 and downward rate µ.]

We can analyze transitions between the columns of the 2D Markov Chain by using this 1D Markov Chain when φ ≤ (µ−λp)/(λ(1−p)). Specifically, we describe the relationship in the following proposition:

Proposition 7. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the limiting distributions of the 2D Markov Chain {(M_k, N_k)} and the 1D Markov Chain {X_k} satisfy:

$$ \lim_{K\to\infty}\frac{\sum_{k=0}^K \mathbb{1}(N_k = n)}{K} = \lim_{K\to\infty}\frac{\sum_{k=0}^K \mathbb{1}(X_k = n)}{K} = \frac{\mu-\lambda_2}{\mu}\left(\frac{\lambda_2}{\mu}\right)^{\!n}, \quad \forall n = 0,1,\ldots, \; \text{w.p.1} \tag{3.45} $$

when φ ≤ (µ−λp)/(λ(1−p)).

Proof. Please see proof in Appendix B.7.


From Proposition 7, we obtain the limiting distribution of states in terms of columns in the 2D Markov Chain of the Genie-Aided system. The finite limiting average of N_k implies that the horizontal transitions are not the reason for the non-positivity of the Markov Chain. The limiting distribution of the Markov process can then be obtained from the results of Propositions 6 and 7 on the embedded Markov Chain. This result is significant because the column index N(t) represents the number of unfinished actual requests in the system at t, which directly relates to the proactive service time (N_k = 0) and relates to the average delay by Little's Law.
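The geometric limit in (3.45) can be checked quickly by simulating the 1D chain in Figure 3.5 with uniformized transition probabilities; the sketch below uses assumed parameters with φ below the threshold (µ−λp)/(λ(1−p)):

```python
# A minimal simulation sketch (assumed parameters) of the 1D chain {X_k} of
# Figure 3.5, checking the geometric limiting distribution in (3.45).
import random

lam, mu, p, phi = 12.0, 10.0, 0.5, 0.4           # phi below the threshold 2/3
l2 = lam * p * (1 - phi)                          # upward rate of the 1D chain
up, down = l2 / (lam + mu), mu / (lam + mu)       # uniformized step probabilities

random.seed(1)
x, counts, K = 0, {}, 10**6
for _ in range(K):
    u = random.random()
    if u < up:
        x += 1
    elif u < up + down and x >= 1:
        x -= 1                                    # remaining probability: self-loop
    counts[x] = counts.get(x, 0) + 1

for n in range(4):
    theory = (mu - l2) / mu * (l2 / mu) ** n      # right-hand side of (3.45)
    print(n, counts.get(n, 0) / K, round(theory, 4))
```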

3.5.3 Relationship between FIXP(φ) and the two Properties

Recall that Corollary 6 addresses the relationship between the upper bound on the limiting fraction of proactive service and the two crucial properties defined in Definitions 4 and 5. Therefore, we look into Properties 1 and 2 of FIXP strategies to derive a solution to (3.11) in this section.

3.5.3.1 Property 1 of FIXP strategies in the Genie-Aided System

We describe the relationship of FIXP(φ) strategies with Property 1 in the following proposition.

Proposition 8. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the FIXP(φ) strategy satisfies Property 1 if and only if φ ≥ (µ−λp)/(λ(1−p)).

Proof. Please see the proof in Appendix B.5.

Property 1 means that the number of proactive services completed for future requests, i.e., M(t), should not scale with t. Based on the recurrence of the Markov Chain, Proposition 8 shows that Property 1 is satisfied when the embedded Markov Chain is positive recurrent or when φ = (µ−λp)/(λ(1−p)).

3.5.3.2 Property 2 of FIXP strategies in the Genie-Aided System

In the Genie-Aided system, FIXP strategies always satisfy Property 2. The proof is straightforward based on the proof of Proposition 5, as discussed right after Definition 5. Therefore, the FIXP strategies satisfy both Properties 1 and 2 if and only if φ ≥ (µ−λp)/(λ(1−p)). Then, based on Corollary 6, we have the following theorem:

Theorem 6. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the FIXP(φ) strategy is a solution to the optimization problem (3.11) if and only if φ ≥ (µ−λp)/(λ(1−p)).

We provide a verification of Corollary 6 and Theorem 6 in Appendix B.6.

3.5.4 Delay Analysis of FIXP(φ) Strategies under the Genie-Aided System

Based on the limiting distributions of system states in Propositions 6 and 7, we can apply Little's Law to derive closed-form expressions for the average delay under FIXP strategies in the Genie-Aided system, given in the following theorem.

Theorem 7. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the limiting average delay D of the FIXP(φ) strategy in the Genie-Aided system can be expressed as:

$$ D = \begin{cases} \dfrac{\lambda p^3-\mu p^2-\lambda p^3\phi+\mu p^2\phi+\lambda p\phi-\mu p\phi}{\lambda\phi p(\mu-\lambda p)(1-p)}, & \phi > \dfrac{\mu-\lambda p}{\lambda(1-p)} \\[2ex] \dfrac{1-\phi}{\mu-\lambda p(1-\phi)}, & \phi \leq \dfrac{\mu-\lambda p}{\lambda(1-p)} \end{cases} $$

The minimum delay D_min is achieved if and only if φ = (µ−λp)/(λ(1−p)), with the expression:

$$ D_{\min} = \frac{\lambda-\mu}{\lambda(\mu-\lambda p)} $$

Proof. Please see proof in Appendix B.8.

Theorem 7 shows that the FIXP strategy with φ = (µ−λp)/(λ(1−p)) is a delay-optimal solution for (3.14) in the Genie-Aided system. Therefore, we have derived optimal solutions to problems (3.11) and (3.14) by choosing specific φ values for FIXP strategies in the Genie-Aided system.

3.6 Fixed-Probability (FIXP) Strategy in Realistic Proactive System

Recall that the Realistic Proactive system can only observe the actual arrivals in A(t), where the prediction window is defined in (3.5). Compared with the Genie-Aided system, the server cannot update the prediction window at t_i if request i is unrealized (R_i = 0). As a result, the Realistic Proactive system may work on a request i at t with P(t) ≥ i > I(t), meaning that request i is a potential request which has arrived but is not realized. Such proactive service does not reduce future workload, thus degrading the performance of proactive services. In the following, we study the FIXP strategies in the Realistic Proactive system.

3.6.1 The Stochastic Process in the Realistic Proactive System under FIXP(φ)

As with the Genie-Aided system, we first analyze the underlying stochastic process of FIXP strategies under the Realistic Proactive system. We first define the states of this process, which differ from those of the Genie-Aided system as follows:


1. Define M_R(t) ≜ Σ_{i∈Z^+: i≥P(t)} U_i(t) as the number of future requests which have been proactively served at t. Notice that it is different from the definition M(t) ≜ Σ_{i∈Π_R(t)} U_i(t), because the requests I(t)+1, I(t)+2, ..., P(t) ∈ Π_R(t) are not future requests but unrealized potential requests that arrived before t. Define N_R(t) = N(t).

2. Instead of having a single (0,0) state, we consider two states (0,0)_A and (0,0)_B. Besides the original meaning of (0,0), (0,0)_A means that J(t) > P(t) in the system at t, i.e., the server is proactively working on a future request. When the system is in state (0,0)_B, we have J(t) ≤ P(t), i.e., the server is working on an unrealized potential request that has arrived.

Based on the new states, we choose a sequence of epochs and observe the system states of the Realistic Proactive system as follows:

Definition 7 (The Stochastic Process under FIXP(φ) in the Realistic Proactive System). Let (τ_0 = 0, τ_1, τ_2, ..., τ_k, ...) be the sequence of transition epochs, where each τ_k, k = 1, ... satisfies: 1) ∃i ∈ Z^+ such that τ_k = t_i, 2) N(τ_k^−) − N(τ_k^+) = 1, or 3) J(τ_k^−) ≠ J(τ_k^+). We consider a sequence of system states {θ_k ∈ Θ: k = 0,1,...} with state space Θ ≜ {(m,n): m = 0,1,..., n = 0,1,..., m+n > 0} ∪ {(0,0)_A, (0,0)_B} of the stochastic process {(M_R(t), N_R(t)): t ≥ 0}, where the system states are defined as:

$$ \begin{cases} \theta_k = (M_k, N_k), \;\; M_k = \sum_{i\in\mathbb{Z}^+: i\geq P(\tau_k)} U_i(\tau_k), \;\; N_k = N(\tau_k), & \text{if } M_k + N_k > 0 \\ \theta_k = (0,0)_A, & \text{if } M_k = 0, N_k = 0, J(\tau_k) > P(\tau_k) \\ \theta_k = (0,0)_B, & \text{if } M_k = 0, N_k = 0, J(\tau_k) \leq P(\tau_k) \end{cases} $$

One may notice that the system states are described in a manner similar to the Markov process in the Genie-Aided system. Unfortunately, the stochastic process we defined in the Realistic Proactive system is not entirely Markovian.


Nevertheless, we can still analyze the stochastic process of the Realistic Proactive system by studying the probabilistic transitions of system states at the selected epochs (τ_0 = 0, τ_1, τ_2, ..., τ_k, ...). In contrast to the embedded Markov Chain of a Markov process, we call the model used here an embedded chain of the stochastic process. We will study the system by analyzing the holding time between transitions and the asymptotic behaviors of the embedded chain.

Holding Time: We notice that the selected transition epochs are exactly the same as in the Genie-Aided system. As a result, the conclusions on holding times in the Genie-Aided system, i.e., (3.23) and (3.24), still hold for the Realistic Proactive system, and the limiting fraction of time that the stochastic process spends in each state is the same as the limiting distribution of the embedded chain.

Transitions: Based on Definition 7, we describe the transitions of the system, especially the transitions involving the new states (0,0)_A and (0,0)_B.

1. R_{(0,0)_A,(1,0)} = µ. Given that the system is proactively working on a future request in state (0,0)_A, the transition happens when a proactive work finishes, which occurs at rate µ.

2. R_{(1,0),(0,0)_A} = λ_3. This transition happens at τ_k when a potential request P(τ_k) arrives which requests the only completed proactive service in the system. At this moment, P(τ_k) is no longer a future request at τ_k, so M_k = M_{k−1} − 1 by definition. Given that P(τ_k) was proactively served before τ_k, the current ongoing proactive service of request J(τ_k), which already started before τ_k, must satisfy J(τ_k) > P(τ_k) by definition. Therefore, J(τ_k) is a future request, so the system transits to state (0,0)_A.

3. R_{(0,0)_A,(0,1)} = λ_1 and R_{(0,0)_B,(0,1)} = λ_1. Given that no proactive work has been completed in the system, the transition happens upon an actual arrival, at rate λ_1.

4. R_{(0,0)_A,(0,0)_B} = a_k λ, a_k ∈ [0,1], ∀k = 0,1,.... The transition happens when all the following conditions are satisfied: 1) the request being proactively worked on is the next potential request to arrive, i.e., J(τ_k^−) = P(τ_k^−) + 1; 2) it arrives before the server finishes proactively serving it; and 3) it is unrealized. In this case, the server cannot observe the potential arrival and continues to work on it proactively, so the chain transits from (0,0)_A to (0,0)_B at τ_k by definition.

5. R_{(0,0)_B,(0,0)_A} = b_k µ, b_k ∈ [0,1], ∀k = 0,1,.... As the reverse of the previous transition, the transition happens at τ_k when the server finishes proactively working on an unrealized request and starts to proactively work on J(τ_k) satisfying J(τ_k) > P(τ_k).

6. R_{(0,1),(0,0)_B} = c_k µ and R_{(0,1),(0,0)_A} = (1−c_k)µ, c_k ∈ [0,1], ∀k = 0,1,.... The transition happens at τ_k when the only reactive service in the system completes and the server starts to work on request J(τ_k^+) proactively. When J(τ_k^+) > P(τ_k^+), the system transits to (0,0)_A; otherwise, it transits to (0,0)_B, by definition.

7. R_{(0,0)_B,(0,0)_B} = λ(1−p) + (1−b_k)µ and R_{(0,0)_A,(0,0)_A} = (1 − a_k − p)λ, which are self-loops.

The transition rates of the other states are the same as in the Markov process of the Genie-Aided system. The transitions of the stochastic process in the Realistic Proactive system are shown in Figure 3.6. As one may notice, the transition rates among states (0,0)_A, (0,0)_B, and (0,1) have parameters a_k, b_k, and c_k, which depend not only on the last state of the system but also on previous states. Therefore, the stochastic process described in Definition 7 is not a Markov process. However, we would like to point out that all the other transitions, besides the ones with a_k, b_k, and c_k, only depend on the last state, so these transitions are still Markovian.

In order to analyze the system, we first use a closely related Markov Chain to derive insights into the Realistic Proactive system. We replace a_k, b_k, and c_k with constant parameters a ∈ (0,1), b ∈ (0,1), and c ∈ (0,1), turning the chain into a time-invariant Markov Chain. In the following, we analyze the asymptotic behaviors of this time-invariant Markov Chain with constant parameters a, b, and c.

[Figure 3.6: The transitions of the Realistic Proactive System. The corresponding time-invariant transition rates are shown in brackets.]


3.6.2 Analysis on the Approximated Time-invariant Markov Chain

We first solve the following set of linear equations based on the time-invariant transition rates:

$$ \pi_{(0,0)} \triangleq \pi_{(0,0)_A} + \pi_{(0,0)_B} \tag{3.46} $$

$$ (\lambda_1 + b\mu)\,\pi_{(0,0)_B} = a\lambda\pi_{(0,0)_A} + c\mu\pi_{(0,1)} \tag{3.47} $$

$$ (\lambda_1 + a\lambda + \mu)\,\pi_{(0,0)_A} = b\mu\pi_{(0,0)_B} + (1-c)\mu\pi_{(0,1)} + \lambda_3\pi_{(1,0)} \tag{3.48} $$

$$ (\lambda_1+\mu)\,\pi_{(0,n)} = \lambda_1\pi_{(0,n-1)} + \mu\pi_{(0,n+1)} + \lambda_3\pi_{(1,n)}, \quad \forall n \in \mathbb{Z}^+ \tag{3.49} $$

$$ (\lambda_2+\lambda_3+\mu)\,\pi_{(1,0)} = \mu\pi_{(0,0)_A} + \mu\pi_{(1,1)} + \lambda_3\pi_{(2,0)} \tag{3.50} $$

$$ (\lambda_2+\lambda_3+\mu)\,\pi_{(m,0)} = \mu\pi_{(m-1,0)} + \mu\pi_{(m,1)} + \lambda_3\pi_{(m+1,0)}, \quad \forall m = 2,3,\ldots \tag{3.51} $$

$$ (\lambda_2+\lambda_3+\mu)\,\pi_{(m,n)} = \lambda_2\pi_{(m,n-1)} + \mu\pi_{(m,n+1)} + \lambda_3\pi_{(m+1,n)}, \quad \forall m \in \mathbb{Z}^+, \; n \in \mathbb{Z}^+ \tag{3.52} $$

By solving the equations, we have the following proposition:

Proposition 9. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the time-invariant Markov Chain {θ'_k ∈ Θ: k = 0,1,...} of the Realistic Proactive system is positive recurrent if and only if φ > (µ−λp)/(λ(1−p)).


When φ > (µ−λp)/(λ(1−p)), the steady-state distribution of the Markov Chain can be expressed as:

$$ \pi_{(0,0)_A} = A\,\pi_{(0,0)}, \qquad A = \frac{b\mu + (1-c)\lambda_1}{b\mu + \lambda_1 + a\lambda + c\,\frac{\mu\lambda_2}{\lambda_2+\lambda_3}} \in (0,1) \tag{3.53} $$

$$ \pi_{(0,0)} = \frac{1}{\dfrac{\mu}{\mu-\lambda_1} + A\left[\dfrac{\lambda_2+\lambda_3}{\lambda_3}\cdot\dfrac{\mu}{\lambda_2+\lambda_3-\mu} + \dfrac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\left(\dfrac{\mu}{\mu-\lambda_1} - \dfrac{\lambda_2+\lambda_3}{\lambda_3}\right)\right]} \tag{3.54} $$

$$ \pi_{(0,n)} = \left(\frac{\lambda_1}{\mu}\right)^{\!n} \pi_{(0,0)} + \frac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\left[\left(\frac{\lambda_1}{\mu}\right)^{\!n} - \left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{\!n}\right]\pi_{(0,0)_A}, \quad n = 0,1,\ldots \tag{3.55} $$

$$ \pi_{(m,n)} = \left(\frac{\mu}{\lambda_2+\lambda_3}\right)^{\!m}\left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{\!n} \pi_{(0,0)_A}, \quad \forall m \in \mathbb{Z}^+, \; n = 0,1,\ldots \tag{3.56} $$

Some crucial observations on Proposition 9 are:

• The conditions for positive recurrence and non-positivity are exactly the same as in the Genie-Aided system, implying that the new states (0,0)_A and (0,0)_B and their related transitions do not change the recurrence of the system under the given ranges of a, b, and c.

• A crucial observation: (3.55) and (3.56) describe the steady-state distributions of all the other states in terms of π_{(0,0)_A} and π_{(0,0)}. These results are derived from (3.49)-(3.52), which are independent of the approximated parameters a, b, and c. Even when using a_k, b_k, and c_k, the relationships described in (3.49)-(3.52) remain valid, because the transitions of states other than (0,0)_A and (0,0)_B are still Markovian. We take advantage of this fact to derive the limiting distribution of the states of the original embedded chain in the Realistic Proactive system.

• The parameters a, b, and c directly determine the relationship between π_{(0,0)_A} and π_{(0,0)}, and consequently influence the limiting distribution. When using a_k, b_k, and c_k, we resort to other methods to derive the range of the limiting relationship between (0,0)_A and (0,0).


3.6.3 Analysis on the Embedded Chain of the Realistic Proactive System

Define the limiting fraction of occurrences of state θ ∈ Θ in the embedded chain as:

$$ f_\theta \triangleq \lim_{K\to\infty}\frac{\sum_{k=1}^K \mathbb{1}(\theta_k = \theta)}{K}, \quad \forall \theta \in \Theta \tag{3.57} $$

We consider the state (0,0) as the combined state of (0,0)_A and (0,0)_B, which is significant in the following analysis. Here we directly assume f_θ exists ∀θ ∈ Θ without rigorously proving it. Based on this assumption, we have the following theorem on the limiting distribution of the embedded chain of the stochastic process in the Realistic Proactive system.

Theorem 8. Assume f_θ exists ∀θ ∈ Θ for the embedded chain defined in Definition 7. Define A' = f_{(0,0)_A}/f_{(0,0)}. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, we have the following expressions for the limiting distribution of the states when φ > (µ−λp)/(λ(1−p)):

$$ f_{(0,0)} = \frac{1}{\dfrac{\mu}{\mu-\lambda_1} + A'\left[\dfrac{\lambda_2+\lambda_3}{\lambda_3}\cdot\dfrac{\mu}{\lambda_2+\lambda_3-\mu} + \dfrac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\left(\dfrac{\mu}{\mu-\lambda_1} - \dfrac{\lambda_2+\lambda_3}{\lambda_3}\right)\right]}, \quad \text{w.p.1} \tag{3.58} $$

$$ f_{(0,n)} = \left[\left(1 + A'\frac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\right)\left(\frac{\lambda_1}{\mu}\right)^{\!n} - A'\frac{\mu\lambda_2}{\lambda_1\lambda_2+\lambda_1\lambda_3-\mu\lambda_2}\left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{\!n}\right] f_{(0,0)}, \quad n = 0,1,\ldots, \; \text{w.p.1} \tag{3.59} $$

$$ f_{(m,n)} = \left(\frac{\mu}{\lambda_2+\lambda_3}\right)^{\!m}\left(\frac{\lambda_2}{\lambda_2+\lambda_3}\right)^{\!n} f_{(0,0)_A}, \quad \forall m \in \mathbb{Z}^+, \; n = 0,1,\ldots, \; \text{w.p.1} \tag{3.60} $$

$$ A' \in (0,1) \tag{3.61} $$

When φ ≤ (µ−λp)/(λ(1−p)), we have

$$ f_\theta = 0, \quad \forall \theta \in \Theta, \; \text{w.p.1} \tag{3.62} $$

Proof. Please see proof in Appendix B.9.


Theorem 8 shows the limiting distribution of the embedded chain for the Realistic Proactive system. Based on our conclusions on holding time, the limiting distribution of the stochastic process is also known.

3.6.4 FIXP Strategies and the Two Properties in Realistic Proactive System

First, we derive the relationship between the FIXP strategies in the Realistic Proactive system and the two properties in Definitions 4 and 5.

Property 1 of the FIXP strategies in the Realistic Proactive system: Following a proof similar to that of Proposition 8, based on the limiting distribution of the Realistic Proactive system, we directly conclude that the FIXP(φ) strategy satisfies Property 1 if and only if φ ≥ (µ−λp)/(λ(1−p)).

Property 2 of the FIXP strategies in the Realistic Proactive system: We have the following proposition on Property 2 of FIXP strategies in the Realistic Proactive system.

Proposition 10. The FIXP(φ) strategies in the Realistic Proactive system satisfy Property 2 if and only if φ ≤ (µ−λp)/(λ(1−p)).

Proof. Please see proof in Appendix B.10.

Notice that Property 2 of the FIXP strategies in the Realistic Proactive system differs from the Genie-Aided system. As a result, based on Corollary 6, we have the following theorem on FIXP strategies in the Realistic Proactive system:

Theorem 9. Given λ, µ and p, the FIXP(φ) strategy is a solution to the optimization problem (3.11) if and only if φ = (µ−λp)/(λ(1−p)).

Similar to the Genie-Aided system, we can verify Corollary 6 and Theorem 9 in the Realistic Proactive system, which we include in Appendix B.11.


3.6.5 Delay of FIXP strategies in the Realistic Proactive System

By comparing the average delay of FIXP(φ) strategies under the same φ in the Genie-Aided system and Realistic Proactive system, we have the following theorem on the limiting average delay.

Theorem 10. Given µ, λ and p satisfying µ ≤ λ < µ/p and p < 1, the limiting average delays in the Genie-Aided system, D_G, and the Realistic Proactive system, D_R, satisfy the following relationship:

$$ D_R > D_G, \quad \text{if } \phi > \frac{\mu-\lambda p}{\lambda(1-p)}; \qquad D_R = D_G, \quad \text{if } \phi \leq \frac{\mu-\lambda p}{\lambda(1-p)} $$

The optimal delay $D_{\min}^R$ in the Realistic Proactive system is:

$$ D_{\min}^R = \frac{\lambda-\mu}{\lambda(\mu-\lambda p)} $$

which is achieved when φ = (µ−λp)/(λ(1−p)).

Proof. Please see proof in Appendix B.12.

Theorem 10 shows that the FIXP strategy with φ = (µ−λp)/(λ(1−p)) is the solution to the delay minimization problem in (3.14). An interesting observation is that the optimal delay in the Realistic Proactive system is the same as in the Genie-Aided system, even though the server only observes actual arrivals.

3.7 Numerical Evaluation

We perform extensive experiments to study the limiting fraction of proactive work and the delay performance of FIXP strategies. We randomly generate the inter-arrival times of the potential requests following IID Exp(λ), and each potential request is realized with probability p. We also generate the service time of each request following IID Exp(µ) for both proactive and reactive services. Each time a proactive service is interrupted, we generate a new random value following IID Exp(µ) to guarantee the memoryless property assumed in our model. We choose µ = 10 in all of our simulations and generate 10^7 potential requests in each simulation to obtain the average performance.

In each simulation, we gradually increase the threshold φ from 0 to 1 and plot the corresponding limiting fraction of proactive work and average delay under FIXP(φ) strategies. Specifically, when φ = 1, the strategy becomes the EDF strategy; when φ = (µ−λp)/(λ(1−p)), the strategy becomes the optimal FIXP strategy; and when φ = 0, the system operates in the reactive scheme. λp is the arrival rate of the actual requests, which determines how heavily the network is loaded. We choose λp = 6 as the lightly-loaded network scenario and λp = 9.6 as the heavily-loaded network scenario. For each λp, we gradually increase λ from 10 to 20 and choose p correspondingly, to evaluate the impact of prediction uncertainties on the limiting fraction of proactive work and the delay performance.
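For readers who wish to reproduce a scaled-down version of these experiments, the following sketch simulates the Genie-Aided system under FIXP(φ) at the level of its embedded chain (Definition 6) and estimates the average delay via Little's Law; all parameter values are illustrative assumptions:

```python
# A compact simulation sketch of FIXP(phi) in the Genie-Aided system via its
# embedded chain: arrivals and services race at total rate lam + mu, and the
# average delay follows from Little's Law. Parameters are illustrative.
import random

def simulate_fixp(phi, lam=12.0, mu=10.0, p=0.5, steps=10**6, seed=0):
    rng = random.Random(seed)
    m = n = 0                # m: completed proactive services, n: queued actual requests
    area = 0                 # running sum of n over transitions
    for _ in range(steps):
        area += n
        if rng.random() < lam / (lam + mu):          # a potential arrival
            pre_served = m >= 1 and rng.random() < phi
            if pre_served:
                m -= 1                               # consumes one proactive unit (rate lam*phi)
            elif rng.random() < p:
                n += 1                               # realized, joins the reactive queue
        elif n >= 1:
            n -= 1                                   # reactive service completion
        else:
            m += 1                                   # proactive service completion
    return area / steps / (lam * p)                  # Little's Law: D = E[N] / (lam*p)

print(simulate_fixp(0.8))   # close to 0.0625, the Theorem 7 value D(0.8) for these parameters
```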

Figures 3.7 and 3.8 show the limiting fraction of services which are completed proactively. The x-axis represents the threshold φ, which gradually increases from 0 to 1. The y-axis represents the fraction of proactive service, ranging from 0 to 1. Each curve corresponds to a (λ, p) combination with the same product λp; with the same product, a smaller λ with a larger p indicates better predictability. The solid lines correspond to the limiting fraction of proactive services in the Genie-Aided system, while the dashed lines correspond to the Realistic Proactive system. Curves with the same color correspond to the same (λ, p) combination. Each vertical dotted line represents the optimal threshold (µ−λp)/(λ(1−p)) of the FIXP strategies under each (λ, p) combination, and shares the same color with the corresponding curve.

[Figure 3.7: Limiting Fraction of Proactive Services of FIXP Strategies: λp = 6]
[Figure 3.8: Limiting Fraction of Proactive Services of FIXP Strategies: λp = 9.6]

Here are some observations on the plots:

• In the Genie-Aided system, each curve becomes flat after the optimal threshold φ = (µ−λp)/(λ(1−p)). The maximum limiting fraction of proactive services is achieved on the section no smaller than the optimal threshold, which verifies Theorem 6.

• In the Realistic Proactive system, each curve has a unique maximum point perfectly indicated by the optimal threshold, which verifies Theorem 9. Combined with the previous observation, this shows the significance of Property 2 for a proactive strategy.

• By comparing the curves of the same (λ, p) combination in the Genie-Aided system and the Realistic Proactive system, we see that the sections below the optimal threshold are the same in both systems. This phenomenon arises because the fraction of proactive work is limited by φ, while the amount of proactive work performed for requests which have not yet arrived grows with t. This shows why Property 1 is crucial for a proactive strategy.

• By comparing curves of different (λ, p) combinations, we see that the maximum limiting fraction of proactive services and the optimal threshold increase with the predictability of the requests. This implies that the precision of predictions is crucial to the performance of a proactive strategy.

Figures 3.9 and 3.10 show the delay performance of FIXP strategies with different thresholds and different prediction uncertainties. The x-axis represents the threshold φ, which gradually increases from 0 to 1, and the y-axis represents the limiting average delay per actual request. We use the same settings for the line types and colors as in the previous figures. One difference is that we use crosses to plot the theoretical delay derived in Theorem 7 for the Genie-Aided system.

[Figure 3.9: Average Delay of FIXP Strategies: λp = 6]
[Figure 3.10: Average Delay of FIXP Strategies: λp = 9.6]

Here are some interesting observations on the delay plots:

• In the Genie-Aided system, the theoretical delay overlaps with the simulated delay, verifying our results in Theorem 7. The minimum delay is unique and is perfectly marked by the optimal threshold for FIXP strategies.

• In the Realistic Proactive system, the minimum delay is unique and perfectly marked by the optimal threshold for FIXP strategies. The section below φ = (µ−λp)/(λ(1−p)) overlaps with the curve for the Genie-Aided system, while the section above φ = (µ−λp)/(λ(1−p)) lies above the Genie-Aided curve. Most importantly, the minimum delay is the same as in the Genie-Aided system. All of these observations support Theorem 10.

• Comparing with Figures 3.7 and 3.8, we can see that the delay decreases with more proactive service in the Realistic Proactive system. In the Genie-Aided system, although the limiting fraction of proactive work is the same whenever φ ≥ (µ−λp)/(λ(1−p)), the minimum delay is only achieved at φ = (µ−λp)/(λ(1−p)). An intuitive explanation is that the proactive service is more evenly paced when φ = (µ−λp)/(λ(1−p)), leading to smaller average delay.

• By comparing different (λ, p) combinations, we see that the minimum delay decreases as the predictability of the future requests increases.

• By comparing the two figures with different λp, we can see that the delay is much more sensitive in a highly-loaded scenario. This highlights the significance of proactive strategies in more congested network systems.

3.8 Summary

In this chapter, we proposed a more general proactive service model and introduced the potential request process to characterize the uncertainty in predictions. We characterized proactive strategies with two crucial properties, which significantly influence the performance of proactive strategies. By comparing the proactive system with the corresponding reactive system, we derived a tight upper bound on the limiting fraction of time that a proactive system works proactively under general proactive strategies, and derived the relationship between the optimality conditions and the two significant properties of proactive strategies. We studied a family of FIXP strategies, where each potential request is proactively worked on with the same probability. We considered both the Genie-Aided system, where all potential requests can be observed when they arrive in the potential request process, and the Realistic Proactive system, where the server can only observe actual arrivals. We derived the optimal probability parameter for FIXP strategies in both systems, as well as closed-form expressions for the maximum limiting fraction of services that are completed proactively and the minimum limiting average delay per request under the optimal probability parameter. We also showed that the optimal delay of the Realistic Proactive system is exactly the same as that of the Genie-Aided system under the same optimal FIXP strategy. In the end, we performed extensive numerical experiments, which provide a straightforward demonstration of the influence of the probability parameter on the performance of FIXP strategies. The numerical results verified our conclusions on the limiting fraction of proactive work, the closed-form expressions for the average delay, and the optimal parameter we derived for FIXP strategies.

Chapter 4

SANDIE: SDN-Assisted NDN for Data-Intensive Experiments

In this chapter, we introduce the design and implementation of a joint caching and forwarding framework under the Named-Data Networking protocol for large-scale data-intensive experiments, namely the SANDIE project. The SANDIE project is proposed to solve the networking challenges facing data-intensive applications, especially the Large Hadron Collider (LHC) [60] program. The ultimate goal is to implement the NDN architecture in the LHC network for efficient data caching, delivery, and distributed computation, together with the content-centric security features of the NDN architecture.

We started devoting our efforts to this project as early as 2015, with preliminary analysis and simulation work. Due to the complexity of the data-intensive high-energy physics application, we spent a considerable amount of time understanding the system in the first stage, including the network structure, the organization of datasets, the workflows in the system, and the statistics of request patterns.


After understanding the characteristics of the data-intensive experiments, we modified our VIP framework for joint caching and forwarding in NDN specifically for this application and performed experiments on local testbeds for performance verification and optimization. At the current stage, we have implemented two prototypes of VIP-based NDN forwarders and established a continental SANDIE testbed across the US, and the excellent performance of our implementation was shown in a demonstration at SuperComputing 19'. We foresee that numerous performance optimizations will be implemented, and new features like multi-tier caching, proactive caching, and joint optimization for computation tasks will be integrated into the current implementation.

Specifically, proactive caching is a promising feature in this project due to the nature of the analytical tasks in the high-energy physics application. First, the analytical tasks on related datasets are highly correlated, which implies high predictability of tasks in these applications. Second, the performance bottleneck is due to the enormous data volume and heavy computation duties, which can be mitigated with proactive services. Last, the distribution of datasets and the allocation of computation tasks in the current system are not optimally designed. All of these facts indicate that our work in Chapters 2 and 3 has excellent potential to significantly improve the performance of these applications.

Our contributions will be introduced in this chapter with the following structure:

In Section 4.1, we briefly introduce the background of this project, including data-intensive • applications, the Named-Data Networking Architecture, and the VIP framework for joint caching

and forwarding in NDN.

• In Section 4.2, we first introduce the network structure of the LHC networks and how CMS datasets are distributed among LHC sites. Then we introduce the data systems which keep statistics on CMS workflow patterns and dataset information, including ElasticSearch, PhEDEx, XrootD, and DAS. We perform a thorough analysis of these statistics to understand the work patterns which have a great impact on network performance, and we discuss our system-level decisions on the design of the NDN architecture and the VIP framework in the LHC network based on our analysis.

• Section 4.3 demonstrates the results of a preliminary simulation of VIP caching performance on the Internet2 section of the LHC network topology, intended to verify the improvement we expected from the NDN architecture in this application. We then show updated caching simulations based on the latest data obtained from our analysis of CMS workflow statistics.

• In Section 4.4, we describe potential mismatches between the original design of the VIP framework and the LHC applications. Accordingly, we discuss the specific modifications we made to the VIP framework for the LHC application.

• In Section 4.5, we introduce the latest progress on this project. Specifically, we introduce the continental SANDIE testbed, the implementations of VIP in the NDN Forwarding Daemon (NFD) and NDN-DPDK, a demo we carried out at SuperComputing 19', and some other planned directions for the SANDIE project.

4.1 Introduction

4.1.1 Large-Scaled Data-Intensive Applications

Large-scaled data-intensive scientific experiments have emerged in numerous scientific fields, including the LHC program in High-Energy Physics (HEP), the Large Synoptic Survey Telescope (LSST) [61] in astrophysics, the Earth System Grid [62] for climate science, the Joint Genome Institute applications and the BioGenome project [63][64] for biomedical research, etc. Such applications face challenges from enormous data volume, global data distribution, exceptionally high computation load, and complexity in data management, storage, processing, and distribution. In this project, we focus on the Compact Muon Solenoid (CMS) experiments [65] of the LHC program for HEP as a pilot use case. The LHC program is one of the most extensive data-intensive experiments in the world. The total data volume is estimated to be 900 PBs, which is distributed globally at more than 170 sites. One of the most notable achievements of the LHC program is the discovery of the Higgs boson. The HEP experiments at the LHC produce hundreds of PBs of data per year, and more than 840 PBs of traffic was transmitted over the LHC network in 2015. The existing global LHC network [66] has 13 Tier-1 nodes, which are usually national centers such as Fermilab, CERN, etc., 170 Tier-2 sites, which are usually regional universities and laboratories, and approximately 300 Tier-3 sites, which are located at universities and campuses supporting the research of individual groups.

The LHC network technologies have been evolving to meet the challenges of increasing volumes of data and computational tasks. On the one hand, the network has evolved to a "location independent" access stage with the XrootD protocol and applications [67], meaning that the network determines where to place and process the datasets and how to deliver the results to the researchers. On the other hand, the CMS experiments at the LHC have been adopting a more compact data format called "miniAOD" to record the specific quantities needed for certain tasks, which only takes about 30 KBs per physics event. A newer format, "nanoAOD", which only consists of 1-2 KBs per event, is under research. Despite this, significant challenges still face the LHC program. The traffic from this project was estimated to exceed 1.5 EBs in 2019, and it is still growing and is forecast to grow by another order of magnitude by 2027, according to [68]. The growth projections of the storage and computation requirements show that the current system's capabilities will be questionable for LHC Run 3 (2021-23). From [60], an upgraded High Luminosity LHC program adopts a finer luminosity, which records roughly 4-5 times more proton interactions, leading to a much larger volume of data. The computational tasks based on these data are also distributed among sites, which need support from the network to move the raw data and computation results around the LHC network. As one can see, this is a tremendous challenge for both hardware (bandwidth, storage, and computation power) and software (data management, placement, processing, and delivery), which demands a more flexible and efficient networking architecture to efficiently scale the processing capabilities, data organization, management, and distribution. The novel Named-Data Networking architecture has great potential to be the solution due to its natural capabilities of distributed caching and in-network functionality.

4.1.2 Named-Data Networking (NDN) Architecture

The Information-Centric Networking (ICN) concept was proposed for a novel network architecture that solves the problem of content distribution more naturally and fundamentally. Named-Data Networking (NDN) [69] is such an architecture: it centers on data contents instead of addresses, unlike traditional IP networks. This paradigm shift matches current network usage more accurately and is expected to utilize new schemes to decrease delays, increase the utilization of in-network storage resources, enhance content-centric security, and achieve other desirable performance.

Content names are the principal identifiers that enable the major network functions in NDN. Two types of packets are used in NDN to perform content retrieval: Interest Packets (IPs) and Data Packets (DPs). Data contents in the network are divided into chunks, each of which is assigned a globally unique name. When a request enters the network, Interest Packets are generated, each carrying the name of a data chunk of the requested content. Upon receiving an Interest Packet, a node first checks the name carried in the IP and looks into its Content Store (CS), the local storage space used for caching, to see if the corresponding data chunk is cached locally. If so, a DP is generated containing the data chunk and its name. Otherwise, the local Pending Interest Table (PIT), which keeps records of all the IPs forwarded from this node that are still unfulfilled, is checked. Each PIT entry records the pending data chunk's name, the corresponding IPs' incoming interfaces, and the nonce values. If there is currently no entry in the PIT for the same data chunk, the Interest Packet is forwarded according to the Forwarding Information Base (FIB) to the right output interface. Otherwise, if there is an existing PIT entry for the same data chunk, this Interest Packet is recorded in the PIT and discarded. When an Interest Packet arrives at the source node or at a node with the corresponding data chunk in its cache, a DP containing the name of the data chunk, the data chunk itself, a signature for security use, and the random nonce is created. Data Packets take the exact reverse paths of the corresponding Interest Packets by using the information stored in PITs: the node checks the PIT for entries with the name in the DP, sends a copy of the Data Packet onto each interface recorded in the PIT entries, and then evicts all the PIT entries under this name. Meanwhile, the caching strategy determines whether this data chunk should be cached locally and which content to evict if the cache space is full. With these settings, Interest Packets can be fulfilled not only from the source nodes but also from any caching nodes they reach. Meanwhile, concurrent requests for the same contents will not generate redundant traffic, thus reducing the traffic in the network. On top of this architecture, we need a strategy to optimally utilize the capabilities of NDN in caching, forwarding, and congestion control, where the VIP framework [70] can be a prominent potential solution.
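The following sketch summarizes the Interest/Data processing logic described above in simplified Python; it is a conceptual illustration, not code from an actual NDN forwarder such as NFD, and the class and method names are our own:

```python
# A simplified sketch (not the NFD codebase) of NDN Interest processing with
# Content Store (CS), Pending Interest Table (PIT), and FIB, as described above.
class NDNNode:
    def __init__(self, fib):
        self.cs = {}            # name -> data chunk (cache)
        self.pit = {}           # name -> set of incoming faces
        self.fib = fib          # name prefix -> outgoing face

    def on_interest(self, name, in_face):
        if name in self.cs:                        # CS hit: answer with a Data Packet
            return ("data", name, self.cs[name], in_face)
        if name in self.pit:                       # aggregate: record face, drop Interest
            self.pit[name].add(in_face)
            return None
        self.pit[name] = {in_face}                 # new entry: forward via FIB
        return ("interest", name, self.fib[self.longest_prefix(name)])

    def on_data(self, name, chunk, cache_decision):
        faces = self.pit.pop(name, set())          # a copy goes to each recorded face
        if cache_decision(name):                   # caching strategy decides insertion
            self.cs[name] = chunk
        return [("data", name, chunk, f) for f in faces]

    def longest_prefix(self, name):
        parts = name.split("/")
        while parts:
            candidate = "/".join(parts)
            if candidate in self.fib:
                return candidate
            parts.pop()
        raise KeyError(name)
```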

4.1.3 VIP: Optimized NDN Caching, Forwarding, and Congestion Control

Based on the NDN architecture, we need an algorithm that can handle caching and forwarding strategies for data-intensive applications while achieving outstanding network performance. The VIP framework for the joint optimization of caching, forwarding, and congestion control is one promising solution, which comprehensively addresses caching decisions, cache replacement, forwarding, and congestion control in the NDN architecture.

One of the difficulties in the design of NDN caching and forwarding algorithms is Interest aggregation as a function of PITs. It has the advantage of reducing traffic from requests for the same data content, but an inner node can only observe the aggregated Interest Packets without knowing the real demand level behind them. The VIP framework takes advantage of a virtual plane which operates on virtual interest packets (VIPs). The virtual plane does not transmit real data content, but it keeps counters as user demand metrics. A critical assumption in the virtual plane is that it does not aggregate VIPs; therefore, the VIP counts represent the actual local demand for specific content. VIPs are created along with exogenous requests for data objects in the actual plane, which deals with real IPs and DPs and operates by the rules of the NDN architecture. VIPs exit the network at virtual caching locations of the data content or at the source node. The VIP counts naturally form a downward gradient from the requesting nodes to the source and caching locations. Based on the design of the virtual plane, we can apply optimized caching and forwarding algorithms in the virtual plane to explore the optimal caching locations for different data contents.

We provided the throughput-optimal VIP algorithm and a stable-caching VIP algorithm in [7], which achieve excellent performance in terms of low delay, high cache hits, and low cache replacement, compared with the most commonly used LRU and LFU caching strategies as well as state-of-the-art caching strategies.
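The following sketch illustrates the virtual-plane bookkeeping under our reading of the VIP framework; the class structure and the largest-VIP cache rule shown here are simplified illustrations rather than the exact algorithms of [7] or [70]:

```python
# A minimal sketch of virtual-plane bookkeeping in the spirit of the VIP
# framework; names and structure are illustrative assumptions.
from collections import defaultdict

class VirtualPlane:
    def __init__(self, cache_slots):
        self.vip = defaultdict(float)   # data object name -> local VIP count
        self.cache_slots = cache_slots

    def on_request(self, name):
        # Every exogenous request adds a VIP; VIPs are never aggregated, so
        # the counts reflect the actual local demand for each object.
        self.vip[name] += 1.0

    def drain(self, name, amount):
        # VIPs exit at (virtual) caching locations or the source, creating the
        # downward VIP gradient used for forwarding decisions.
        self.vip[name] = max(0.0, self.vip[name] - amount)

    def cache_set(self):
        # Cache the objects with the largest VIP counts (a stable-caching flavor).
        ranked = sorted(self.vip, key=self.vip.get, reverse=True)
        return set(ranked[:self.cache_slots])
```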

4.2 Data Analysis: LHC network and CMS Workflow

At the beginning of the SANDIE project, the first questions we faced were exploring and understanding how the LHC network operates, the configuration of the LHC networks, which jobs are done in such networks, and the characteristics of the workflows and datasets. These questions are crucial for our deployment of the NDN architecture and the design of the VIP framework, because they determine how much improvement in network performance we can expect from the NDN architecture, and whether our current design of the VIP framework is suitable for this LHC application. The sources of our knowledge about the LHC network include LHC documents, literature, presentations, and experiences described by researchers and engineers of the LHC network from Caltech, UCSD, and CERN. We first introduce the existing LHC network architecture, the CMS datasets, and the CMS workflows. Specifically, we introduce the multi-tiered structure of the CMS network in the US, how the CMS experimental data are organized into specific data formats, how they are distributed among US CMS sites, and how analytical tasks are performed in the US CMS network. After that, we look into several CERN systems that keep work logs of real-time statistics on CMS datasets and workflows. We perform our analysis on characterizing the datasets and workflows in the CMS network and discuss our observations in detail.

4.2.1 US CMS Network and Workflow

The US CMS Network: The global LHC network is highly heterogeneous because of the unique requirements of data-intensive and computation-intensive HEP traffic. The US LHC network's core structure is shared by different LHC experiments, such as CMS and ATLAS, while the Tier-2 and Tier-3 sites may differ due to the different interests of individual research groups. We call the part of the network that is utilized by CMS experiments the US CMS network. In the US CMS network, the Tier-1 site for CMS experiments is located at Fermilab, and there are more than 170 sites in total. The US CMS network is supported by multiple network operators, including Internet2, ESnet, SENSE, CENIC, Amlight, numerous regional networks, and campus networks. All the existing CMS datasets are available at CERN, which is the only Tier-0 site globally. The US CMS Tier-1 site, Fermilab, has most of the datasets which are frequently requested by research tasks. At both the Tier-0 and Tier-1 sites, datasets are stored in different storage devices, with "hot" datasets stored on disks that support higher read-out speeds and other datasets stored on magnetic tapes to guarantee availability. At all the Tier-2 sites and some of the Tier-3 sites, large repositories are equipped to store active datasets, which accelerates the analytical workflows. Each active dataset has several copies around the US CMS network, where the sites are selected following a roughly uniform distribution. Besides, the datasets in the repositories are continually changing in a round-robin manner. The connections in the core of the LHC network, including the connections to Tier-0, Tier-1, and Tier-2 sites, typically support a 100 Gbps link rate. The edge connections to the Tier-3 sites are typically 10 Gbps.

CMS Workflows: Jobs can be submitted by users through CMS Software (CMSSW) components at CMS sites. Analytical jobs submitted through CMS portals are mostly routed to a location where the needed data is available; such jobs, executed locally, are called on-site jobs. Due to limitations on storage space and computational resources, some jobs are routed to locations where the data needed for the analysis is not available but computation resources are abundant. In this case, the data required for the analysis must be transmitted to the computation site; such jobs are called off-site jobs. An illustration of on-site jobs and off-site jobs is shown in Figure 4.1, where the green job on the upper path is an on-site job, and the red job on the lower path is an off-site job. Some sites are equipped with cache devices, which enable a certain level of caching optimization. Based on our observations, many Tier-2 sites sustain > 95% CPU busy time, which shows the computation-intensive nature of the analysis work in the CMS network. We also observed that off-site jobs make up a considerable fraction of all jobs. Because of off-site jobs, a massive amount of data needs to be moved around the CMS network. After the jobs are finished, results are cached locally and can be read remotely.

Figure 4.1: On-Site Jobs and Off-Site Jobs in CMS Network

Based on these observations, we expect that an optimized distributed caching solution will greatly improve the existing CMS systems because:

• The current data placement strategy, including the placement of dataset replicas in repositories and the current caching strategies, is largely heuristic. With an algorithm that can measure the demand level based on locality and solve the joint optimization of storage and computation, the system's efficiency and performance can be improved.

• The current job allocation method depends heavily on how the datasets are distributed. Consequently, some sites are very CPU-intensive, and a huge amount of off-site traffic is generated toward the offloading sites. Given the current data placement strategy, there is still room to improve the performance of off-site jobs by optimizing caching and forwarding strategies.

CMS Data Granularity: The raw data generated by the experiments are not directly formatted into datasets; otherwise, the data volume would be too large for the current system to handle. "Data scouting" [71] was proposed by CMS physicists and students at Caltech and CERN to reduce the sizes of processed data. The idea is to first apply simple algorithms to keep valuable events in the raw data stream. After this pre-processing, data are stored as datasets organized around a particular experimental perspective. Datasets are the largest data unit in the CMS system, ranging from several TBs to hundreds of TBs. Within a dataset there are datablocks, which are usually several GBs to hundreds of GBs. Within a datablock there are typically a handful to tens of files, where each file is hundreds of MBs to several GBs. A file contains a considerable number of physics events, the typical unit for analysis. When a user submits an analytical job, the job specifies the byte-ranges of a file that the task needs. As one may notice, the granularity of CMS data covers an extensive range, from byte-level granularity to datasets of hundreds of TBs.

The choice of data granularity is crucial for the SANDIE project. On the one hand, using large data units has apparent disadvantages. First, if a session is interrupted, the amount of additional resources needed for retransmission is enormous. Second, data efficiency is low because an analytical task typically requests only several GBs, which is likely to be roughly 1% of a dataset or less. Last, local cache management will be very inflexible due to the large volume of a dataset. On the other hand, small data units imply a huge namespace, which dramatically increases the complexity of network state. The lookup cost for routing, forwarding, caching, and many other fundamental network functions will also increase.

Specifically, the selection of content granularity dramatically impacts the performance of the VIP framework. The VIP counts, which are used to measure the real demand level based on locality, must follow a certain data granularity, and the choice of VIP granularity needs to reflect the demand patterns of requests. Therefore we cannot use an overly large data unit: for example, if only a few files in a dataset are frequently requested, dataset-level VIPs cannot precisely reflect this fact. On the other hand, if we choose a small data unit, such as the byte-range level, the number of states the VIP framework must keep will explode. Such a choice incurs enormous operation costs for updating VIP counts, averaging VIP statistics, and sending local VIP information to neighbors. Besides, VIP counts will be difficult to build up when using small data units, which degrades the performance of the VIP framework. The choice of data granularity should therefore depend on the nature of the CMS workflow. We need to investigate how much data is typically involved in one analysis task and how consecutive analytical tasks relate to each other. Intuitively, we would like to cache highly related data at nearby locations to provide smooth service and reduce delay.

We propose first investigating the statistics of CMS datasets and workflows to determine the appropriate granularity for the NDN architecture and VIP framework operations.

4.2.2 Analysis on CMS Data and Workflow

In order to precisely understand the typical patterns of CMS datasets and CMS workflows in the system, we studied metadata, work logs, and statistics from several data systems at CERN, including PhEDEx [72], DAS [73], ElasticSearch [74], and CERN HDFS. Among them, the PhEDEx and DAS systems contain information on the hierarchical structure of datasets, from the dataset level down to the file level, the sizes of data collections, the sites at which they reside, and other static information about data collections. The ElasticSearch system contains several years of datablock-level statistics on CMS workflows, including detailed information on the site where a specific job is executed, the data volume of the job, the datablock it belongs to, when it starts, whether the job is on-site or off-site, whether the job finishes successfully, the CPU hours needed to finish specific percentiles of the job, etc. Because the finest granularity in the ElasticSearch system is the datablock level, we looked into CERN HDFS for file-level statistics on CMS workflows in JSON format, where each log contains the datablock name and file name of the job processed. We performed a cross-analysis between workflow statistics over the same time intervals from the ElasticSearch and HDFS systems to understand the request patterns of files within a datablock. In the analysis, we concentrate on CMS workflows performed at US CMS Tier-1 and Tier-2 sites, including Fermilab, MIT, Purdue, Caltech, UCSD, Vanderbilt, the University of Wisconsin–Madison, the University of Nebraska–Lincoln, and the University of Florida. We analyzed workflow statistics at these US sites from ElasticSearch and CERN HDFS for three months (January, February, and March) in 2018 and two months (February and March) in 2019 and looked up the corresponding information on the involved datablocks and files in PhEDEx and DAS. We discuss the critical observations and data patterns on the CMS datasets and workflows below.

Observations on CMS Datasets and Workflows: From the statistics for the five months, we extract the names of the active datablocks in off-site jobs, filter out user-generated data, and look up the datablock sizes in the PhEDEx and DAS systems. We then have the following observations on datablocks:

• The distribution of the sizes of the most popular 500 datablocks during the three months in 2018 among US sites is shown in Figure 4.2.

Figure 4.2: Distribution of Datablock Size in CMS Workflow of US Sites

From Figure 4.2, we can see that most of the active datablocks are below 100 GB. Apart from the size distribution, we also observe that AOD-format datablocks are relatively large, and most of the active datablocks are miniAOD-format datablocks with sizes smaller than 100 GB. We can foresee that when nanoAOD is widely used, the distribution will have more density at the small end. The patterns for the two months in 2019 are very similar.

• Instead of all the US sites, we pick the Caltech site and study the size distribution of its most popular 500 datablocks during the three months in 2018, as shown in Figure 4.3.

Figure 4.3: Distribution of Datablock Size in CMS Workflow of Caltech Site

We observe that, compared with the previous distribution for all US sites, more requests are for datablocks smaller than 10 GB. This distribution depends on the way the CMS network allocates jobs, as mentioned in previous sections: jobs are likely to be assigned to sites where the data is available. The distribution also depends on the specific research interests of the groups around the Caltech area, but the overall distribution is still similar to Figure 4.2. We observe similar distributions at other US Tier-2 sites.

Besides the distribution of datablock sizes, we also care about the popularity distribution of the datablocks. The local popularity distribution determines how many requests can be satisfied by caching the most popular datablocks at a site. Therefore, we analyzed the popularity distribution at different US CMS sites and made the following observations.

• In the popularity analysis, we consider all the US sites and count the number of requests for datablocks in February and March 2019. Roughly 12,500 datablocks are requested in analysis jobs. Counting the number of requests for each datablock, we sort the datablocks in descending order and plot the percentage of requests covered by the most popular datablocks. We show the result in Figure 4.4.

Figure 4.4: Percentage of Requests Covered by Popular Datablocks: US Sites

The observation is that the most popular 700 datablocks cover approximately 80% of the requests. In other words, about 80% of requests are for roughly 5.6% of the active datablocks, showing that the popularity distribution is quite light-tailed, with demand heavily concentrated on the most popular datablocks.

• Similarly, we pick the Caltech site as an example because it is the US site that handles the most jobs. During the same period, roughly 6,000 datablocks are requested. Using the same metric, we show the result in Figure 4.5.

Figure 4.5: Percentage of Requests Covered by Popular Datablocks: Caltech Site

The observation is that the most popular 400 datablocks cover approximately 80% of the requests. In other words, about 80% of requests are for roughly 7% of the active datablocks, consistent with the results for all US sites.

• We did the same analysis for each of the US Tier-2 sites, with the results shown in the chart in Figure 4.6.

Figure 4.6: Request Counts: Total vs. Popular Datablocks at US Tier-2 Sites

In Figure 4.6, the second column shows the total number of requests at each US Tier-2 site, and the third column shows the number of requests for the most popular 500 datablocks at that site. From the last column, we can see that the most popular 500 datablocks cover 70% to 93% of the total requests at these sites.

• We approximated the popularity distribution of datablocks with a Zipf distribution. We also evaluated the average number of jobs arriving at each site per second, which we denote by λ. The fitted parameters and overall arrival rates are shown in the chart in Figure 4.7.

Figure 4.7: Zipf Distribution Approximations for Popularity Distributions at US Tier-2 Sites

We can use these distributions to simulate the request process at these sites; a minimal sketch of such a request generator is given below.
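To make this setup concrete, the following minimal Python sketch generates a synthetic request trace from a Zipf popularity distribution with Poisson arrivals. The parameter values shown (number of datablocks, Zipf exponent, arrival rate) are placeholders standing in for the per-site values in Figure 4.7, not measured quantities.

import random

def zipf_pmf(n, alpha):
    # Normalized Zipf probabilities over popularity ranks 1..n.
    weights = [1.0 / (r ** alpha) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_requests(n_blocks, alpha, lam, horizon):
    """Yield (arrival_time, datablock_rank) pairs.

    Arrivals follow a Poisson process with rate lam (requests/sec);
    each request targets a datablock drawn from a Zipf(alpha)
    popularity distribution over n_blocks ranked datablocks.
    """
    pmf = zipf_pmf(n_blocks, alpha)
    ranks = list(range(1, n_blocks + 1))
    t = 0.0
    while True:
        # Exponential inter-arrival times yield a Poisson process.
        t += random.expovariate(lam)
        if t >= horizon:
            return
        yield t, random.choices(ranks, weights=pmf)[0]

# Example with placeholder parameters (not values from Figure 4.7):
for t, rank in generate_requests(n_blocks=500, alpha=1.0, lam=0.2, horizon=60.0):
    print(f"t = {t:8.2f} s: request for datablock rank {rank}")

In the actual simulations, λ and the Zipf parameter are set per site according to Figure 4.7.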

One fact to note is that these counts are based on the number of requests, where each request asks for only several GBs of a datablock. Given the distribution of datablock sizes, each request likely covers only a small part of a datablock. As a result, if one user submits consecutive jobs for byte-ranges in the same datablock, the counts reflect the number of individual jobs this user submitted. Although the results reveal some characteristics of the popularity distribution, we still do not know the exact request patterns. Therefore, we refined the analysis with file-level statistics from the CERN HDFS system and studied the dependency among requests for different files in the same datablock.

• We obtained file-level statistics from the CERN HDFS system. This set of statistics is not as detailed as the one from ElasticSearch, since a separate group designed its data collection and logging mechanism for their own analysis. It turned out that there is some mismatch between the block-level statistics from ElasticSearch and the file-level statistics from CERN HDFS. However, the observations from the HDFS system can be used to infer file-level request patterns.

Figure 4.8: File-level Popularity of an Example Datablock

We picked a typical datablock, /store/data/Run2017C/SingleElectron/MINIAOD/31Mar2018-v1/00000, as an example. This datablock has 269 files in total, and we count the number of requests for each file from the HDFS statistics. As one can see in Figure 4.8, the number of requests per file is relatively evenly distributed, compared with a typical Zipf distribution. We observed similar distributions for a few other randomly picked datablocks; due to their similarity, we do not show all the details. An inference from the plot is that jobs requesting files in the same datablock are potentially correlated.


Besides the popularity distribution at different sites, we compared the popularity ranks of the datablocks in 2018 and 2019. An interesting observation is that most of the top 100 popular datablocks are the same, with only small changes in rank order. We therefore infer that the popularity distribution is relatively stable over a long period. We also compared the popularity ranks of datablocks across different sites and observed somewhat different patterns. This is mostly due to the current CMS job allocation mechanism, which depends heavily on the data placement in the CMS network.

The statistical analysis of data and workflows provides valuable information for determining the configuration of our VIP framework in the SANDIE project. Specifically, the results have a significant impact on the granularity of VIP states, the cache sizes, and the modifications the VIP framework needs for the SANDIE project. We select the datablock level as the granularity for making caching and forwarding decisions in the virtual plane of the VIP framework, for three reasons: 1) the requests for files in the same datablock are highly correlated, so it is preferable to cache them at the same or nearby locations; 2) based on our study of datablock-level statistics from ElasticSearch, we estimate that a 10 TB cache can cover 80% of all the requests at one site for active datablocks by caching only datablocks smaller than 50 GB; and 3) the granularity problem is fundamentally a trade-off between caching flexibility and management complexity. With a small unit, such as files, a terabyte-level cache can hold roughly hundreds of them while keeping a namespace of 10^4 to 10^5 names. With a massive unit like datasets, a terabyte-level cache can only hold a handful of them while keeping tens of states in the namespace. With datablock-level granularity, the cache can hold roughly several tens to a few hundred datablocks, with a namespace of roughly several hundred names, which we believe is the optimal choice: it provides sufficient flexibility for caching algorithms to achieve benefits while keeping the state at a controllable and cost-efficient scale.

Eventually, we decided to keep the VIP states at the datablock level. Note that datablock-level VIP states only mean that our control strategies depend on datablock-level information. The granularity in the actual plane still follows CMS rules, where byte-ranges are used to specify requests. This causes no practical problems because, in the implementation, a request is divided into Interest Packets for chunks, and the corresponding Data Packets each contain one chunk of the requested data. The VIP algorithms make caching and forwarding decisions based on the datablock to which an Interest Packet or Data Packet belongs.
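To illustrate this mapping, here is a minimal sketch that expands a byte-range request into chunk-level Interest names; the chunk size and the "/chunk=i" naming pattern are illustrative assumptions, not the actual SANDIE naming scheme.

def interests_for_byte_range(datablock, start, end, chunk_size=7 * 1024):
    """Expand a byte-range request [start, end) on a datablock into the
    chunk-level Interest names that the actual plane would issue.
    The ~7 KB chunk size and the naming pattern are assumptions."""
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return [f"{datablock}/chunk={i}" for i in range(first, last + 1)]

# Example: a 100 KB read starting at byte 5000 of a hypothetical datablock.
names = interests_for_byte_range("/store/data/ExampleBlock", 5000, 105000)
print(len(names), "Interest Packets:", names[0], "...", names[-1])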

4.3 Experimental Evaluations

Because the SANDIE project involves a large-scale continental testbed with an enormous volume of data and workflow, we relied heavily on local simulations to adjust and verify our strategies at the beginning of the project. In this section, we first introduce a local simulation of the VIP framework in the US CMS network from the very beginning of the SANDIE project, as evidence that the VIP framework has the potential to significantly improve the network performance of the CMS network. Based on the data analysis we performed, we then updated our simulation with more realistic request patterns and network topology. We present the results of the updated simulation in the second half of this section.


Figure 4.9: The Early Internet2 Topology

4.3.1 Simulations of the VIP Framework in the Internet2 Topology

In order to assess the potential performance gains of the VIP caching and forwarding framework in the CMS application, we carried out simulations of the VIP framework on the US CMS network topology on a local machine. Because Internet2 [75] is one of the main components of the US LHC network (together with ESnet, CENIC, SENSE, etc.), we consider the Internet2 part of the whole LHC network in our simulations, as shown in Figure 4.9. The Internet2 LHC network is shared by both the CMS and ATLAS experiments [76]. Because the core sites of these two most important LHC experiments are different, we extract the CMS network from the Internet2 LHC network and add several key European sites on the way to CERN, the only Tier-0 site, as shown in Figure 4.10.

Figure 4.10: Internet2 CMS Topology for Simulation Study

We used approximated parameters for the simulation study, with the key parameters chosen as follows. Core link capacities are set at 100 Gbps. Link capacities from exchange points to edge campus nodes are generally 10 Gbps, with some links connecting Tier-2 sites being 100 Gbps. The Interest Packet size is set to 125 KB, while the Data Packet size (chunk size) is set to 20 GB uniformly. The content object size (dataset size) is 2 TB, equivalent to 100 Data Packets (chunks). The VIP framework is used to optimize the forwarding of Interest Packets and the caching of Data Packets. VIP states are maintained for the 200 most popular active datasets (out of a total of about 1500 active datasets), which account for roughly 80% of all requests; this means that we only utilize caching space for the most popular content. The data popularity distribution was approximated with a Zipf distribution with parameter 1.2. Data requests (each consisting of 100 Interest Packets) arrive at each Tier-1, Tier-2, or Tier-3 node according to a Poisson process with rate λ (requests per second), where each arriving request is for one of the 200 data objects. Each data file has two source nodes: FNL and CERN. In this simulation, we consider placing 60 TB caches at the CHIC, LOSA, ATLA, and AMST nodes in the Internet2 core network (core caches), as well as large PB-level repositories at all Tier-2 sites, which store the most popular datasets subject to storage capacity. Four scenarios are considered: 1) no caching at any site and no repository at any Tier-2 site, 2) caching only at the four core nodes and no repository at any Tier-2 site, 3) no caching at any site but repositories at all Tier-2 nodes, and 4) both caching at the four core nodes and repositories at all Tier-2 nodes. We gradually vary the overall request arrival rate to observe the delay performance under the VIP algorithms. Each result is obtained by averaging over several experiments, where each experiment lasts for 10^7 seconds of simulated time.

Figure 4.11: Delay Performance of VIP Framework in Internet2 Topology

Figure 4.11 shows the delay performance. From the figure, we can see that the average delay for case (2) is not far from that for case (3), particularly for moderate arrival rates. Thus, under the VIP forwarding and caching algorithm, the core caches utilize storage resources very efficiently in decreasing the average request delay. If there are no core caches, we can achieve similar performance with huge repositories installed. In either case, caching significantly improves the delay performance under our settings.

As we moved forward in the SANDIE project, we realized that installing huge caches at core nodes is not feasible for practical reasons. The repositories at Tier-2 sites have by now been installed, and the data placement policies in the repositories were discussed in the data analysis section. However, a considerable amount of off-site workflow remains even with the vast repositories in place. Therefore, we updated our simulation based on our data analysis results to obtain a more precise estimate of the performance gains achievable by the VIP framework, as discussed in the following section.

4.3.2 Simulations of the VIP Framework with Off-site CMS Workflow

Based on our observations of the CMS workflow, one of the significant sources of data transmission in the CMS network is off-site jobs. Therefore, we design a more specific experiment to evaluate the performance improvement achieved by caching datablocks for off-site CMS jobs. Based on the results of this experiment, we aim to finalize the caching configurations in the testbed, the modifications to the VIP framework, and the data granularity for the VIP framework in the CMS application.

First, we measure the realistic bandwidth of the connections between all the US Tier-1 and Tier-2 sites. With help from Colorado State University, we obtained the average connection speeds among US Tier-1 and Tier-2 sites using perfSONAR. Example results for the connections from the MIT Tier-2 site are shown in Figure 4.12.


Figure 4.12: Average Connection Speed from MIT CMS Site

We take the bandwidth to be the average throughput of these links, round the values to integers in Gbps, and use them in the simulations. The underlying connections between the CMS sites are very complicated and difficult to trace, so we use a complete graph as the simulation topology, where every pair of CMS sites is directly connected by a link with the rounded bandwidth. In practice, the instantaneous link rates may fluctuate due to background traffic, but the VIP forwarding strategies adapt to changing bandwidth.

We adopt the results of our data analysis in the previous section to choose the parameters for the request processes and data sizes. We focus on off-site jobs, where substantial data migration occurs. In this simulation, requests are generated at the datablock level, i.e., each request is for a whole datablock. We consider 500 datablocks in total, corresponding to the most popular 500 datablocks across all US sites in February and March 2019, and use their real sizes in the simulations. We set the cache size at each site to 6 TB and assume that each site has a repository serving as the source node for some of the datablocks. The datablock distribution among the repositories is chosen at random among the sites, and each datablock has two replicas in the network. At each site, the request processes are simulated as Poisson processes, with the popularity distribution following Zipf; all parameters are taken from Figure 4.7. We choose the smallest time unit in the simulation to be 1 ms, and the scheduling interval for VIP information exchange is 100 ms. Each simulation lasts for 10,000 simulated seconds.

Because we are dealing with heterogeneous data object sizes, we treat the caching problem in the virtual plane of the VIP framework as a knapsack problem. We compare the delay achieved by the VIP framework against the case without caching. Unlike the previous simulation, we do not vary the average arrival rates, so there is a single operating point. The result is shown in Figure 4.13: the average delay achieved by the VIP algorithms is less than half of that without caching.

Figure 4.13: Delay Performance of Off-site Workflow in CMS Network

4.4 Modification of VIP Algorithms for SANDIE

Based on the results of both experiments in Section 4.3, we can expect the deployment of the NDN architecture and the VIP framework to significantly improve the system performance of the CMS network. However, some designs of the VIP framework cannot be applied directly to the CMS application: 1) The VIP algorithms are designed under the assumption that all data objects are the same size, whereas the size distribution of CMS datablocks is highly diverse. This changes the nature of the caching algorithms, where the optimal solution changes from a greedy algorithm to a dynamic programming algorithm. 2) Jobs are expressed in terms of byte-ranges, each covering part of a datablock. Therefore, how to use datablock-level VIP states to precisely measure user demand is a problem. Another problem with byte-range requests is that, when trying to retrieve a particular byte-range of a datablock, a site may not have the whole datablock. 3) In the CMS network, the overall average arrival rate of jobs at each site is relatively low compared with a popular video-on-demand or web browsing system, but the data volume involved in each CMS job can be very high, and the corresponding computational tasks are CPU-intensive. Therefore, managing VIP states and choosing the corresponding VIP parameters significantly influences system performance.

In this section, we discuss the modifications specially designed for the CMS experiment use case. The fundamental difficulties facing the VIP framework in the CMS application stem from two facts: 1) the granularity in the virtual plane is the datablock level, but the actual jobs are for specific byte-ranges, and 2) the datablock sizes are highly diverse. To address these challenges, we carefully redesign the VIP caching and forwarding algorithms for the SANDIE project.

4.4.1 Modified Virtual Interest Packet

In the original VIP framework in [7], a VIP is created along with each actual request. The physical meaning of VIPs is the user demand level without interest suppression. An essential assumption in the original framework is that all object sizes are the same and each request is for a whole data object; therefore, integer VIP counts are meaningful in the original context. However, they no longer characterize the demand level for CMS applications because 1) data objects have different sizes, and 2) a request does not cover an entire data object. For these reasons, we need to rethink the physical meaning of VIPs and what we are trying to optimize in the virtual plane of the VIP framework. The fundamental insight of the VIP framework is to optimize the number of bits that the virtual plane can handle, then use the optimal flow information in the virtual plane to determine caching and forwarding decisions for actual packets. In this sense, the volume of VIPs created along with an actual request should be proportional to the volume of data requested, which can be calculated directly from the byte-ranges specified in the request. Under this setting, the cache draining speed and the link transmission rates in the virtual plane for each scheduling interval are naturally chosen as the physical read-out speed of the disks and the physical link rates. Based on this new VIP meaning, we modify the caching strategy accordingly.
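A minimal sketch of this byte-proportional VIP accounting follows; the function names and the per-datablock bookkeeping are assumptions for exposition, not the actual SANDIE implementation.

from collections import defaultdict

# VIP counts are kept per datablock; each byte-range request contributes
# VIPs proportional to the number of bits it asks for.
vip_count = defaultdict(float)   # datablock name -> VIP count (in bits)

def on_request(datablock, byte_start, byte_end):
    """Register an actual byte-range request in the virtual plane."""
    vip_count[datablock] += (byte_end - byte_start) * 8

def drain(datablock, rate_bps, interval_s):
    """Drain VIPs over one scheduling interval at a physical rate
    (disk read-out speed for caching, link rate for transmission)."""
    drained = min(vip_count[datablock], rate_bps * interval_s)
    vip_count[datablock] -= drained
    return drained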

4.4.2 Modifications on VIP Caching Strategy

In the VIP framework, the caching strategies in the virtual plane and the actual plane are designed separately. The caching and forwarding strategies in the virtual plane explore the potentially optimal caching locations for data objects, and the actual plane uses the virtual caching and flow patterns to strategically make caching and forwarding decisions for actual Interest Packets and Data Packets. In the virtual plane, the original problem is a simple greedy knapsack problem in which the data objects with the highest VIP counts are cached. The forwarding strategy in the virtual plane is to use the entire bandwidth to transmit, from a node to its neighbor, the data object with the largest positive VIP difference. In the actual plane, a node makes caching decisions based on cache scores from the virtual plane, defined as the average number of VIPs for a data object received at the node. A data object with a higher cache score replaces the object with the lowest cache score when there is not enough cache space. The forwarding decision in the actual plane depends on the average VIP flow rates: when an Interest Packet needs to be sent out, the node compares the average VIP flow rates on the outgoing links and chooses the link with the highest rate.


With the modified VIPs, the caching problem in the virtual plane is entirely different. With heterogeneous data object sizes, the problem becomes maximizing the total VIP count of the cached data objects subject to the cache space constraint: a 0-1 knapsack problem, a classic dynamic programming problem. Although the optimal solution is known (a minimal sketch is given below), its complexity is much higher than that of the original problem. In the actual plane, the problem is more complicated because a request from the CMS network covers only part of a datablock. We need to guarantee the fulfillment of all requests while improving the system performance in terms of delay, cache hits, and cache evictions. To solve this problem, we propose three different strategies with different emphases.
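For concreteness, here is a minimal sketch of the virtual-plane caching decision as a textbook 0-1 knapsack dynamic program; the input format and unit choices are assumptions for illustration, not the production implementation.

def knapsack_cache(blocks, cache_size):
    """Choose which datablocks to (virtually) cache so as to maximize the
    total VIP count subject to the cache space constraint.

    blocks: list of (name, size, vip_count) with integer sizes in the
    same unit as cache_size (e.g., GB). Classic 0-1 knapsack DP with
    O(len(blocks) * cache_size) time and space.
    """
    n = len(blocks)
    best = [[0.0] * (cache_size + 1) for _ in range(n + 1)]
    for i, (_, size, vips) in enumerate(blocks, start=1):
        for c in range(cache_size + 1):
            best[i][c] = best[i - 1][c]                 # option: skip block i
            if size <= c:                               # option: cache block i
                best[i][c] = max(best[i][c], best[i - 1][c - size] + vips)
    # Backtrack to recover the chosen set of datablocks.
    chosen, c = [], cache_size
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = blocks[i - 1]
            chosen.append(name)
            c -= size
    return chosen

# Example with hypothetical datablocks and a 10-unit cache:
print(knapsack_cache([("A", 4, 30.0), ("B", 5, 40.0), ("C", 6, 45.0)], 10))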

• Datablock-level Requests: In the first design, we mandate that a request be for an entire datablock. This requirement is based on the fact that the number of requests for files in the same datablock is quite evenly distributed. We also acquired information from experts at CERN verifying that analytical jobs have certain dependencies, where files from the same datablock are likely to be involved in subsequent analysis.

In this case, we still define the cache score as the average amount of VIPs received per datablock. The caching algorithm in the virtual plane is a 0-1 knapsack problem, as discussed above. The cache replacement policy in the actual plane becomes another knapsack problem due to the heterogeneous object sizes: choose the set of data objects, among the new data object and the currently cached objects, that maximizes the total cache score subject to the cache space constraint. The optimal solution in the actual plane is also a dynamic programming algorithm, which is too complicated for a real-time algorithm. To simplify the problem, we loosen the constraint to allow evicting part of a data object. In this case, the problem can be solved by a greedy algorithm, described as follows. Locally cached incomplete datablocks are the first to be evicted when needed. We rank the currently cached objects in ascending order of cache score, consider only the objects whose cache score is smaller than that of the new data object, and keep evicting the objects with the lowest scores until the total data volume is less than or equal to the cache space. Partial caching will not cause a problem in our implementation because the actual requests for a datablock are expressed as a series of interest packets, each requesting one small chunk (∼7 KB) of the datablock. Therefore, if a request arrives at a node where only part of the datablock is available, the node will fulfill the interest packets whose corresponding chunks are available and send the remaining interest packets towards the source nodes. The forwarding strategy is the same as before.

• Byte-range Requests: In the second design, we abide by the fact that each request specifies an arbitrary byte-range. This design allows the highest flexibility in terms of request granularity, but as a trade-off, the information provided by the VIP counts is not as precise as in the first design. However, we will argue that performance is not impacted if the actual dependency of requests on the whole datablock is high.

We first describe the virtual caching policy. When an actual request arrives in the actual plane, the amount of corresponding VIPs is set to the number of bits it requests; this can be calculated directly from the byte-range specified in the object name. In the virtual plane, the VIP algorithms still operate at the granularity of datablocks, but we simplify the caching strategy by allowing partial caching. The optimization problem then admits a greedy solution for maximizing the total VIP count of the data objects in the virtual cache: first rank the data objects in descending order of a new cache score, defined as the VIP count divided by the datablock size. The physical meaning of this new cache score is the level of demand that can be satisfied per bit of cache. Then, because partial caching is allowed, the optimal strategy is to populate the virtual cache with the objects with the highest cache scores until it is full. The forwarding strategy in the virtual plane is the same as before.

In the actual plane, data availability can no longer be guaranteed at a caching location because a request is no longer for a whole datablock. However, we argue that all requests will be fulfilled and that the caching patterns will eventually converge to a desirable result. First, when a request for a particular byte-range arrives at a node where only part of the byte-range is available, the chunks in the cache are sent back in data packets, and the remaining interest packets are passed to upstream neighbors towards the source node following FIB information until they are fulfilled; consequently, all requests are fulfilled. Second, if actual requests for different byte-ranges in the same datablock are highly dependent, the cache will gradually fill with the entire datablock: requests for the same datablock share the same forwarding path according to the block-level VIP forwarding strategies, and since the cache score is defined per datablock, caching decisions for the chunks of the same datablock share the same cache score. Third, if the popularity distribution of different byte-ranges in the same datablock is very uneven, we can use an LRU-type strategy for the chunks of the same datablock. As we know, LRU is not an optimal caching strategy, but it achieves adequate performance with very low computational complexity, which is why LRU has been the most widely used caching strategy. For these reasons, we design our VIP caching strategy in the actual plane as follows. When data packets are received, the cache scores are checked: data packets with a higher cache score replace the ones with the lowest scores in the cache. Meanwhile, we organize the cache in an object-based FIFO queue structure. On every cache hit, the data is moved to the head of the queue of its data object; when a new data chunk is cached, it is placed at the head of the queue of its data object; and on eviction, the last chunk in the FIFO queue of the datablock with the lowest cache score is evicted first. We show the cache structure in Figure 4.14, and a minimal sketch of this structure follows this list. Although this structure may sound complex, it can be implemented in a straightforward way, where all the pointers can be implemented with hash functions so that the lookup complexity is O(1).

Figure 4.14: Modified VIP Caching Structure

• Byte-range Requests with an Alternative Cache Score: The third design is almost the same as the second, the only difference being that the cache score is taken to be the VIP count of the datablock itself. The motivation for this setting is the scenario where the popularity distribution of different byte-ranges in the same datablock is extremely biased. In this case, the cache score of the second design, which divides the VIP count by the block size, will not precisely reflect the range of data that is actually being frequently requested.
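The following is a minimal Python sketch of the actual-plane cache structure of the second design (per-datablock chunk queues with score-based eviction, cf. Figure 4.14); the class and method names are illustrative assumptions, and the real implementation uses hash-based pointers for O(1) lookups.

from collections import defaultdict, deque

class VIPChunkCache:
    """Chunks are grouped per datablock in FIFO/LRU order; eviction
    removes tail chunks of the datablock with the lowest cache score."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.queues = defaultdict(deque)  # datablock -> deque of (chunk_id, size)
        self.score = {}                   # datablock -> cache score

    def on_hit(self, block, chunk_id, size):
        # Move the hit chunk to the head of its datablock's queue.
        self.queues[block].remove((chunk_id, size))
        self.queues[block].appendleft((chunk_id, size))

    def insert(self, block, chunk_id, size, block_score):
        """Cache a newly received chunk if its datablock's score beats
        the lowest-scored datablock currently in the cache."""
        self.score[block] = block_score
        while self.used + size > self.capacity:
            victims = [b for b, q in self.queues.items()
                       if q and self.score[b] < block_score]
            if not victims:
                return False              # nothing with a lower score to evict
            victim = min(victims, key=lambda b: self.score[b])
            _, evicted_size = self.queues[victim].pop()   # evict tail chunk
            self.used -= evicted_size
        self.queues[block].appendleft((chunk_id, size))
        self.used += size
        return True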

The notable differences among the designs are how the cache scores are defined and how the caching strategies in the actual plane are designed. Because the cache scores carry different physical meanings in the different designs, their performance depends heavily on the actual request patterns in the practical CMS system, so extensive experiments or operational experience are needed to select the best one.

Besides these significant modifications to the caching and forwarding concepts, we also modified details of the VIP framework during our tests and implementation. We discuss them in the next section on the implementation of the VIP algorithms in the SANDIE project.

4.5 SANDIE Testbed and Implementations

In this section, we first introduce our deployment of the SANDIE testbeds, including a local testbed at Northeastern University and a continental WAN SANDIE testbed. Then we describe our implementations of the VIP framework in two crucial NDN forwarders, namely the NDN Forwarding Daemon and NDN-DPDK, and discuss the corresponding test results. We also introduce a milestone demonstration of the SANDIE project at SuperComputing 19' and discuss our results from that demonstration. Finally, we provide an overview of the latest progress of the SANDIE project and some future expectations.

4.5.1 SANDIE Testbed Configuration

In order to test the performance of our implementation of the VIP framework, we configured a local testbed at Northeastern University and a continental testbed in the US.

We have four machines at Northeastern University for preliminary experiments on the VIP caching algorithm implementations in the local testbed. These machines each have an 18-core/36-thread CPU built on Intel's Skylake architecture, 128 GB of RAM, and NVMe SSDs with 2 TB of storage space. Two of the machines are equipped with 10 GbE Mellanox NICs. All machines are scalable in terms of their RAM and storage capacities.

For the WAN SANDIE testbed, we deployed a high-performance testbed using Layer-2 VLAN techniques with the help of engineers from SENSE, Internet2, ESnet, SCinet, multiple regional networks, and multiple campus network administrators. Compared with a Layer-3 protocol, VLAN techniques have advantages in terms of management and performance, so we decided to deploy VLANs in the data plane of the SANDIE WAN testbed. VLANs are set up across multiple campus networks, multiple regional networks, Internet2, SCinet, ESnet, CENIC, and other networks. We have three sites in total: NEU, Caltech, and CSU, plus an extra site (the SC19 booth) in Denver during SC19. At NEU, the VLAN is configured from the NEU SANDIE server to the NEU DMZ network, then to NOX, the regional network, before entering the Internet2 Albany interface. At CSU, the VLAN is configured from the CSU SANDIE servers, through the CSU DMZ network and the regional networks including FRGP, WRN, and PNWGP, to the Internet2 Chicago interface. At Caltech, the VLAN is configured from the Caltech SANDIE servers to the Caltech campus network, through CENIC and ESnet, and finally to the Internet2 Sunnyvale interface. VLANs with different tag numbers are then translated within Internet2. We deploy the following VLAN paths in the testbed: 1) NEU (VLAN 3698) to Caltech (VLAN 3610), supporting a ∼10 Gbps data rate, 2) NEU (VLAN 3700) to CSU (VLAN 3549), ∼10 Gbps, 3) Caltech (VLAN 3611) to CSU (VLAN 3551), ∼10 Gbps, 4) NEU (VLAN 3699) to the SC19 site, ∼10 Gbps, 5) Caltech (VLAN 3950) to the SC19 site, ∼100 Gbps, and 6) CSU (VLAN 3550) to the SC19 site, ∼10 Gbps. The core switching is mainly configured in Internet2 and SCinet. The VLAN configurations are shown in Figure 4.15.

We have successfully deployed NFD and NDN-DPDK docker containers and will explore how we can integrate these with Kubernetes for easy deployment at scale.

Figure 4.15: VLAN Configurations of SANDIE Testbed

4.5.2 Implementation of VIP Framework in Existing NDN Forwarders

Our VIP framework is developed on top of two implementations of NDN forwarding software: the NDN Forwarding Daemon (NFD) [77], which is developed and maintained mainly by the NDN community, and NDN-DPDK [78], which is developed and maintained by NIST. NFD has been the software most widely used by the NDN community in research and system design, with the NDN functions fully implemented. However, NFD is not explicitly designed for data-intensive applications, so its performance in terms of throughput, caching space, and processing ability is not optimized. NDN-DPDK is a recently developed NDN forwarder that addresses performance, supporting parallel processing, cache management, and transmission. Therefore, in the first stage we developed the VIP framework in NFD as an experimental implementation to show that our modified VIP framework improves system performance compared with the existing NFD caching algorithms. In the second stage, we migrated our design onto NDN-DPDK and optimized the performance of the VIP framework based on the design of NDN-DPDK.

Implementation of the VIP Framework in NFD: As the first stage of our implementation, we finished implementing and optimizing the VIP algorithms on top of the NDN Forwarding Daemon (NFD) and deployed them on our local testbed. The implementation of the VIP algorithms includes three main parts:

1. At each node, a VIP table serves as the core of the VIP framework. It is a hash table whose keys are the datablock names. For each entry, the values include local statistics such as the VIP count and the average VIPs received within a sliding window. In addition to the local VIP information, each node also keeps the VIP counts of each neighboring node in the hash table, where a neighbor is identified by the outgoing link to it. The average number of VIPs transmitted for a particular datablock is also recorded for each link. All the VIP caching and forwarding strategies in the actual plane are determined based on the information in these tables; a sketch of such an entry is given after this list.

2. We designed the VIP control message structure and the exchange mechanism. There are two types of control messages, one carrying the local VIP counts and the other carrying the number of VIPs to be transmitted to the neighbors in each scheduling interval. The control messages are pulled periodically by the respective neighbors following the NDN data fetching rules. The information contained in the control messages is then used to update all the corresponding fields in the VIP tables.

3. We implemented actual VIP caching and forwarding strategies, following our discussions in

Section 4.4.2.
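As an illustration of the table layout, here is a minimal sketch of one VIP table entry; the field and type names are assumptions for exposition and do not correspond to the actual NFD data members.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class VipEntry:
    vip_count: float = 0.0        # local VIP count for this datablock
    avg_vips: float = 0.0         # windowed average of VIPs received locally
    neighbor_counts: Dict[str, float] = field(default_factory=dict)  # link id -> neighbor's VIP count
    link_tx_avg: Dict[str, float] = field(default_factory=dict)      # link id -> avg VIPs sent on link

# The VIP table itself: datablock name -> entry.
vip_table: Dict[str, VipEntry] = {}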


After the coding phase, we tested the NFD VIP algorithms on the local testbed at Northeastern University with four machines connected in a chain topology (consumer-forwarder-forwarder-producer) and further optimized the implementation during the tests. Based on our observations, we refined the VIP algorithms with a few practical modifications to the algorithm details and parameters:

• In the actual CMS network, the request rates are not very high, and each request is for a large data collection. As a result, the VIP states update on a slower time-scale. Therefore, we use a large scheduling interval, 2 s in our case.

• Meanwhile, we observed that the sliding window needs to be very large when there is a great number of data objects. As a result, more space is needed for these statistics, and a higher cost is incurred for updating the averaged VIP counts. To address this problem, we use an exponential average window, where only three values need to be kept for each data object: the average value, the timestamp of the last update, and a decay parameter serving as the exponent. Another benefit of the exponential average window is that it only needs to be updated when there is a request for the data object, which greatly reduces the updating cost. A minimal sketch of this update appears after this list.

• Although we do not consider interest aggregation in the virtual plane, the Interest Packets in the actual plane still strictly follow NDN rules. In a chain topology, if a data object is cached locally, interest packets for this data object will not be sent out, however large the local VIP count is. With the original design, we observed that the most popular content was cached at multiple locations in a chain topology, meaning that the copy at a more upstream location would never see a cache hit, for the aforementioned reason. Therefore, we modified the design so that if a data object is cached locally, we do not send out the VIPs of this data object to the neighbors. After the modification, we observed a perfect caching pattern, where the more popular a data object is, the closer it is cached to the consumer node where all the requests are generated.

• We observed that, for some reason, the names of CMS datablocks are not used as the prefix of the files inside the datablock. For example, in the dataset "/SingleElectron/Run2017C-31Mar2018-v1/MINIAOD", there are datablocks like "/SingleElectron/Run2017C-31Mar2018-v1/MINIAOD#1bd9acc8-3756-11e8-91bf-ac1f6b05ea26" and "/SingleElectron/Run2017C-31Mar2018-v1/MINIAOD#0fbf918e-38c6-11e8-91bf-ac1f6b05ea26", which use the name of the dataset they belong to as the prefix of the datablock names. However, the file names are like "/store/data/Run2017C/SingleElectron/MINIAOD/31Mar2018-v1/00000/00E1F15D-E937-E811-A986-44A842CF0600.root", where the prefix "/store/data/Run2017C/SingleElectron/MINIAOD/31Mar2018-v1/00000" is not the datablock name. Moreover, this prefix is shared by files from more than one datablock. In order to tell which datablock a byte-range request belongs to, we implemented a matching table for the datablock-level VIP operations in the virtual plane. The table is not large, with roughly 10^4 entries, so it is manageable in our system.
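The exponential average window can be sketched as follows; the decay constant and class name are illustrative assumptions, and the state is only touched when a request for the object actually arrives.

import math
import time

DECAY = 0.01   # per-second decay rate (an assumed value, not the deployed one)

class ExpAvgVip:
    """Exponentially decayed VIP average: only the running average,
    the last-update timestamp, and the decay exponent are stored."""

    def __init__(self):
        self.avg = 0.0
        self.last = time.time()

    def add(self, vips):
        # Decay the old average over the elapsed time, then add new VIPs.
        now = time.time()
        self.avg = self.avg * math.exp(-DECAY * (now - self.last)) + vips
        self.last = now

    def value(self):
        # Read the current decayed average without mutating state.
        return self.avg * math.exp(-DECAY * (time.time() - self.last))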

We tested the performance of the VIP algorithms in NFD on the local testbed. One end machine acts as a producer, holding 20 data files of 1 GB each that can be requested. Two machines in the middle serve as pure forwarders, each able to cache 3 data files in RAM. The machine at the other end acts as a consumer, which continuously sends out requests for the 20 data files at random following a Zipf(1) distribution; the consumer can also cache three files locally. By closely observing the real-time statistics on the machines, we verified that the VIP statistics evolved as designed and observed the expected caching patterns. The performance of the VIP algorithms on NFD was also very stable during a continuous experiment run that lasted three days.


Implementation of the VIP Framework in NDN-DPDK: We observed that system performance was limited by the throughput achievable with NFD. To meet the high performance requirements of the SANDIE project, we migrated the VIP caching and forwarding algorithms to the high-performance NDN-DPDK forwarder developed by NIST based on the NDN protocols. In a recent local test conducted by NIST in 2020, this forwarder achieved throughput up to 106 Gbps with multiple forwarding threads. The implementation of the VIP framework is more complicated in NDN-DPDK because it is intricately designed with multi-threaded processing, caching, and transmission features enabled. Besides the functions we implemented in NFD for the VIP framework, we optimized the caching and forwarding strategies in the actual plane, as discussed in Section 4.4.2.

We tested the implemented VIP algorithms in NDN-DPDK on the local testbed. We connected three machines in a chain structure (consumer-forwarder-producer) with 10 Gbps and 1 Gbps connections, respectively, and allocated 5 GB of RAM caching space at the forwarder. The consumer placed requests at random times for one of 30 files of 1 GB each, following a Zipf distribution with parameter 1. We observed the expected caching pattern at the forwarder, which always cached the files with the highest cache scores. We also tested the maximum amount of RAM caching space available on the local testbed: we could allocate at most 37 GB of caching space at the forwarder out of 100 GB of total RAM. We demonstrated our current results at SuperComputing 19', which is introduced in the next section.

4.5.3 Demonstration at SuperComputing 19’

During SuperComputing 19’, the SANDIE team demonstrated our achievements by show- ing the throughput and caching performance with the NDN-DPDK forwarder. The demonstration

129 CHAPTER 4. SANDIE: SDN-ASSISTED NDN FOR DATA-INTENSIVE EXPERIMENTS was done using high-performance consumer and producer applications over a transcontinental layer-2 testbed, as well as NDN-DPDK forwarders with VIP algorithms implemented. The data files used for the demonstration were obtained from CMS open data portal [79], with sizes chosen around

2.5 GBs and 1 GB. The tests for throughput and caching were performed in parallel on two paths

(Caltech-SC19 booth and CSU-NEU) on the WAN testbed. In the throughput demonstration, Caltech team was able to show that an average throughput of 6.7Gbps was stably achieved by a single thread transmission from Caltech to our booth at the conference site at Denver. By the time of SuperCom- puting 19’, NIST was able to achieve a 60 Gbps with six threads on a local testbed with NDN-DPDK forwarders. As a comparison, this shows that our result on a continental testbed is exceptional. A caching demonstration was carried out on a linear path on the SANDIE WAN testbed, which includes

2 CSU machines and 1 NEU machine connected through VLAN with 10Gbps network cards. In the test, 2 CSU machines acted as a consumer and a forwarder, respectively, and the NEU machine acted as a producer, which was distant from the consumer and forwarder. With the consumer making requests for ten datablocks residing at the producer, we showed the cached contents at the forwarder and the caching hit performance. With a cache hit, download time was decreased by a factor of 10, and the equivalent throughput was increased by the same factor.

4.5.4 An Overview of the Latest Progress on the SANDIE Project

To provide a complete view of the latest progress on the SANDIE project, we summarize the other components of the ongoing project developed by the other groups.

• NDN naming schemes for the CMS data are designed and developed by the CSU group, following their similar work on the climate application [80].


• An XrootD plugin is developed by the Caltech group, serving as the interface for accessing CMS data with XrootD system calls from the NDN system. An NDN consumer/producer application is also developed by the Caltech group, where the consumer application generates a series of NDN interest packets based on byte-range requests for certain CMS data, and the producer application packetizes the corresponding data in the NDN packet format.

• For convenience of software deployment, configuration, and management, Docker containers are deployed at all the sites on the SANDIE testbed by Caltech, Northeastern, and CSU.

• By the time of writing this thesis, the single-threaded version of the VIP forwarder in NDN-DPDK has been successfully implemented. Members of the Northeastern group are working on the following aspects of the NDN-DPDK forwarder: 1) multi-threaded caching and forwarding strategies, 2) hybrid caching with multi-tier storage, 3) optimizing the VIP framework for specific network cards, and 4) migration from the local testbed to the WAN SANDIE testbed.

• Since the new nanoAOD data format is being developed and applied to CMS data, we are looking into the event format and see the potential to optimize VIP caching specifically for this new format.

• For the purpose of research and experiments, we would like to take advantage of this scientific application to explore the potential of the VIP framework to handle proactive caching and joint optimization with computing.

4.6 Summary

The SANDIE project started in 2015 with preliminary analysis and simulations. A significant amount of effort has been put into this project by groups and researchers from all over the world to solve the challenges faced by large-scale data-intensive scientific applications, specifically the LHC program of HEP experiments. On our part, we have been working on designing, optimizing, and implementing the VIP framework in the NDN architecture for the SANDIE project. We performed extensive data analysis on the statistics collected from CMS datasets and workflows to derive comprehensive knowledge about the characteristics of the CMS applications, especially regarding networking, caching, computation, and data management. Several simulation experiments were carried out to verify the potential performance improvement we expect from the deployment of the NDN architecture and the VIP framework in the CMS systems. Modifications to the caching and forwarding strategies in the VIP framework were carefully designed to accommodate the unique characteristics of the CMS workflows. We made concrete progress in the implementation of the VIP framework in the NFD and NDN-DPDK forwarders, with their performance verified in extensive local tests and in a demonstration at SuperComputing 19' on the WAN SANDIE testbed. With more features planned to be implemented, including parallel processing, proactive caching, designs for the nanoAOD data format, deployment of Docker containers, and multi-tier storage structures, the SANDIE project has a promising future. We believe that the products of the SANDIE project will eventually be a valuable prototype for applying the NDN architecture to data-intensive scientific applications and even to general content distribution networks.

Bibliography

[1] G. Forecast, “Cisco visual networking index: Global mobile data traffic forecast update 2017–

2022,” Update, vol. 2017, p. 2022, 2019.

[2] T. Barnett, S. Jain, U. Andra, and T. Khurana, “Cisco visual networking index (vni), complete

forecast update, 2017–2022,” Americas/EMEAR Cisco Knowledge Network (CKN) Presentation,

2018.

[3] Cisco, “Cisco annual internet report (2018–2023),” White Paper, Mar. 2020.

[4] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing:

The communication perspective,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4,

pp. 2322–2358, 2017.

[5] Cisco, “Cisco 2020 networking trends report,” 2020.

[6] R. Kohavi and R. Longbotham, “Online experiments: Lessons learned,” Computer, vol. 40,

no. 9, pp. 103–105, Sept 2007.

[7] E. Yeh, T. Ho, Y. Cui, M. Burd, R. Liu, and D. Leong, “Vip: A framework for joint dynamic

forwarding and caching in named data networks,” in Proceedings of the 1st ACM Conference


on Information-Centric Networking, ser. ACM-ICN ’14. New York, NY, USA: ACM, 2014,

pp. 117–126. [Online]. Available: http://doi.acm.org/10.1145/2660129.2660151

[8] S. Ioannidis and E. Yeh, “Adaptive caching networks with optimality guarantees,”

IEEE/ACM Trans. Netw., vol. 26, no. 2, pp. 737–750, Apr. 2018. [Online]. Available:

https://doi.org/10.1109/TNET.2018.2793581

[9] M. Mahdian and E. Yeh, “Mindelay: Low-latency forwarding and caching algorithms for

information-centric networks,” arXiv:1710.05130[cs.NI], 2017.

[10] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in

5g wireless networks,” IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, Aug 2014.

[11] E. Baştuğ, M. Bennis, E. Zeydan, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, “Big

data meets telcos: A proactive caching perspective,” Journal of Communications and Networks,

vol. 17, no. 6, pp. 549–557, 2015.

[12] J. Famaey, T. Wauters, and F. De Turck, “On the merits of popularity prediction in multi-

media content caching,” in 12th IFIP/IEEE International Symposium on Integrated Network

Management (IM 2011) and Workshops, 2011, pp. 17–24.

[13] H. Pinto, J. M. Almeida, and M. A. Gonçalves, “Using early view patterns to predict the

popularity of youtube videos,” in Proceedings of the Sixth ACM International Conference

on Web Search and Data Mining, ser. WSDM ’13. New York, NY, USA: ACM, 2013, pp.

365–374. [Online]. Available: http://doi.acm.org/10.1145/2433396.2433443

[14] M. Ahmed, S. Spagna, F. Huici, and S. Niccolini, “A peek into the future:

Predicting the evolution of popularity in user generated content,” in Proceedings


of the Sixth ACM International Conference on Web Search and Data Mining, ser.

WSDM ’13. New York, NY, USA: ACM, 2013, pp. 607–616. [Online]. Available:

http://doi.acm.org/10.1145/2433396.2433473

[15] H. Li, X. Ma, F. Wang, J. Liu, and K. Xu, “On popularity prediction of videos shared in online

social networks,” in Proceedings of the 22nd ACM International Conference on Information;

Knowledge Management, ser. CIKM ’13. New York, NY, USA: Association for Computing

Machinery, 2013, p. 169–178. [Online]. Available: https://doi.org/10.1145/2505515.2505523

[16] Y. Zhang, X. Tan, and W. Li, “Ppc: Popularity prediction caching in icn,” IEEE Communications

Letters, vol. 22, no. 1, pp. 5–8, 2018.

[17] V. A. Siris, X. Vasilakos, and G. C. Polyzos, “Efficient proactive caching for supporting

seamless mobility,” in Proceeding of IEEE International Symposium on a World of Wireless,

Mobile and Multimedia Networks 2014, 2014, pp. 1–6.

[18] N. Abani, T. Braun, and M. Gerla, “Proactive caching with mobility prediction under uncertainty

in information-centric networks,” in Proceedings of the 4th ACM Conference on Information-

Centric Networking, ser. ICN ’17. New York, NY, USA: Association for Computing

Machinery, 2017, p. 88–97. [Online]. Available: https://doi.org/10.1145/3125719.3125728

[19] R. Lan, W. Wang, A. Huang, and H. Shan, “Device-to-device offloading with proactive caching

in mobile cellular networks,” in 2015 IEEE Global Communications Conference (GLOBECOM),

2015, pp. 1–6.

[20] D. Grewe, M. Wagner, and H. Frey, “Perceive: Proactive caching in icn-based vanets,” in 2016

IEEE Vehicular Networking Conference (VNC), 2016, pp. 1–8.


[21] E. Baştuğ, M. Bennis, and M. Debbah, “Social and spatial proactive caching for mobile data

offloading,” in 2014 IEEE International Conference on Communications Workshops (ICC),

2014, pp. 581–586.

[22] S. Müller, O. Atan, M. van der Schaar, and A. Klein, “Context-aware proactive content

caching with service differentiation in wireless networks,” IEEE Transactions on Wireless

Communications, vol. 16, no. 2, pp. 1024–1036, 2017.

[23] S. Zhou, J. Gong, Z. Zhou, W. Chen, and Z. Niu, “Greendelivery: proactive content caching

and push with energy-harvesting-based small cells,” IEEE Communications Magazine, vol. 53,

no. 4, pp. 142–149, 2015.

[24] C. Yi, S. Huang, and J. Cai, “An incentive mechanism integrating joint power, channel and link

management for social-aware d2d content sharing and proactive caching,” IEEE Transactions

on Mobile Computing, vol. 17, no. 4, pp. 789–802, 2018.

[25] X. Xu, Y. Zeng, Y. L. Guan, and R. Zhang, “Overcoming endurance issue: Uav-enabled

communications with proactive caching,” IEEE Journal on Selected Areas in Communications,

vol. 36, no. 6, pp. 1231–1244, 2018.

[26] L. Ale, N. Zhang, H. Wu, D. Chen, and T. Han, “Online proactive caching in mobile edge

computing using bidirectional deep recurrent neural network,” IEEE Internet of Things Journal,

vol. 6, no. 3, pp. 5520–5530, 2019.

[27] Z. Zhang, Y. Yang, M. Hua, C. Li, Y. Huang, and L. Yang, “Proactive caching for vehicular

multi-view 3d video streaming via deep reinforcement learning,” IEEE Transactions on Wireless

Communications, vol. 18, no. 5, pp. 2693–2706, 2019.


[28] P. Jing, Y. Su, L. Nie, X. Bai, J. Liu, and M. Wang, “Low-rank multi-view embedding learning

for micro-video popularity prediction,” IEEE Transactions on Knowledge and Data Engineering,

vol. 30, no. 8, pp. 1519–1532, 2018.

[29] W. Liu, J. Zhang, Z. Liang, L. Peng, and J. Cai, “Content popularity prediction and caching for

icn: A deep learning approach with sdn,” IEEE Access, vol. 6, pp. 5075–5089, 2018.

[30] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for proactive caching

in cloud-based radio access networks with mobile users,” IEEE Transactions on Wireless

Communications, vol. 16, no. 6, pp. 3520–3535, 2017.

[31] L. Hou, L. Lei, K. Zheng, and X. Wang, “A Q-learning-based proactive caching strategy for

non-safety related services in vehicular networks,” IEEE Internet of Things Journal, vol. 6,

no. 3, pp. 4512–4520, 2019.

[32] T. Hou, G. Feng, S. Qin, and W. Jiang, “Proactive content caching by exploiting transfer

learning for mobile edge computing,” International Journal of Communication Systems, vol. 31,

no. 11, p. e3706, 2018.

[33] S. O. Somuyiwa, A. György, and D. Gündüz, “A reinforcement-learning approach to proactive

caching in wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 36,

no. 6, pp. 1331–1344, 2018.

[34] D. Tennenhouse, “Proactive computing,” Commun. ACM, vol. 43, no. 5, p. 43–50, May 2000.

[Online]. Available: https://doi.org/10.1145/332833.332837

[35] Y. Engel and O. Etzion, “Towards proactive event-driven computing,” in Proceedings of the 5th

ACM international conference on Distributed event-based system, 2011, pp. 125–136.


[36] M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Edge computing meets millimeter-

wave enabled vr: Paving the way to cutting the cord,” in 2018 IEEE Wireless Communications

and Networking Conference (WCNC), 2018, pp. 1–6.

[37] A. Bousdekis, N. Papageorgiou, B. Magoutas, D. Apostolou, and G. Mentzas, “Enabling

condition-based maintenance decisions with proactive event-driven computing,” Computers in

Industry, vol. 100, pp. 173–183, 2018.

[38] Z. Zhou and F. Han, “Service proactive caching based computation offloading for mobile edge

computing,” in 2019 11th International Conference on Wireless Communications and Signal

Processing (WCSP), 2019, pp. 1–6.

[39] Z. Cui, J. Kancherla, K. W. Chang, N. Elmqvist, and H. Corrada Bravo, “Proactive visual and

statistical analysis of genomic data in Epiviz,” Bioinformatics, vol. 36, no. 7, pp. 2195–2201,

11 2019. [Online]. Available: https://doi.org/10.1093/bioinformatics/btz883

[40] B. E. Geerhart and V. R. Dasari, “Adaptive computation at the tactical edge using a proactive

resource allocator for reduced latencies,” in Disruptive Technologies in Information Sciences

IV, vol. 11419. International Society for Optics and Photonics, 2020, p. 1141909.

[41] Z. Hu, Z. Zheng, T. Wang, L. Song, and X. Li, “Game theoretic approaches for wireless

proactive caching,” IEEE Communications Magazine, vol. 54, no. 8, pp. 37–43, 2016.

[42] Z. Zheng, L. Song, Z. Han, G. Y. Li, and H. V. Poor, “A stackelberg game approach to proactive

caching in large-scale mobile edge networks,” IEEE Transactions on Wireless Communications,

vol. 17, no. 8, pp. 5198–5211, 2018.


[43] W. Rao, L. Chen, A. W.-C. Fu, and Y. Bu, “Optimal proactive caching in peer-to-peer

network: Analysis and application,” in Proceedings of the Sixteenth ACM Conference

on Conference on Information and Knowledge Management, ser. CIKM ’07. New York,

NY, USA: Association for Computing Machinery, 2007, p. 663–672. [Online]. Available:

https://doi.org/10.1145/1321440.1321533

[44] I. H. Hou, N. Z. Naghsh, S. Paul, Y. C. Hu, and A. Eryilmaz, “Predictive scheduling for virtual

reality,” in IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, 2020, pp.

1349–1358.

[45] M. S. Elbamby, M. Bennis, and W. Saad, “Proactive edge computing in latency-constrained fog

networks,” in 2017 European Conference on Networks and Communications (EuCNC), 2017,

pp. 1–6.

[46] J. Tadrous, A. Eryilmaz, and H. El Gamal, “Proactive resource allocation: Harnessing the

diversity and multicast gains,” IEEE Transactions on Information Theory, vol. 59, no. 8, pp.

4833–4854, 2013.

[47] L. S. Muppirisetty, J. Tadrous, A. Eryilmaz, and H. Wymeersch, “On proactive caching with de-

mand and channel uncertainties,” in 2015 53rd Annual Allerton Conference on Communication,

Control, and Computing (Allerton), Sept 2015, pp. 1174–1181.

[48] J. Tadrous and A. Eryilmaz, “On optimal proactive caching for mobile networks with demand

uncertainties,” IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2715–2727, October

2016.


[49] F. Alotaibi, S. Hosny, J. Tadrous, H. E. Gamal, and A. Eryilmaz, “Towards a marketplace for

mobile content: Dynamic pricing and proactive caching,” arXiv:1511.07573[cs.GT], 2015.

[50] L. Huang, S. Zhang, M. Chen, X. Liu, L. Huang, S. Zhang, M. Chen, and X. Liu, “When

backpressure meets predictive scheduling,” IEEE/ACM Trans. Netw., vol. 24, no. 4, pp.

2237–2250, Aug. 2016. [Online]. Available: https://doi.org/10.1109/TNET.2015.2460749

[51] S. Zhang, L. Huang, M. Chen, and X. Liu, “Proactive serving decreases user delay

exponentially: The light-tailed service time case,” IEEE/ACM Trans. Netw., vol. 25, no. 2, pp.

708–723, Apr. 2017. [Online]. Available: https://doi.org/10.1109/TNET.2016.2607840

[52] K. Chen and L. Huang, “Timely-throughput optimal scheduling with prediction,” IEEE/ACM

Transactions on Networking, 2018.

[53] M. Andrews, “Probabilistic end-to-end delay bounds for earliest deadline first scheduling,” in

INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communica-

tions Societies. Proceedings. IEEE, vol. 2. IEEE, 2000, pp. 603–612.

[54] V. Sivaraman and F. Chiussi, “Providing end-to-end statistical delay guarantees with earliest

deadline first scheduling and per-hop traffic shaping,” in Proceedings IEEE INFOCOM 2000.

Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE

Computer and Communications Societies. IEEE, 2000, pp. 631–640.

[55] M. Kargahi and A. Movaghar, “A method for performance analysis of earliest-deadline-first

scheduling policy,” The Journal of Supercomputing, vol. 37, no. 2, pp. 197–222, 2006.


[56] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource allocation and cross-layer control in

wireless networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp. 1–144, 2006.

[Online]. Available: http://dx.doi.org/10.1561/1300000001

[57] R. W. Wolff, Stochastic modeling and the theory of queues. Pearson College Division, 1989.

[58] S. P. Meyn and R. L. Tweedie, “Markov chains and stochastic stability,” 1993.

[59] R. G. Gallager, Discrete stochastic processes. Springer Science & Business Media, 2012, vol.

321.

[60] L. Rossi and O. Brüning, “High luminosity large hadron collider: A description for the european

strategy preparatory group,” Tech. Rep., 2012.

[61] N. Kaiser, H. Aussel, B. E. Burke, H. Boesgaard, K. Chambers, M. R. Chun, J. N. Heasley,

K.-W. Hodapp, B. Hunt, R. Jedicke et al., “Pan-starrs: a large synoptic survey telescope array,”

in Survey and Other Telescope Technologies and Discoveries, vol. 4836. International Society

for Optics and Photonics, 2002, pp. 154–164.

[62] D. Bernholdt, S. Bharathi, D. Brown, K. Chanchio, M. Chen, A. Chervenak, L. Cinquini,

B. Drach, I. Foster, P. Fox et al., “The earth system grid: Supporting the next generation of

climate modeling research,” Proceedings of the IEEE, vol. 93, no. 3, pp. 485–495, 2005.

[63] I. V. Grigoriev, H. Nordberg, I. Shabalov, A. Aerts, M. Cantor, D. Goodstein, A. Kuo, S. Mi-

novitsky, R. Nikitin, R. A. Ohm et al., “The genome portal of the department of energy joint

genome institute,” Nucleic acids research, vol. 40, no. D1, pp. D26–D32, 2011.


[64] “Earth BioGenome Project Aims to Sequence DNA From All Complex Life,” Apr 2018,

[Online; accessed 20. Jan. 2020]. [Online]. Available: https://www.ucdavis.edu/news/

earth-biogenome-project-aims-sequence-dna-all-complex-life

[65] C. Collaboration, S. Chatrchyan, G. Hmayakyan, V. Khachatryan, A. Sirunyan, W. Adam,

T. Bauer, T. Bergauer, H. Bergauer, M. Dragicevic et al., “The cms experiment at the cern lhc,”

2008.

[66] J. Shiers, “The worldwide lhc computing grid (worldwide lcg),” Computer physics communica-

tions, vol. 177, no. 1-2, pp. 219–223, 2007.

[67] “Cms xrootd architecture.” [Online]. Available: https://twiki.cern.ch/twiki/bin/view/Main/

CmsXrootdArchitecture

[68] G. Apollinari, O. Brüning, T. Nakamoto, and L. Rossi, “High luminosity large hadron collider

hl-lhc,” arXiv preprint arXiv:1705.08830, 2017.

[69] L. Zhang, D. Estrin, J. Burke, V. Jacobson, J. Thornton, D. K. Smetters, B. Zhang, G. Tsudik,

D. Massey, C. Papadopoulos, T. Abdelzaher, L. Wang, P. Crowley, E. Yeh, k. claffy, and

D. Krioukov, “Named data networking (NDN) project,” 2010.

[70] E. Yeh, T. Ho, Y. Cui, M. Burd, R. Liu, and D. Leong, “Vip: A framework for joint dynamic

forwarding and caching in named data networks,” in Proceedings of the 1st ACM Conference

on Information-Centric Networking. ACM, 2014, pp. 117–126.

[71] C. Collaboration, “Data parking and data scouting at the cms experiment,” CMS Detector

Performance Summary CMS-DP-2012-022, vol. 10, 2012.


[72] R. Egeland, T. Wildish, and S. Metson, “Data transfer infrastructure for cms data taking,” in XII

Advanced Computing and Analysis Techniques in Physics Research, vol. 70. SISSA Medialab,

2009, p. 033.

[73] “DAS system,” https://cmsweb.cern.ch/das/.

[74] “CERN elasticsearch,” https://es-cms.cern.ch/kibana.

[75] R. Summerhill, “The new internet2 network,” in 6th GLIF Meeting, 2006.

[76] G. Aad, J. Butterworth, J. Thion, U. Bratzler, P. Ratoff, R. Nickerson, J. Seixas, I. Grabowska-

Bold, F. Meisel, S. Lokwitz et al., “The atlas experiment at the cern large hadron collider,” Jinst,

vol. 3, p. S08003, 2008.

[77] “Nfd - named data networking forwarding daemon,” [Accessed 2020-1-7]. [Online]. Available:

https://named-data.net/doc/NFD/current/

[78] J. Shi, “Ndn-dpdk: High-speed named data networking forwarder,” ”National Institute

of Standards and Technology”, 2020, [Accessed 2020-1-7]. [Online]. Available:

https://doi.org/10.18434/M32111

[79] “Cern open data portal.” [Online]. Available: http://opendata.cern.ch/

[80] S. Shannigrahi, C. Papadopoulos, E. Yeh, H. Newman, A. J. Barczyk, R. Liu, A. Sim, A. Mughal,

I. Monga, J.-R. Vlimant et al., “Named data networking in climate research and hep applications,”

in Journal of Physics: Conference Series, vol. 664, no. 5. IOP Publishing, 2015, p. 052033.

[81] D. P. Bertsekas, R. G. Gallager, and P. Humblet, Data networks. Prentice-Hall International

New Jersey, 1992, vol. 2.

Appendix A

Appendix of Chapter 2

A.1 Proof of Proposition 1

First we consider the terms $\frac{\sum_{i=1}^{I(t)} U_i(t_i)}{I(t)}$ and $\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t_i)}{A(t)}$. The term $\frac{\sum_{i=1}^{I(t)} U_i(t_i)}{I(t)}$ is the average of the terms in $\{U_i(t_i): i\le I(t)\}$. The term $\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t_i)}{A(t)}$ is the average of the samples in $\{U_i(t_i): i\le I(t),\ R_i=1\}$, which are selected from $\{U_i(t_i): i\le I(t)\}$ whenever $R_i = 1$. One important fact is that $U_i(t_i)$ is independent of $R_i$, because the server has no knowledge of $R_i$ before $t_i$. Because the $R_i$'s are IID, we have:
\[
\lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i(t_i)}{I(t)} = \lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t_i)}{A(t)}, \quad\text{w.p.1} \tag{A.1}
\]
Recall that $U_i = U_i(t_i)$ if $R_i = 1$, and $U_i \ge U_i(t_i)$ if $R_i = 0$. So we have:
\[
\lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i}{I(t)} \ge \lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i(t_i)}{I(t)} \tag{A.2}
\]
\[
\lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i}{A(t)} = \lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t_i)}{A(t)} \tag{A.3}
\]
By combining the equations above, we have:
\[
\lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i}{I(t)} \ge \lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i}{A(t)}, \quad\text{w.p.1} \tag{A.4}
\]
Therefore, by the definitions of $\overline{U}$ and $\overline{U}_A$, we have $\overline{U} \ge \overline{U}_A$, w.p.1.
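The equality (A.1) is just the statement that thinning an i.i.d. sequence with independent Bernoulli marks does not change the limiting sample average. A minimal numerical sketch of this fact (uniform stand-ins for the $U_i(t_i)$ values and an assumed $p = 0.3$; all names and values below are illustrative only):

import random

rng = random.Random(42)
U = [rng.uniform(0, 2) for _ in range(200_000)]   # stand-ins for U_i(t_i)
R = [rng.random() < 0.3 for _ in U]               # independent Bernoulli(p) marks R_i

avg_all = sum(U) / len(U)                                    # average over all potential requests
avg_realized = sum(u for u, r in zip(U, R) if r) / sum(R)    # average over realized requests only
print(avg_all, avg_realized)                                 # both concentrate around 1.0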

A.2 Proof of Theorem 1

First we make the following definitions, similar to (2.9), (2.10) and (2.11).

The amount of time that $\Psi_R$ works in the idle state (namely Reactive Idle) from $0$ to $t$ is:
\[
T_{RI}(t) \triangleq \left|\{\tau\in(0,t]: V(\tau) = 0\}\right| \tag{A.5}
\]
The amount of time that $\Psi_R$ works in the busy state (namely Reactive Busy) from $0$ to $t$ is:
\[
T_{RB}(t) \triangleq \left|\{\tau\in(0,t]: V(\tau) > 0\}\right| \tag{A.6}
\]
The limiting fraction of time that $\Psi_R$ works in the idle state is:
\[
\alpha_{RI} \triangleq \lim_{t\to\infty}\frac{T_{RI}(t)}{t} \tag{A.7}
\]
The limiting fraction of time that $\Psi_R$ works in the busy state is:
\[
\alpha_{RB} \triangleq \lim_{t\to\infty}\frac{T_{RB}(t)}{t} \tag{A.8}
\]
Compare the reactive scheme with the proactive scheme under the same sample path. Define the system state at time $t$ as "Proactive Served" if $\Psi_R$ works in the busy state at time $t$ and $\Psi_P$ works in the proactive state at time $t$. The amount of time that $\Psi_P$ works in the "Proactive Served" state is:
\[
T_{PS}(t) \triangleq \left|\{\tau\in(0,t]: V_P(\tau) = 0,\ V_R(\tau) > 0\}\right| \tag{A.9}
\]


where $V_P(t)$ is the unfinished work in the proactive scheme at $t$ and $V_R(t)$ is the unfinished work in the reactive scheme at $t$. The corresponding time intervals are marked in Figure 2.3.

Observe the system at a time $t$ when $V(t) = 0$ in both the reactive scheme and the proactive scheme. All the potential requests in $\{i : t_i < t\}$, the corresponding realizations $\{R_i : t_i < t\}$ and the resulting $\{U_i(t) : t_i < t\}$ of strategy $\Psi_P$ have been determined, so the entire timeline from $0$ to $t$ can be divided into two states in both the reactive scheme and the proactive scheme, as shown in Figure 2.3. Consequently, we have:
\[
T_{RB}(t) + T_{RI}(t) = T_{PR}(t) + T_{PP}(t) = t \tag{A.10}
\]
An important fact to be noticed here is that:
\[
T_{PP}(t) = T_{PS}(t) + T_{RI}(t) \tag{A.11}
\]
Then by (A.9):
\[
\mu T_{PS}(t) = \sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t) \tag{A.12}
\]
where the term $\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t)$ is the total amount of proactive work received by all the actual requests that arrived in $(0,t)$.

Next, the total proactive work done by time $t$ equals $\mu T_{PP}(t)$ by the definition of $T_{PP}(t)$, which satisfies the following equation:
\[
\mu T_{PP}(t) = \sum_{i=1}^{I(t)} U_i(t) + \sum_{i=I(t)+1}^{\infty} U_i(t) \tag{A.13}
\]
where the term $\sum_{i=1}^{I(t)} U_i(t)$ is the total proactive work done for requests in $\{i\in\mathbb{Z}^+: i\le I(t)\}$, and $\sum_{i=I(t)+1}^{\infty} U_i(t)$ is the total proactive work done for requests in $\{i\in\mathbb{Z}^+: i>I(t)\}$.


Next we have:
\[
\lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t)}{\sum_{i=1}^{I(t)} U_i(t)}
= \frac{\lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t)}{A(t)}\cdot\lim_{t\to\infty}\frac{A(t)}{t}}{\lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i(t)}{I(t)}\cdot\lim_{t\to\infty}\frac{I(t)}{t}}
= \frac{\overline{U}_A\,\lambda p}{\overline{U}\,\lambda}
\le \frac{\overline{U}\,\lambda p}{\overline{U}\,\lambda}, \quad\text{w.p.1} \tag{A.14}
\]
\[
= p, \quad\text{w.p.1} \tag{A.15}
\]
with equality in (A.14) if and only if the strategy $\Psi_P$ satisfies Property 2, based on Proposition 1.

Following (A.15), we have:
\[
\lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t)}{t} \le p\,\lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i(t)}{t}, \quad\text{w.p.1} \tag{A.16}
\]
Then, based on Equations (A.12), (A.13), (A.14) and (A.16), we have:
\[
p\lim_{t\to\infty}\frac{\mu T_{PP}(t)}{t}
\overset{\text{(A.13)}}{=} \lim_{t\to\infty}\frac{\left(\sum_{i=1}^{I(t)} U_i(t) + \sum_{i=I(t)+1}^{\infty} U_i(t)\right)p}{t}
\]
\[
= \lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i(t)}{t}\,p + \lim_{t\to\infty}\frac{\sum_{i=I(t)+1}^{\infty} U_i(t)}{t}\,p \tag{A.17}
\]
\[
\overset{\text{(A.16),(A.14)}}{\ge} \lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i(t)}{t}, \quad\text{w.p.1} \tag{A.18}
\]
\[
\overset{\text{(A.12)}}{=} \lim_{t\to\infty}\frac{\mu T_{PS}(t)}{t}, \quad\text{w.p.1} \tag{A.19}
\]
with equality in (A.18) if and only if the strategy satisfies both Property 1 and Property 2. So if we divide Equation (A.11) by $t$ and take $t\to\infty$, we have:
\[
\lim_{t\to\infty}\frac{T_{PP}(t)}{t} = \lim_{t\to\infty}\frac{T_{PS}(t)}{t} + \lim_{t\to\infty}\frac{T_{RI}(t)}{t}
\le \lim_{t\to\infty}\frac{T_{PP}(t)}{t}\,p + \lim_{t\to\infty}\frac{T_{RI}(t)}{t}, \quad\text{w.p.1} \tag{A.20}
\]
\[
(1-p)\lim_{t\to\infty}\frac{T_{PP}(t)}{t} \le \lim_{t\to\infty}\frac{T_{RI}(t)}{t}, \quad\text{w.p.1} \tag{A.21}
\]

where (A.20) is from (A.19).

By replacing the corresponding terms in Equation (A.21) with Equations (A.7) and (2.11), we have:
\[
\alpha_{PP} \le \frac{\alpha_{RI}}{1-p}, \quad\text{w.p.1} \tag{A.22}
\]
and we know from fundamental queueing theory that:
\[
\alpha_{RI} = 1 - \frac{\lambda p s}{\mu} \tag{A.23}
\]
Then we have the result:
\[
\alpha_{PP} \le \frac{\mu - \lambda p s}{\mu(1-p)}, \quad\text{w.p.1} \tag{A.24}
\]
And it follows that:
\[
\alpha_{PR} = 1 - \alpha_{PP} \ge \frac{\lambda p s - \mu p}{\mu(1-p)}, \quad\text{w.p.1} \tag{A.25}
\]
with equality in (A.24) and (A.25) if and only if the strategy satisfies both Property 1 and Property 2.
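As a quick numerical illustration of (A.23)-(A.25), here is a minimal sketch with example parameters chosen so that $\lambda p s < \mu \le \lambda s$, the regime in which the bounds are non-trivial (the specific values are assumptions for illustration only):

# Example parameters: arrival rate, realization probability, request size, service rate.
lam, p, s, mu = 1.0, 0.8, 1.0, 0.9           # lam*p*s = 0.8 < mu = 0.9 <= lam*s = 1.0

alpha_RI = 1 - lam * p * s / mu                          # (A.23): reactive idle fraction, ~0.111
alpha_PP_max = (mu - lam * p * s) / (mu * (1 - p))       # (A.24): upper bound, ~0.556
alpha_PR_min = (lam * p * s - mu * p) / (mu * (1 - p))   # (A.25): lower bound, ~0.444

print(alpha_RI, alpha_PP_max, alpha_PR_min)
print(alpha_PP_max + alpha_PR_min)           # the two bounds sum to 1, as expected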

A.3 Proof of Corollary 1

The average amount of proactive work done for each potential request in $\{i\in\mathbb{Z}^+: i\le I(t)\}$ by time $t$ can be calculated by dividing the total amount of proactive work done for these requests by the total number $I(t)$, so
\[
\overline{U} = \lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i}{I(t)}
= \lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i(t)}{I(t)}
= \lim_{t\to\infty}\frac{\sum_{i=1}^{\infty} U_i(t)}{I(t)} - \lim_{t\to\infty}\frac{\sum_{i=I(t)+1}^{\infty} U_i(t)}{I(t)} \tag{A.26}
\]
\[
\le \frac{\lim_{t\to\infty}\frac{\sum_{i=1}^{\infty} U_i(t)}{t}}{\lim_{t\to\infty}\frac{I(t)}{t}}, \quad\text{w.p.1} \tag{A.27}
\]
\[
= \frac{\mu\,\alpha_{PP}}{\lambda}, \quad\text{w.p.1} \tag{A.28}
\]
\[
\le \frac{\mu-\lambda p s}{\lambda(1-p)}, \quad\text{w.p.1} \tag{A.29}
\]
with equality in (A.27) and (A.29) if and only if the strategy satisfies both Property 1 and Property 2. We get (A.28) from (A.27) by the definition of $\alpha_{PP}$ and by the Strong Law of Large Numbers. The second term in (A.26) is $0$ w.p.1 if and only if $\Psi_P$ satisfies Property 1. Similarly, we have:
\[
\overline{S} = \lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} S_i}{A(t)}
= \frac{\lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} S_i}{t}}{\lim_{t\to\infty}\frac{A(t)}{t}} \tag{A.30}
\]
\[
= \frac{\mu\,\alpha_{PR}}{\lambda p}, \quad\text{w.p.1} \tag{A.31}
\]
\[
\ge \frac{\lambda s - \mu}{\lambda(1-p)}, \quad\text{w.p.1} \tag{A.32}
\]
with equality in (A.32) if and only if the strategy satisfies both Property 1 and Property 2. (A.31) is by the Strong Law of Large Numbers.
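Continuing the numeric sketch from Appendix A.2 (same illustrative parameters), the bounds (A.29) and (A.32) can be evaluated directly. Note that they sum to $s$, since $\frac{\mu-\lambda ps}{\lambda(1-p)} + \frac{\lambda s-\mu}{\lambda(1-p)} = \frac{\lambda s(1-p)}{\lambda(1-p)} = s$:

lam, p, s, mu = 1.0, 0.8, 1.0, 0.9
U_bar_max = (mu - lam * p * s) / (lam * (1 - p))   # (A.29): max average proactive work, 0.5
S_bar_min = (lam * s - mu) / (lam * (1 - p))       # (A.32): min average reactive work, 0.5
print(U_bar_max, S_bar_min, U_bar_max + S_bar_min) # the sum equals s = 1.0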


A.4 Proof of Proposition 2

Evolution of the Markov Chain: Consider the system starting from state $X_n = x_n$, $x_n\in\mathbb{Z}^+$, at $\tau_n$. By Definition 3, this means that 1) $V(\tau_n^+) = 0$, 2) $U_{J(\tau_n^+)}(\tau_n^+) = 0$, and 3) $J(\tau_n^+) = P(\tau_n^+) + x_n$. From Condition 3), we know that the system starts proactively serving request $J(\tau_n^+) = P(\tau_n^+) + x_n$ right after $\tau_n$. If the request $P(\tau_n^+) + x_n$ receives $\phi$ bits of proactive service before its arrival epoch $t_{P(\tau_n^+)+x_n}$, or if the equivalent condition
\[
U_{P(\tau_n^+)+x_n}\!\left(t_{P(\tau_n^+)+x_n}\right) = \phi \tag{A.33}
\]
is satisfied, it can easily be verified by Definition 3 that a transition happens right after the request $P(\tau_n^+)+x_n$ receives $\phi$ bits of proactive service. Therefore, if (A.33) is satisfied, we have:
\[
\tau_{n+1} < t_{P(\tau_n^+)+x_n} \tag{A.34}
\]
By the definition of threshold-based strategies, an important fact is that, given $X_n = x_n$:
\[
\text{If } x_n > 1:\quad U_i(\tau_n^+) = \phi,\ \ i = P(\tau_n^+)+1,\,P(\tau_n^+)+2,\ldots,\,P(\tau_n^+)+x_n-1;
\]
\[
\phantom{\text{If } x_n > 1:}\quad U_i(\tau_n^+) = 0,\ \ i = P(\tau_n^+)+x_n,\,P(\tau_n^+)+x_n+1,\ldots
\]
\[
\text{If } x_n = 1:\quad U_i(\tau_n^+) = 0,\ \ i = P(\tau_n^+)+1,\,P(\tau_n^+)+2,\ldots \tag{A.35}
\]
Therefore another condition equivalent to (A.33) is:
\[
\text{If } x_n = 1:\quad \phi \le \mu\!\left(t_{P(\tau_n^+)+x_n} - \tau_n\right)
\]
\[
\text{If } x_n > 1:\quad \sum_{i=P(\tau_n^+)+1}^{P(\tau_n^+)+x_n-1}(s-\phi)R_i - V\!\left(t_{P(\tau_n^+)+x_n}\right) + \phi \le \mu\!\left(t_{P(\tau_n^+)+x_n} - \tau_n\right) \tag{A.36}
\]
Here $\sum_{i=P(\tau_n^+)+1}^{P(\tau_n^+)+x_n-1}(s-\phi)R_i$ represents the total amount of reactive work of all actual arrivals in $\left(\tau_n, t_{P(\tau_n^+)+x_n}\right)$, and $V\!\left(t_{P(\tau_n^+)+x_n}\right)$ represents the amount of unfinished reactive work at the arrival epoch of request $P(\tau_n^+)+x_n$. Notice that $P(\tau_n^+)+x_n$ is the request being proactively worked on starting from $\tau_n$. So the term $\sum_{i=P(\tau_n^+)+1}^{P(\tau_n^+)+x_n-1}(s-\phi)R_i - V\!\left(t_{P(\tau_n^+)+x_n}\right)$ represents the total amount of reactive work done in $\left(\tau_n, t_{P(\tau_n^+)+x_n}\right)$, while the right-hand side is the total amount of work that can be done in $\left(\tau_n, t_{P(\tau_n^+)+x_n}\right)$. If Condition (A.36) is satisfied, it means that $\phi$ bits of proactive work can be done for request $J(\tau_n^+) = P(\tau_n^+)+x_n$ before $t_{P(\tau_n^+)+x_n}$, so (A.33) and (A.34) are satisfied. Next, we discuss the evolution of the Markov chain based on (A.36).

Case 1: If (A.36) is satisfied, (A.34) is true. In this case, we have $A_n < x_n$ and the following transition happens:
\[
X_{n+1} = J\!\left(\tau_{n+1}^+\right) - P\!\left(\tau_{n+1}^+\right)
= J\!\left(\tau_n^+\right) + 1 - \left(P\!\left(\tau_n^+\right) + A_n\right)
= x_n + 1 - A_n \tag{A.37}
\]
Case 2: If (A.36) is not satisfied, we know that no transition happens in $\left(\tau_n, t_{P(\tau_n^+)+x_n}\right)$. Because there have been $x_n$ arrivals by $t_{P(\tau_n^+)+x_n}$, we have $A_n \ge x_n$. We now show that $X_{n+1} = 1$ in this case.

Suppose $X_{n+1} \ge 2$; then $J\!\left(\tau_{n+1}^+\right) = P\!\left(\tau_{n+1}^+\right) + x_{n+1} > P\!\left(\tau_{n+1}^+\right) + 1$ by Definition 3. Then, based on the definition of threshold-based strategies in Algorithm 1, there must be an epoch $\tau' \in \left(t_{P(\tau_n^+)+x_n}, \tau_{n+1}\right)$ such that 1) $V(\tau'^+) = 0$, 2) $U_{J(\tau'^+)}(\tau'^+) = 0$, and 3) $J(\tau'^+) = P\!\left(\tau_{n+1}^+\right) + 1$. Because we know $\tau' < \tau_{n+1}$, we have $J(\tau'^+) = P\!\left(\tau_{n+1}^+\right) + 1 \ge P(\tau'^+) + 1$. By Definition 3, a transition should then happen at $\tau'$, which is earlier than $\tau_{n+1}$, so a contradiction is reached. Then, if $A_n \ge x_n$:
\[
X_{n+1} = 1 \tag{A.38}
\]


By summarizing Cases 1 and 2, we have
\[
X_{n+1} = \max\{X_n + 1 - A_n,\ 1\}, \quad\forall n = 0, 1, \ldots \tag{A.39}
\]
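The recursion (A.39) is straightforward to simulate once the law of $A_n$ is known. Below is a minimal sketch of the state evolution; the Poisson stand-in for $A_n$ is a hypothetical choice for illustration only, since the actual distribution of $A_n$ is derived in the proof of Proposition 3:

import math
import random

rng = random.Random(0)

def sample_arrivals(mean):
    # Knuth's Poisson sampler; a hypothetical stand-in for the law of A_n.
    threshold, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

X = 1
for n in range(10):
    A_n = sample_arrivals(0.8)     # E[A_n] < 1, so the expected drift 1 - E[A_n] is positive
    X = max(X + 1 - A_n, 1)        # the transition rule (A.39)
    print(n, X)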

Proof of the Markovian Property: Now we consider (A.36). Condition (A.36) is determined by $X_n$, $\{R_i : i > P(\tau_n^+),\ i\in\mathbb{Z}^+\}$, $\{t_i : i > P(\tau_n^+),\ i\in\mathbb{Z}^+\}$, and $V\!\left(t_{P(\tau_n^+)+x_n}\right)$. If the realization (i.e., the $R_i$'s), the arrival epochs (i.e., the $t_i$'s), and the amount of reactive work to be done for each actual arrival are determined, the term $V\!\left(t_{P(\tau_n^+)+x_n}\right)$ is also deterministic. The $\{R_i : i > P(\tau_n^+),\ i\in\mathbb{Z}^+\}$ are IID Bernoulli random variables, which are memoryless. The $\{t_i : i > P(\tau_n^+),\ i\in\mathbb{Z}^+\}$ are determined by the Poisson process $\{P(t); t > 0\}$, which is also memoryless. The amount of reactive work to be done for each actual arrival is determined by $X_n$ and the arrival processes after $\tau_n$. Therefore $X_{n+1}$ only depends on $X_n$ and what happens after $\tau_n$, and the chain is Markovian by definition.

A.5 Proof of Proposition 3

Recall the definition $p_k^\phi \triangleq \Pr\{A_n = k \mid X_n = x_n\}$, $x_n > k$, $k = 0,1,2,\ldots$. In order to calculate the transition probabilities $\left\{p_k^\phi,\ k = 0,1,2,\ldots\right\}$, we consider the probabilities $p_k^\phi = \Pr\{A_n = k \mid X_n = \infty\}$, $k = 0,1,2,\ldots$, based on Fact (2.19). $p_k^\phi$ can then be interpreted as the probability that there are $k$ potential arrivals before the next transition happens, given $X_n = \infty$. The target term $\sum_{k=0}^{\infty} p_k^\phi(1-k)$ can then be interpreted as the expected drift of the next transition, i.e., $E[X_{n+1}-X_n \mid X_n = \infty]$, in the Markov chain. In the following, we compute the probabilities $\left\{p_k^\phi,\ k=0,1,2,\ldots\right\}$ with respect to different values of $\phi$.

Distributions of $T_n$ and $A_n$: We first analyze the distributions of $T_n \mid X_n = \infty$ and $A_n \mid X_n = \infty$, using a method inspired by the analysis of the distribution of busy periods in M/G/1 queues in Section 8-4 of [57].


Define the function $T(\omega_1,\omega,\lambda,p)$ as the length of a time interval starting from the arrival epoch of the first job in an empty system to the epoch when the system becomes empty for the first time again. The arrivals follow a Poisson process with an overall arrival rate of $\lambda$, where each arrival is realized with probability $p$, IID. The service time of the first job is $\omega_1$, and the service time of each subsequent arrival is $\omega$ if realized.

Notice that the queueing discipline does not affect the length of this time interval, as long as the system is work-conserving. Specifically, if we select $\omega_1 = \frac{\phi}{\mu}$ and $\omega = \frac{s-\phi}{\mu}$, we have $T\!\left(\frac{\phi}{\mu}, \frac{s-\phi}{\mu}, \lambda, p\right) = (T_n \mid X_n = \infty)$. If we select $\omega_1 = \frac{s-\phi}{\mu}$ and $\omega = \frac{s-\phi}{\mu}$, then $T\!\left(\frac{s-\phi}{\mu}, \frac{s-\phi}{\mu}, \lambda, p\right)$ is the length of the time interval from the arrival epoch of an actual request to the epoch when it gets completely served under the Last-In-First-Out (LIFO) discipline. It is also the length of a busy period in our proposed system given $X_n = \infty$, i.e., the time interval from the arrival of the first actual request when $V(t) = 0$ to the epoch when $V(t) = 0$ again.

Denote the number of potential arrivals occurring while $V(t) = 0$ during $T(\omega_1,\omega,\lambda,p)$ as $N_P \sim \mathcal{P}(\lambda\omega_1)$, where $\mathcal{P}(\cdot)$ is the Poisson distribution. Notice that $N_P$ is different from the number of arrivals in $T(\omega_1,\omega,\lambda,p)$, because some arrivals happen while the server is working reactively, i.e., while $V(t) > 0$. Denote the number of actual arrivals among these $N_P$ arrivals as $N_A \sim \mathcal{B}(N_P, p)$, where $\mathcal{B}(\cdot,\cdot)$ is the Binomial distribution. When an actual request among the $N_A$ arrives, a busy period starts. The length of each busy period follows the distribution of $T(\omega,\omega,\lambda,p)$, IID.

First we derive $E[T(\omega,\omega,\lambda,p)]$. Following arguments similar to Section 8-4 of [57], we consider the LIFO queueing discipline for the unfinished work $V(t)$, which does not affect the length of the time interval $T(\omega,\omega,\lambda,p)$. Then we have:
\[
E[T(\omega,\omega,\lambda,p) \mid N_P, N_A] = \omega + N_A\,E[T(\omega,\omega,\lambda,p)] + (N_P - N_A)\cdot 0 \tag{A.40}
\]


Then, by the definitions of $N_A$ and $N_P$, we have:
\[
E[T(\omega,\omega,\lambda,p) \mid N_P] = \omega + p\,N_P\,E[T(\omega,\omega,\lambda,p)] \tag{A.41}
\]
\[
E[T(\omega,\omega,\lambda,p)] = \omega + p\lambda\omega\,E[T(\omega,\omega,\lambda,p)] \tag{A.42}
\]
So we have:
\[
E[T(\omega,\omega,\lambda,p)] = \frac{\omega}{1 - p\lambda\omega} \tag{A.43}
\]
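The fixed-point identity (A.43) is easy to validate by simulation. A minimal Monte Carlo sketch of the busy period $T(\omega,\omega,\lambda,p)$ follows; the parameter values are assumptions for illustration only, chosen so that $p\lambda\omega < 1$:

import random

def busy_period(omega, lam, p, rng):
    # Time from the arrival of a job with service time omega in an empty
    # system until the workload V(t) hits zero again. Potential arrivals
    # are Poisson(lam); each realized one (probability p) adds omega of work.
    t, workload = 0.0, omega
    t_next = rng.expovariate(lam)
    while t_next < t + workload:        # an arrival lands inside the busy period
        workload -= t_next - t          # work completed up to the arrival epoch
        t = t_next
        if rng.random() < p:
            workload += omega
        t_next = t + rng.expovariate(lam)
    return t + workload                 # epoch when the system empties

rng = random.Random(1)
omega, lam, p = 0.4, 1.0, 0.5
n = 200_000
estimate = sum(busy_period(omega, lam, p, rng) for _ in range(n)) / n
print(estimate, omega / (1 - p * lam * omega))   # both ~0.5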

Similarly, we can derive $E[T(\omega_1,\omega,\lambda,p)]$. We know the service time of the first job is $\omega_1$, and each busy period follows the distribution of $T(\omega,\omega,\lambda,p)$, so we have:
\[
E[T(\omega_1,\omega,\lambda,p)] = \omega_1 + \lambda p\omega_1\,E[T(\omega,\omega,\lambda,p)] \tag{A.44}
\]
\[
= \omega_1 + \lambda p\omega_1\,\frac{\omega}{1-p\lambda\omega} \tag{A.45}
\]
By replacing the corresponding terms, we have
\[
E[T_n \mid X_n = \infty] = E\!\left[T\!\left(\frac{\phi}{\mu}, \frac{s-\phi}{\mu}, \lambda, p\right)\right]
= \frac{\phi}{\mu} + \lambda p\,\frac{\phi}{\mu}\cdot\frac{\frac{s-\phi}{\mu}}{1 - p\lambda\frac{s-\phi}{\mu}} \tag{A.46}
\]
\[
= \frac{\phi}{\mu - p\lambda(s-\phi)} \tag{A.47}
\]
\[
= \frac{1}{\frac{\mu-\lambda p s}{\phi} + p\lambda} \tag{A.48}
\]
Similarly, we can define $A(\omega_1,\omega,\lambda,p)$ as the number of arrivals before the next transition given $X_n = \infty$. With similar arguments, we have:
\[
E[A(\omega,\omega,\lambda,p)] = \frac{\lambda\omega}{1-\lambda\omega p} \tag{A.49}
\]
\[
E[A(\omega_1,\omega,\lambda,p)] = \lambda\omega_1 + \lambda p\omega_1\,\frac{\lambda\omega}{1-\lambda\omega p} \tag{A.50}
\]
\[
E[A_n \mid X_n = \infty] = E\!\left[A\!\left(\frac{\phi}{\mu}, \frac{s-\phi}{\mu}, \lambda, p\right)\right] = \frac{\lambda}{\frac{\mu-\lambda p s}{\phi} + p\lambda} \tag{A.51}
\]


An interesting fact is that if we choose $\phi$ relative to $U^*$, as defined in (2.15), we have
\[
E[A_n \mid X_n = \infty]
\begin{cases}
< 1, & \text{if } \phi < U^* \\
= 1, & \text{if } \phi = U^* \\
> 1, & \text{if } \phi > U^*
\end{cases} \tag{A.52}
\]
Notice that $\Pr\{A_n = k \mid X_n = \infty\} = p_k^\phi$, $\forall k = 0,1,\ldots$, and
\[
E[A_n \mid X_n = \infty] = \sum_{k=0}^{\infty} p_k^\phi\,k \tag{A.53}
\]
So we have
\[
\sum_{k=0}^{\infty} p_k^\phi\,k
\begin{cases}
< 1, & \text{if } \phi < U^* \\
= 1, & \text{if } \phi = U^* \\
> 1, & \text{if } \phi > U^*
\end{cases} \tag{A.54}
\]
And our conclusion directly follows:
\[
\sum_{k=0}^{\infty} p_k^\phi\,(1-k)
\begin{cases}
> 0, & \text{if } \phi < U^* \\
= 0, & \text{if } \phi = U^* \\
< 0, & \text{if } \phi > U^*
\end{cases} \tag{A.55}
\]
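A short numerical check of (A.51) and the sign pattern in (A.52)-(A.55), taking $U^* = \frac{\mu-\lambda p s}{\lambda(1-p)}$ (an assumption consistent with the bound (A.29); the parameter values are illustrative):

lam, p, s, mu = 1.0, 0.8, 1.0, 0.9       # example parameters with lam*p*s < mu
U_star = (mu - lam * p * s) / (lam * (1 - p))

def expected_A(phi):
    # E[A_n | X_n = infinity] from (A.51)
    return lam / ((mu - lam * p * s) / phi + p * lam)

for phi in (0.5 * U_star, U_star, 2.0 * U_star):
    drift = 1 - expected_A(phi)          # equals sum_k p_k^phi (1 - k)
    print(phi / U_star, expected_A(phi), drift)
# E[A_n] < 1 (positive drift) below U*, exactly 1 at U*, and > 1 above it.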

A.6 Proof of Proposition 4

In order to prove $E[T_n \mid X_n = x_n] < \infty$, $\forall x_n\in\mathbb{Z}^+$, we first prove that:
\[
E[T_n \mid X_n = 1] \ge E[T_n \mid X_n = k], \quad\forall k > 1 \tag{A.56}
\]
and then prove that:
\[
E[T_n \mid X_n = 1] < \infty \tag{A.57}
\]
to finish the proof.

Figure A.1: Comparison of System 1 and System 2 in the Proof of Proposition 4
Figure A.2: Comparison of the Proactive System and the Virtual System in the Proof of Proposition 4

Proof of (A.56): First, we prove the following:
\[
E[T_n \mid X_n = 1] \ge E[T_n \mid X_n = k], \quad\forall k > 1 \tag{A.58}
\]
Consider two systems under $\Psi_P^\phi$ which start from $\tau_n$ with $P_1(\tau_n) = P_2(\tau_n)$, but in different states: $X_n^1 = 1$ in the first system and $X_n^2 = k$, $k > 1$, in the second system. Based on (A.35), no proactive service has been done for any future request at $\tau_n$ in the first system, while the first $k-1$ future requests have received $\phi$ bits of proactive service by time $\tau_n$ in the second system. Recall that $J(t)$ denotes the request the server would proactively work on if $V(t) = 0$ at $t$. Here we use $J_1(t)$ for the first system and $J_2(t)$ for the second system.

Because we assume $P_1(\tau_n) = P_2(\tau_n)$, $X_n^1 = 1$ and $X_n^2 = k$, $k > 1$, we have $J_2(\tau_n^+) > J_1(\tau_n^+)$ by Definition 3. Then, if we consider the same arrival processes after $\tau_n$ in both systems under the same strategy $\Psi_P^\phi$, we have
\[
J_2(t) \ge J_1(t), \quad\forall t \ge \tau_n \tag{A.59}
\]
Then we have $\tau_{n+1}^1 \ge \tau_{n+1}^2$ by Definition 3, which means a transition always happens in the second system no later than in the first system. Therefore we have:
\[
T_n \mid X_n = 1 \ \ge\ T_n \mid X_n = k, \quad k \ge 2 \tag{A.60}
\]
This is true for every sample path, so we have:
\[
E[T_n \mid X_n = 1] \ge E[T_n \mid X_n = k], \quad\forall k > 1 \tag{A.61}
\]
An example of the comparison can be found in Figure A.1.

Proof of (A.57): Next, we prove that $E[T_n \mid X_n = 1] < \infty$. Again, we use the method of comparisons.

We compare the proactive system with a virtual system. Both systems start from state $X_n = 1$ ($\tilde{X}_n = 1$ in the virtual system) at $\tau_n$. In the virtual system, the server stops proactively serving any request from $\tau_n$ onward. Our goal is to find the earliest epoch $\tau^* > \tau_n$ which satisfies:
\[
1)\ \tilde{V}(\tau^{*+}) = 0, \qquad 2)\ \tilde{U}_{\tilde{J}(\tau^{*+})}(\tau^{*+}) = 0, \qquad 3)\ \tilde{P}(\tau^{*+}) = \tilde{I}(\tau^{*+}) > I(\tau_n^+) \tag{A.62}
\]
Note that these conditions are very similar to the conditions in Definition 3. We consider $\tau^*$ as the next transition time in the virtual system. Correspondingly, we define $\tilde{T}_n = \tau^* - \tau_n$. Note that $\tilde{P}(\tau^{*+}) = \tilde{I}(\tau^{*+}) > I(\tau_n^+)$ is a stronger condition than Condition 3) in Definition 3. Based on these definitions, we are going to prove
\[
\left(\tilde{T}_n \mid \tilde{X}_n = 1\right) \ \ge\ \left(T_n \mid X_n = 1\right) \tag{A.63}
\]
for the two systems under the same sample path.

Now we consider the same sample path in both the proactive system under $\Psi_P^\phi$ and the virtual system, starting from $X_n = 1$ and $\tilde{X}_n = 1$ at $\tau_n$. An example of the comparison is shown in Figure A.2. Since no proactive work is done in the virtual system before $\tau^*$, every actual arrival needs to receive $s$ bits reactively in the virtual system, which is no fewer than in the proactive system for each request. Therefore, similar to the arguments used for Equation (A.59), we have:
\[
\tilde{J}(t) \le J(t), \quad\forall t > \tau_n \tag{A.64}
\]
If we compare the conditions in (A.62) with the conditions in Definition 3, we can see that (A.62) is a stronger condition, because if $P(\tau^{*+}) = I(\tau^{*+})$, we must have $\tilde{J}(\tau^{*+}) \ge I(\tau^{*+}) + 1 > P(\tau^{*+})$. Therefore the transition in the proactive system happens no later than in the virtual system, and $\left(\tilde{T}_n \mid \tilde{X}_n = 1\right) \ge \left(T_n \mid X_n = 1\right)$ holds along every sample path.

Construction of $\tau^*$: Here we aim to find the epoch $\tau^*$ in the virtual system which satisfies the conditions in Equation (A.62). Our target is a busy period which starts with one actual arrival and during which no other potential arrival happens before it ends. The epoch $\tau^*$ can be taken as the end of such a busy period, because: 1) the server becomes idle, so $\tilde{V}(\tau^{*+}) = 0$; 2) $\tilde{U}_{\tilde{J}(\tau^{*+})}(\tau^{*+}) = 0$ is always true in the virtual system, by its assumption; 3) the latest arrival is an actual arrival, so $\tilde{P}(\tau^{*+}) = \tilde{I}(\tau^{*+})$. Because there must be at least one actual arrival after $\tau_n$, we have $\tilde{I}(\tau^{*+}) > I(\tau_n^+)$, so Condition 3) is satisfied.

Recall that we assume $\lambda s p < \mu$, so the virtual system is stable. The expected idle period length in the virtual system is then
\[
E[I_V] = \frac{1}{\lambda p} \tag{A.65}
\]
where $I_V$ is defined as the length of an idle period in the virtual system. Based on $\frac{E[B_V]}{E[B_V] + E[I_V]} = \rho = \frac{\lambda p s}{\mu}$, where $B_V$ is the length of a busy period in the virtual system, we can also calculate the expected length of a busy period in the virtual system, $E[B_V]$. So we know that $E[I_V] < \infty$ and $E[B_V] < \infty$.

The next step is to find such a busy period. Every time a busy period starts with an actual arrival, the probability that there is no potential arrival during the service time $\frac{s}{\mu}$ of that actual arrival is $e^{-\lambda\frac{s}{\mu}}$, IID. Therefore the expected number of busy periods until such a busy period happens for the first time is $E[N_B] = \frac{1}{e^{-\lambda\frac{s}{\mu}}} = e^{\lambda\frac{s}{\mu}}$, where $N_B$ is the number of busy periods observed when the first busy period satisfying the condition occurs. So the expected time until such a busy period happens is bounded as:
\[
E\!\left[\tilde{T}_n \mid \tilde{X}_n = 1\right] \le E[N_B]\left(E[I_V] + E[B_V]\right) < \infty \tag{A.66}
\]
Define the bound $E_V \triangleq E[N_B]\left(E[I_V] + E[B_V]\right)$, which is a deterministic finite number given the system parameters $\lambda, p, s, \mu$. Therefore we have our bound on $E[T_n \mid X_n]$:
\[
E[T_n \mid X_n] \le E[T_n \mid X_n = 1] \le E\!\left[\tilde{T}_n \mid \tilde{X}_n = 1\right] \le E_V < \infty, \quad\forall X_n\in\mathbb{Z}^+ \tag{A.67}
\]
So we have proved $E[T_n \mid X_n = k] < \infty$, $\forall k\in\mathbb{Z}^+$, by combining (A.56) and (A.57). Similarly, we can prove $E[A_n \mid X_n = k] < \infty$, $\forall k\in\mathbb{Z}^+$.
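The bound $E_V$ in (A.66)-(A.67) is fully explicit once $E[B_V]$ is solved from $\frac{E[B_V]}{E[B_V]+E[I_V]} = \rho$, i.e., $E[B_V] = \frac{\rho}{1-\rho}E[I_V]$. A quick numeric evaluation, using the same illustrative parameters as in the earlier sketches:

import math

lam, p, s, mu = 1.0, 0.8, 1.0, 0.9
rho = lam * p * s / mu                  # utilization of the virtual system, < 1
E_IV = 1 / (lam * p)                    # (A.65): mean idle period
E_BV = rho / (1 - rho) * E_IV           # solved from E[B_V]/(E[B_V]+E[I_V]) = rho
E_NB = math.exp(lam * s / mu)           # expected trials until an "isolated" busy period
E_V = E_NB * (E_IV + E_BV)
print(E_V)                              # finite bound on E[T_n | X_n], ~34.2 here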

A.7 Proof of Lemma 1

First assume that by time $t$ there have been $M(t) \triangleq \max\{m \mid \tau_m \le t\}$ transitions in the Markov chain under $\Psi_P^\phi$. Then we have the following inequalities for $M(t)$:
\[
M(t) \le P(t) + \frac{\mu t}{\phi} \tag{A.68}
\]
\[
\lim_{t\to\infty}\frac{t}{M(t)} \le E_V, \quad\text{w.p.1} \tag{A.69}
\]


where $E_V \triangleq E[N_B]\left(E[I_V]+E[B_V]\right)$ is the bound in (A.66). Equation (A.68) follows from the fact that a transition happens either when the server finishes proactively serving a request with $\phi$ bits, or when some potential arrival happens before the request receives $\phi$ bits of proactive service. The term $\lim_{t\to\infty}\frac{t}{M(t)}$ is the limiting average of $T_n$; Equation (A.69) is from Proposition 4. Dividing (A.68) by $t$ and taking the limit as $t\to\infty$, we get:
\[
\lim_{t\to\infty}\frac{M(t)}{t} \le \lim_{t\to\infty}\frac{P(t)}{t} + \frac{\mu}{\phi} = \lambda + \frac{\mu}{\phi}, \quad\text{w.p.1} \tag{A.70}
\]
Combining this with (A.69), we have:
\[
\lambda + \frac{\mu}{\phi} \ \ge\ \lim_{t\to\infty}\frac{M(t)}{t} \ \ge\ \frac{1}{E_V} \tag{A.71}
\]

On the other hand, recall that if $J(t) > P(t)$, we have (A.35). Based on Definition 3, $X_{n+1} - X_n \le 1$, $\forall n = 0,1,\ldots$. Then we have, $\forall t\in\left(\tau_{M(t)}, \tau_{M(t)+1}\right)$:
\[
J(t) - P(t) \le \max\left\{X_{M(t)}, X_{M(t)+1}\right\} \le X_{M(t)} + 1 \tag{A.72}
\]
\[
J(t) - P(t) \ge \min\left\{X_{M(t)}, X_{M(t)+1}\right\} \ge X_{M(t)+1} - 1 \tag{A.73}
\]
Therefore we have, $\forall t\in\left(\tau_{M(t)}, \tau_{M(t)+1}\right)$:
\[
\sum_{i=P(t)+1}^{\infty} U_i(t) \le \max\left\{\phi\left(J(t)-P(t)\right),\ 0\right\} \tag{A.74}
\]
\[
\le \phi\left(X_{M(t)}+1\right) \tag{A.75}
\]
\[
\sum_{i=P(t)+1}^{\infty} U_i(t) \ge \max\left\{\phi\left(J(t)-P(t)-1\right),\ 0\right\} \tag{A.76}
\]
\[
\ge \phi\left(X_{M(t)+1}-2\right) \tag{A.77}
\]


(A.74) is obtained by counting the amount of proactive service done for request $J(t)$ as $\phi$; (A.75) is from (A.72). (A.76) is obtained by counting the amount of proactive service done for request $J(t)$ as $0$, and (A.77) is from (A.73). Therefore, for all $t$, we have:
\[
\frac{\sum_{i=P(t)+1}^{\infty} U_i(t)}{t} \le \frac{\phi\left(X_{M(t)}+1\right)}{t} \tag{A.78}
\]
\[
\frac{\sum_{i=P(t)+1}^{\infty} U_i(t)}{t} \ge \frac{\phi\left(X_{M(t)+1}-2\right)}{t} \tag{A.79}
\]
If we take the limit as $t\to\infty$, we have
\[
\lim_{t\to\infty}\frac{\sum_{i=P(t)+1}^{\infty} U_i(t)}{t}
\le \lim_{t\to\infty}\frac{\phi\left(X_{M(t)}+1\right)}{t}
= \lim_{t\to\infty}\frac{\phi\left(X_{M(t)}+1\right)}{M(t)}\cdot\frac{M(t)}{t}
\le \lim_{n\to\infty}\frac{X_n}{n}\,\phi\left(\lambda+\frac{\mu}{\phi}\right) \tag{A.80}
\]
\[
\lim_{t\to\infty}\frac{\sum_{i=P(t)+1}^{\infty} U_i(t)}{t}
\ge \lim_{t\to\infty}\frac{\phi\left(X_{M(t)+1}-2\right)}{t}
= \lim_{t\to\infty}\frac{\phi\left(X_{M(t)+1}-2\right)}{M(t)}\cdot\frac{M(t)}{t}
\ge \lim_{n\to\infty}\frac{X_n}{n}\cdot\frac{\phi}{E_V} \tag{A.81}
\]
So if we know $\lim_{n\to\infty}\frac{X_n}{n} = 0$, w.p.1, then $\lim_{t\to\infty}\frac{\sum_{i=P(t)+1}^{\infty} U_i(t)}{t} = 0$, w.p.1, from (A.80). And if $\lim_{t\to\infty}\frac{\sum_{i=P(t)+1}^{\infty} U_i(t)}{t} = 0$, w.p.1, then $\lim_{n\to\infty}\frac{X_n}{n} = 0$, w.p.1, from (A.81). So, by Definition 1, the threshold-based strategy $\Psi_P^\phi$ satisfies Property 1 if and only if the corresponding Markov chain satisfies $\lim_{n\to\infty}\frac{X_n}{n} = 0$, w.p.1.
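For concreteness, the transition-rate sandwich (A.71) can be evaluated numerically with the earlier illustrative parameters and the explicit $E_V$ from Appendix A.6 (all values are assumptions for illustration):

import math

lam, p, s, mu, phi = 1.0, 0.8, 1.0, 0.9, 0.5
rho = lam * p * s / mu
E_V = math.exp(lam * s / mu) * (1 / (lam * p)) * (1 + rho / (1 - rho))
upper = lam + mu / phi                 # upper bound from (A.70)-(A.71)
lower = 1 / E_V                        # lower bound from (A.71)
print(lower, upper)                    # the long-run transition rate M(t)/t lies in between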


A.8 Proof of Theorem 3

Case 1: If $\phi < U^*$, we know that the chain is transient from Theorem 2. Therefore $\exists N > 0$ such that
\[
X_n > 1, \quad\forall n > N
\]
with probability 1. From the $N$th transition onward, we look at the drifts, i.e., $\Delta_m \triangleq X_{m+1} - X_m$, $\forall m\in\mathbb{Z}^+$. Then for all $n > N$:
\[
X_n = X_N + \sum_{i=N}^{n-1}\left(X_{i+1}-X_i\right) = X_N + \sum_{i=N}^{n-1}\Delta_i
= X_N + \sum_{k=0}^{\infty}\ \sum_{i\in\mathbb{Z}^+:\,\Delta_i=1-k,\,N\le i<n}\Delta_i \tag{A.82}
\]
\[
= X_N + \sum_{k=0}^{\infty}(1-k)\,\left|\left\{i\in\mathbb{Z}^+:\Delta_i=1-k,\ N\le i<n\right\}\right| \tag{A.83}
\]
and
\[
\lim_{n\to\infty}\frac{\left|\left\{i\in\mathbb{Z}^+:\Delta_i=1-k,\ N\le i<n\right\}\right|}{n-N} = p_k^\phi, \quad\text{w.p.1} \tag{A.84}
\]

based on the Strong Law of Large Numbers. If we divide both sides by $n$ and take the limit as $n\to\infty$, we have:
\[
\lim_{n\to\infty}\frac{X_n}{n}
= \lim_{n\to\infty}\left(\frac{X_N}{n} + \sum_{k=0}^{\infty}(1-k)\,\frac{\left|\left\{i\in\mathbb{Z}^+:\Delta_i=1-k,\,N\le i<n\right\}\right|}{n-N}\cdot\frac{n-N}{n}\right)
\]
\[
= \sum_{k=0}^{\infty}(1-k)\lim_{n\to\infty}\frac{\left|\left\{i\in\mathbb{Z}^+:\Delta_i=1-k,\,N\le i<n\right\}\right|}{n-N}\cdot\frac{n-N}{n}
= \sum_{k=0}^{\infty}(1-k)\,p_k^\phi > 0 \tag{A.85}
\]
based on Proposition 3. So when $\phi < U^*$, the threshold-based strategy does not satisfy Property 1, based on Lemma 1.

Case 2: If $\phi \ge U^*$, we consider a virtual strategy. In this strategy, the server can do proactive work at the rate of
\[
\mu_\epsilon \triangleq \frac{\frac{\lambda}{\frac{\mu-\lambda p s}{\phi}+\lambda p}}{1-\epsilon}\,\mu
\ \ge\ \frac{\frac{\lambda}{\frac{\mu-\lambda p s}{U^*}+\lambda p}}{1-\epsilon}\,\mu
\ =\ \frac{\mu}{1-\epsilon} \ >\ \mu, \quad\forall\epsilon\in(0,1) \tag{A.86}
\]
In this case, define $J_\epsilon(t)$ as the request that the system would proactively work on at time $t$ under the virtual strategy. Because the system is always working at a strictly higher rate $\mu_\epsilon > \mu$ under the virtual strategy, we have $J_\epsilon(t) \ge J(t)$, $\forall t$, if the two systems are under the same sample path. Then, by Definition 3, we have:
\[
X_n^\epsilon \ge X_n, \quad\forall n\in\mathbb{Z}^+ \tag{A.87}
\]
where $X_n^\epsilon$ is defined as the state under the virtual strategy.

Following the same steps as in Proposition 3, we can derive the new set of transition probabilities $\left\{p_k^{\phi,\epsilon}\right\}$ under the virtual strategy and prove that the corresponding Markov chain is transient. Specifically:
\[
\lim_{n\to\infty}\frac{X_n^\epsilon}{n} = \sum_{k=0}^{\infty} p_k^{\phi,\epsilon}\,(1-k) = \epsilon, \quad\forall\epsilon\in(0,1) \tag{A.88}
\]
So $\forall\epsilon\in(0,1)$ we have
\[
\lim_{n\to\infty}\frac{X_n}{n} \le \lim_{n\to\infty}\frac{X_n^\epsilon}{n} = \epsilon \tag{A.89}
\]
And if we take $\epsilon\to 0$:
\[
\lim_{n\to\infty}\frac{X_n}{n} \le \lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{X_n^\epsilon}{n} = 0 \tag{A.90}
\]
Therefore, based on Lemma 1, the threshold-based strategy satisfies Property 1 when $\phi \ge U^*$. Then, by summarizing Cases 1 and 2, the threshold-based strategy $\Psi_P^\phi$ satisfies Property 1 if and only if $\phi \ge U^*$.

A.9 Proof of Theorem 4

In order to prove Theorem 4, we first consider the following Lemma 3. The idea of Lemma 3 is to look at the proactive work done within one transition. Under strategy $\Psi_P^\phi$, denote the total amount of proactive work done in $(\tau_n, \tau_{n+1})$ for all potential requests as $\zeta_n$, and denote the amount of proactive work done in $(\tau_n, \tau_{n+1})$ for actual requests as $\zeta_n^A$. We investigate the expectations of $\zeta_n$ and $\zeta_n^A$ conditioned on $X_n$ and $X_{n+1}$ in Lemma 3.

Lemma 3.
\[
E\!\left[\zeta_n^A \mid X_n = k, X_{n+1} = l\right] = E\!\left[\zeta_n \mid X_n = k, X_{n+1} = l\right]p, \quad\text{if } l > 1 \tag{A.91}
\]
\[
E\!\left[\zeta_n^A \mid X_n = k, X_{n+1} = l\right] \le E\!\left[\zeta_n \mid X_n = k, X_{n+1} = l\right]p, \quad\text{if } l = 1 \tag{A.92}
\]
\[
E\!\left[\zeta_n^A \mid X_n = k, X_{n+1} = l\right] < E\!\left[\zeta_n \mid X_n = k, X_{n+1} = l\right]p, \quad\text{if } k = 1,\ l = 1 \tag{A.93}
\]


Proof. Proof of (A.91): Given the starting state $X_n = k$ at $\tau_n$, we focus on the request $P(\tau_n)+k$, which is the request that starts to receive proactive service from $\tau_n$.

If $X_{n+1} > 1$, it means that the server is able to proactively serve request $P(\tau_n)+k$ before it arrives, i.e., $t_{P(\tau_n)+k} > \tau_{n+1}$. So we have the following:
\[
\left(\zeta_n \mid X_n = k, X_{n+1} = l\right) = \phi \tag{A.94}
\]
\[
\left(\zeta_n^A \mid X_n = k, X_{n+1} = l\right) =
\begin{cases}
\phi, & \text{if } R_{P(\tau_n)+k} = 1 \\
0, & \text{if } R_{P(\tau_n)+k} = 0
\end{cases}, \quad\forall k\ge 1,\ \forall l > 1 \tag{A.95}
\]
Because $\Pr\!\left\{R_{P(\tau_n)+k} = 1\right\} = p$ independently, we have:
\[
E\!\left[\zeta_n^A \mid X_n = k, X_{n+1} = l\right] = E\!\left[\zeta_n \mid X_n = k, X_{n+1} = l\right]p = \phi p, \quad\forall k\ge 1,\ \forall l > 1 \tag{A.96}
\]
So (A.91) is proved.

Proof of (A.92): If $X_{n+1} = 1$, which means that the request $P(\tau_n)+k$ arrives before it receives $\phi$ bits of proactive service, i.e., $t_{P(\tau_n)+k} < \tau_{n+1}$, we know that

1. All the proactive work done in $\left(\tau_n, t_{P(\tau_n)+k}\right)$ is for request $P(\tau_n)+k$;
2. All the proactive work done in $\left(t_{P(\tau_n)+k}, \tau_{n+1}\right)$ is for requests that are not realized.

Both of these facts follow from the definition of the threshold-based strategy. Because the server keeps proactively serving request $P(\tau_n)+k$ until it receives $\phi$ bits proactively or until it arrives, statement 1 is true. For statement 2, if any request that has not arrived started to be proactively served, a transition would happen at the moment it starts receiving proactive service, by Definition 3. Therefore, before the transition happens, i.e., before $\tau_{n+1}$, there is no proactive service for future potential arrivals.


Take what happens in $(\tau_3, \tau_4)$ in Figure 2.4 as an example. The server starts proactively serving request 4 at $\tau_3$. In $(\tau_3, t_4)$, all the proactive work is done for request 4. All the proactive work in $(t_4, \tau_4)$ is done for requests which are not realized.

Based on the discussion above, we have the following analysis. Consider the system starting at $\tau_n$ from state $X_n = k\in\mathbb{Z}^+$. Define a tuple of random vectors $\Theta_{n,k} \triangleq (\xi_{n,k}, \nu_{n,k})$, where $\xi_{n,k} \triangleq \left(t_{P(\tau_n)+1}, t_{P(\tau_n)+2}, \ldots, t_{P(\tau_n)+k}\right)$ denotes a random vector of the next $k$ arrival epochs $t_i$ after $\tau_n$, and $\nu_{n,k} \triangleq \left(R_{P(\tau_n)+1}, R_{P(\tau_n)+2}, \ldots, R_{P(\tau_n)+k-1}\right)$ denotes a random vector of the next $k-1$ $R_i$'s after $\tau_n$. A realization $\Theta_{n,k} = \theta_{n,k}$ determines a set of sample paths after $\tau_n$, in which the first $k$ arrival epochs and the realizations of the first $k-1$ arrivals are fixed. Given $X_n = k$ and $\Theta_{n,k} = \theta_{n,k}$, what happens in the system during $\left(\tau_n, t_{P(\tau_n)+k}\right)$ is deterministic. We also know whether $\tau_{n+1} > t_{P(\tau_n)+k}$ or not, which determines whether $X_{n+1} = 1$ or $X_{n+1} > 1$, as discussed in the proof of Proposition 2.

Define $Q_{n,k,1}$ as:
\[
Q_{n,k,1} \triangleq \left\{\theta_{n,k} : X_{n+1} = 1,\ X_n = k\right\} \tag{A.97}
\]
which represents the set of sample paths under which the system transits from state $k$ to state $1$ starting from $\tau_n$.

Then, $\forall\theta_{n,k}\in Q_{n,k,1}$, $\forall k\in\mathbb{Z}^+$, $\forall n = 0,1,\ldots$, we have:
\[
\left(\zeta_n \mid \Theta_{n,k} = \theta_{n,k}, X_n = k\right) \ge \left(U_{P(\tau_n)+k}\!\left(t_{P(\tau_n)+k}\right) \mid \Theta_{n,k} = \theta_{n,k}, X_n = k\right) \tag{A.98}
\]
\[
\left(\zeta_n^A \mid \Theta_{n,k} = \theta_{n,k}, X_n = k\right) =
\begin{cases}
\left(U_{P(\tau_n)+k}\!\left(t_{P(\tau_n)+k}\right) \mid \Theta_{n,k} = \theta_{n,k}, X_n = k\right), & \text{if } R_{P(\tau_n)+k} = 1 \\
0, & \text{if } R_{P(\tau_n)+k} = 0
\end{cases} \tag{A.99}
\]


Therefore, $\forall\theta_{n,k}\in Q_{n,k,1}$, $\forall k\in\mathbb{Z}^+$, $\forall n = 0,1,\ldots$:
\[
E_R\!\left[\zeta_n^A \mid \Theta_{n,k}=\theta_{n,k}, X_n=k\right]
= \left(U_{P(\tau_n)+k}\!\left(t_{P(\tau_n)+k}\right) \mid \Theta_{n,k}=\theta_{n,k}, X_n=k\right)p
\le E_R\!\left[\zeta_n \mid \Theta_{n,k}=\theta_{n,k}, X_n=k\right]p \tag{A.100}
\]
where $E_R[\cdot]$ denotes expectation with respect to $R_{P(\tau_n)+k}$. And by the definition of $Q_{n,k,1}$, we have:
\[
E\!\left[\zeta_n^A \mid X_n=k, X_{n+1}=1\right] = E\!\left[\zeta_n^A \mid X_n=k, \Theta_{n,k}\in Q_{n,k,1}\right]
\]
\[
= \int_{Q_{n,k,1}} \Pr\{\Theta_{n,k}=\theta_{n,k} \mid \Theta_{n,k}\in Q_{n,k,1}\}\,E_R\!\left[\zeta_n^A \mid \Theta_{n,k}=\theta_{n,k}, X_n=k\right]d\theta_{n,k}
\]
\[
\le \int_{Q_{n,k,1}} \Pr\{\Theta_{n,k}=\theta_{n,k} \mid \Theta_{n,k}\in Q_{n,k,1}\}\,E_R\!\left[\zeta_n \mid \Theta_{n,k}=\theta_{n,k}, X_n=k\right]p\,d\theta_{n,k}
\]
\[
= E\!\left[\zeta_n \mid X_n=k, X_{n+1}=1\right]p \tag{A.101}
\]
where (A.101) is obtained by replacing the corresponding terms according to (A.100). So we have:
\[
E\!\left[\zeta_n^A \mid X_n=k, X_{n+1}=1\right] \le E\!\left[\zeta_n \mid X_n=k, X_{n+1}=1\right]p, \quad\forall k\in\mathbb{Z}^+ \tag{A.102}
\]
and (A.92) is proved.

Proof of (A.93): The system starts from state $X_n = 1$. This means that at time $\tau_n$, no proactive work has been done for any of the potential requests which have not yet arrived. Similar to the method used to prove (A.92), we focus on the set $Q_{n,1,1}$ in this case. Recall that $\theta_{n,1}\in Q_{n,1,1}$ if and only if $X_{n+1} = 1$ given $X_n = 1$ and $\Theta_{n,1} = \theta_{n,1}$. Notice that $\theta_{n,1} = (\xi_{n,1}, \nu_{n,1})$, where $\xi_{n,1} = \left(t_{P(\tau_n)+1}\right)$ and $\nu_{n,1}$ is an empty vector, which means the arrival epoch of request $P(\tau_n)+1$ determines whether $X_{n+1} = 1$ or not. To be specific, $X_{n+1} = 1$ if and only if $t_{P(\tau_n)+1} < \frac{\phi}{\mu} + \tau_n$. So:
\[
Q_{n,1,1} = \left\{\theta_{n,1} : X_{n+1} = 1,\ X_n = 1\right\}
= \left\{\theta_{n,1} : t_{P(\tau_n)+1} < \frac{\phi}{\mu} + \tau_n\right\} \tag{A.103}
\]
We consider another set of sample paths $\bar{Q}_{n,1,1}$, defined as:
\[
\bar{Q}_{n,1,1} \triangleq \left\{\theta_{n,1} : t_{P(\tau_n)+1} < \tau_n + \frac{\phi}{2\mu},\ \ t_{P(\tau_n)+2} - t_{P(\tau_n)+1} \ge \frac{\phi}{2\mu}\right\} \tag{A.104}
\]
By comparing (A.104) and (A.103), it is true that $\bar{Q}_{n,1,1} \subsetneq Q_{n,1,1}$.

We discuss the value of $\zeta_n$ under the condition $\Theta_{n,1} = \theta_{n,1}\in\bar{Q}_{n,1,1}$. If $R_{P(\tau_n)+1} = 1$, the system proactively works on request $P(\tau_n)+1$ until $t_{P(\tau_n)+1}$; consequently, request $P(\tau_n)+1$ receives fewer than $\frac{\phi}{2}$ bits of proactive service, by the definition of $\bar{Q}_{n,1,1}$. If $R_{P(\tau_n)+1} = 0$, the system proactively works on request $P(\tau_n)+1$ until it receives $\phi$ bits of proactive service or until $t_{P(\tau_n)+2}$. Therefore, $\forall\theta_{n,1}\in\bar{Q}_{n,1,1}$, $\forall n = 0,1,\ldots$:
\[
\left(\zeta_n \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right) \ge
\begin{cases}
\left(U_{P(\tau_n)+1}\!\left(t_{P(\tau_n)+1}\right) \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right), & \text{if } R_{P(\tau_n)+1} = 1 \\
\left(U_{P(\tau_n)+1}\!\left(t_{P(\tau_n)+1}\right) \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right) + \frac{\phi}{2}, & \text{if } R_{P(\tau_n)+1} = 0
\end{cases} \tag{A.105}
\]
\[
\left(\zeta_n^A \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right) =
\begin{cases}
\left(U_{P(\tau_n)+1}\!\left(t_{P(\tau_n)+1}\right) \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right), & \text{if } R_{P(\tau_n)+1} = 1 \\
0, & \text{if } R_{P(\tau_n)+1} = 0
\end{cases} \tag{A.106}
\]


And $\forall\theta_{n,1}\in\bar{Q}_{n,1,1}$, $\forall n = 0,1,\ldots$:
\[
E_R\!\left[\zeta_n \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right] \ge \left(U_{P(\tau_n)+1}\!\left(t_{P(\tau_n)+1}\right) \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right) + \frac{\phi(1-p)}{2} \tag{A.107}
\]
\[
E_R\!\left[\zeta_n^A \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right] = \left(U_{P(\tau_n)+1}\!\left(t_{P(\tau_n)+1}\right) \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right)p \tag{A.108}
\]
So
\[
E_R\!\left[\zeta_n^A \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right] \le E_R\!\left[\zeta_n \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right]p - \frac{\phi p(1-p)}{2},
\quad\forall\theta_{n,1}\in\bar{Q}_{n,1,1},\ \forall n = 0,1,\ldots \tag{A.109}
\]
Then we have:
\[
E\!\left[\zeta_n^A \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]
= \int_{\bar{Q}_{n,1,1}} \Pr\!\left\{\Theta_{n,1}=\theta_{n,1} \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}\right\} E_R\!\left[\zeta_n^A \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right]d\theta_{n,1}
\]
\[
\le \int_{\bar{Q}_{n,1,1}} \Pr\!\left\{\Theta_{n,1}=\theta_{n,1} \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}\right\}\left(E_R\!\left[\zeta_n \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right]p - \frac{\phi p(1-p)}{2}\right)d\theta_{n,1} \tag{A.110}
\]
\[
= E\!\left[\zeta_n \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]p - \frac{\phi p(1-p)}{2} \tag{A.111}
\]
So, for the set $\bar{Q}_{n,1,1}$:
\[
E\!\left[\zeta_n^A \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right] \le E\!\left[\zeta_n \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]p - \frac{\phi p(1-p)}{2} \tag{A.112}
\]


By the Law of Total Expectation applied over the set $Q_{n,1,1}$, we know that:
\[
E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}, X_n=1\right]
= \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]
\]
\[
+ \Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right] \tag{A.113}
\]
\[
E\!\left[\zeta_n^A \mid \Theta_{n,1}\in Q_{n,1,1}, X_n=1\right]
= \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n^A \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]
\]
\[
+ \Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n^A \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right] \tag{A.114}
\]
where $Q_{n,1,1}\setminus\bar{Q}_{n,1,1}$ is the set difference of $Q_{n,1,1}$ and $\bar{Q}_{n,1,1}$. The conditional probability $\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\}$ can be calculated as follows:
\[
\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\}
= \frac{\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1},\ \Theta_{n,1}\in Q_{n,1,1}\right\}}{\Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\right\}} \tag{A.115}
\]
\[
= \frac{\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1}\right\}}{\Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\right\}} \tag{A.116}
\]
where the probabilities $\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1}\right\}$ and $\Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\right\}$ can be derived as follows:


\[
\Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\right\}
= \Pr\!\left\{t_{P(\tau_n)+1} - \tau_n < \frac{\phi}{\mu}\right\} \tag{A.117}
\]
\[
= 1 - e^{-\lambda\frac{\phi}{\mu}} \tag{A.118}
\]
\[
\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1}\right\}
= \Pr\!\left\{t_{P(\tau_n)+1} - \tau_n < \frac{\phi}{2\mu}\ \ \&\ \ t_{P(\tau_n)+2} - t_{P(\tau_n)+1} > \frac{\phi}{2\mu}\right\} \tag{A.119}
\]
\[
= \Pr\!\left\{t_{P(\tau_n)+1} - \tau_n < \frac{\phi}{2\mu}\right\}\cdot\Pr\!\left\{t_{P(\tau_n)+2} - t_{P(\tau_n)+1} > \frac{\phi}{2\mu}\right\} \tag{A.120}
\]
\[
= \left(1 - e^{-\lambda\frac{\phi}{2\mu}}\right)e^{-\lambda\frac{\phi}{2\mu}} = e^{-\lambda\frac{\phi}{2\mu}} - e^{-\lambda\frac{\phi}{\mu}} \tag{A.121}
\]
So one can see that $\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} > 0$ and $\Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} > 0$. Equation (A.100) applies to all $\theta_{n,1}\in Q_{n,1,1}\supsetneq\bar{Q}_{n,1,1}$, so we have, $\forall\theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}$, $\forall n = 0,1,\ldots$:
\[
E_R\!\left[\zeta_n^A \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right]
= \left(U_{P(\tau_n)+1}\!\left(t_{P(\tau_n)+1}\right) \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right)p
\le E_R\!\left[\zeta_n \mid \Theta_{n,1}=\theta_{n,1}, X_n=1\right]p \tag{A.122}
\]
and consequently:
\[
E\!\left[\zeta_n^A \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right] \le E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right]p \tag{A.123}
\]
By combining this equation with (A.112), we are able to compare Equations (A.113) and (A.114). We have:

\[
E\!\left[\zeta_n^A \mid \Theta_{n,1}\in Q_{n,1,1}, X_n=1\right]
= \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n^A \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]
\]
\[
+ \Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n^A \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right] \tag{A.124}
\]
\[
\le \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\}\left(E\!\left[\zeta_n \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]p - \frac{\phi p(1-p)}{2}\right)
\]
\[
+ \Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right]p \tag{A.125}
\]
\[
= \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]p
\]
\[
+ \Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right]p
- \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\}\frac{\phi p(1-p)}{2}
\]
\[
< \Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in\bar{Q}_{n,1,1}, X_n=1\right]p
\]
\[
+ \Pr\!\left\{\Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\} E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}\setminus\bar{Q}_{n,1,1}, X_n=1\right]p \tag{A.126}
\]
\[
= E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}, X_n=1\right]p \tag{A.127}
\]
Equation (A.124) is from (A.114). Equation (A.125) is from (A.112) and (A.123). Equation (A.126) is obtained by removing the term $-\Pr\!\left\{\Theta_{n,1}\in\bar{Q}_{n,1,1} \mid \Theta_{n,1}\in Q_{n,1,1}\right\}\frac{\phi p(1-p)}{2}$, which is strictly negative. Then (A.127) is from (A.113). So we finally have:
\[
E\!\left[\zeta_n^A \mid \Theta_{n,1}\in Q_{n,1,1}, X_n=1\right] < E\!\left[\zeta_n \mid \Theta_{n,1}\in Q_{n,1,1}, X_n=1\right]p \tag{A.128}
\]
from which (A.93) directly follows.


We have now proved (A.91), (A.92) and (A.93), so Lemma 3 is proved.

Lemma 3 can be interpreted as follows. In $(\tau_n, \tau_{n+1})$, if all $\zeta_n$ bits of proactive service can potentially be realized, we should have $E\!\left[\zeta_n^A\right] = E[\zeta_n]\,p$, based on our assumptions on the request processes. This is the case when a transition to $X_{n+1} > 1$ happens, where every bit of $\zeta_n$ is done before the corresponding request arrives. However, if a transition to $X_{n+1} = 1$ happens, the amount of proactive work done in $(\tau_n, \tau_{n+1})$ that can potentially be realized is no more than $\zeta_n$, leading to the inequality $E\!\left[\zeta_n^A\right] \le E[\zeta_n]\,p$ in this scenario. An example is shown in $(\tau_3, \tau_4)$ of Figure 2.4: the proactive work done in $(\tau_3', \tau_4)$ is for request 5, which has arrived but is not realized. This part of the proactive work will never be realized, so it is not included in $\zeta_n^A$, which causes the inequality. Specifically, if a transition with $X_n = 1$, $X_{n+1} = 1$ happens, we proved that strict inequality is achieved.

Intuitively, if transitions to state 1 happen comparably often relative to all transitions, it is most likely that $\overline{U} > \overline{U}_A$, based on Lemma 3. We now proceed to prove Theorem 4.

Proof of Theorem 4: Define $N(t) \triangleq \max\{n \mid J(\tau_n^+) \le I(t)\}$ as the index of the transition in which the latest actual request received proactive service. From Proposition 4, we know that the expected time before the next transition is finite. So as $t\to\infty$, we have $N(t)\to\infty$ as well, and $\lim_{t\to\infty}\frac{t}{N(t)} = E[T_n]$, w.p.1, where $E[T_n]$ is a finite constant given the system parameters. Recall that we defined $\zeta_n$ as the amount of proactive work done in $(\tau_n, \tau_{n+1})$, and $\zeta_n^A$ as the amount of proactive work done for actual requests in $(\tau_n, \tau_{n+1})$.

Consider the term $\sum_{i=1}^{I(t)} U_i$. If we rewrite this term from the point of view of transitions, we have:
\[
\sum_{i=1}^{I(t)} U_i = \sum_{n=0}^{N(t)} \zeta_n - o(t) \tag{A.129}
\]


where the term $o(t)$ represents the amount of proactive work done in $\left(\tau_{N(t)}, \tau_{N(t)+1}\right)$ for requests which arrive later than $I(t)$. We know $o(t) < \zeta_{N(t)}$ by definition, and we know that $\frac{E[\zeta_n]}{\mu} \le E[T_n] < \infty$, w.p.1, so we have:
\[
\lim_{t\to\infty}\frac{o(t)}{t} = 0, \quad\text{w.p.1} \tag{A.130}
\]
Then we have:
\[
\overline{U} = \lim_{t\to\infty}\frac{\sum_{i=1}^{I(t)} U_i}{I(t)} \tag{A.131}
\]
\[
= \lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\zeta_n - o(t)}{I(t)} \tag{A.132}
\]
\[
= \lim_{t\to\infty}\left(\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)\,\zeta_n}{N(t)} + \frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n}{N(t)}\right)\frac{N(t)}{I(t)} \tag{A.133}
\]
\[
= \lim_{t\to\infty}\left(\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_n=1, X_{n+1}=1)\,\zeta_n}{N(t)} + \frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_n>1, X_{n+1}=1)\,\zeta_n}{N(t)} + \frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n}{N(t)}\right)\frac{N(t)}{I(t)} \tag{A.134}
\]
where (A.133) and (A.134) are obtained by grouping the terms by transitions. The term $\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)}{N(t)}$ represents the limiting fraction of transitions into state 1, and the other terms of similar form can be interpreted correspondingly.

Similarly, we have:
\[
\overline{U}_A = \lim_{t\to\infty}\frac{\sum_{i\in\mathbb{Z}^+:R_i=1,\,i\le I(t)} U_i}{A(t)}
= \lim_{t\to\infty}\left(\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)\,\zeta_n^A}{N(t)} + \frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n^A}{N(t)}\right)\frac{N(t)}{A(t)} \tag{A.135}
\]
\[
= \lim_{t\to\infty}\left(\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_n=1, X_{n+1}=1)\,\zeta_n^A}{N(t)} + \frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_n>1, X_{n+1}=1)\,\zeta_n^A}{N(t)} + \frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n^A}{N(t)}\right)\frac{N(t)}{A(t)} \tag{A.136}
\]


Case 1: If $\phi < U^*$, we know the Markov chain is transient. So we have
\[
\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_n=1)}{N(t)} = 0, \quad\text{w.p.1} \tag{A.137}
\]
\[
\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_n>1)}{N(t)} = 1, \quad\text{w.p.1} \tag{A.138}
\]
Therefore, based on Lemma 3 and the Strong Law of Large Numbers, we have:
\[
\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)\,\zeta_n}{N(t)}
= \lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)\,\zeta_n}{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)}\cdot\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)}{N(t)} \tag{A.139}
\]
\[
= 0, \quad\text{w.p.1} \tag{A.140}
\]
\[
\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)\,\zeta_n^A}{N(t)}
= \lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)\,\zeta_n^A}{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)}\cdot\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}=1)}{N(t)} \tag{A.141}
\]
\[
= 0, \quad\text{w.p.1} \tag{A.142}
\]
\[
\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n}{N(t)}
= \lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n}{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)}\cdot\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)}{N(t)}
= E[\zeta_n \mid X_{n+1}>1]\cdot 1 \tag{A.143}
\]
\[
= \phi, \quad\text{w.p.1} \tag{A.144}
\]
\[
\lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n^A}{N(t)}
= \lim_{t\to\infty}\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)\,\zeta_n^A}{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)}\cdot\frac{\sum_{n=0}^{N(t)}\mathbb{1}(X_{n+1}>1)}{N(t)}
= E\!\left[\zeta_n^A \mid X_{n+1}>1\right]\cdot 1 \tag{A.145}
\]
\[
= \phi p, \quad\text{w.p.1} \tag{A.146}
\]
And $\lim_{t\to\infty}\frac{A(t)}{N(t)}$ is the average number of actual arrivals between two consecutive transitions, which converges to $\lambda p\,E[T_n]$ by the Law of Large Numbers. So from (A.133) and (A.135) we have:

175 APPENDIX A. APPENDIX OF CHAPTER 2 have:

! PN(t) 1 (X = 1) ζ PN(t) 1 (X > 1) ζ N (t) U = lim n=0 n+1 n + n=0 n+1 n t→∞ N (t) N (t) I (t) φ = , w.p.1 (A.147) λE [Tn] PN(t) A PN(t) A ! n=0 1 (Xn+1 = 1) ζn n=0 1 (Xn+1 > 1) ζn N (t) UA = lim + (A.148) t→∞ N (t) N (t) A (t) 1 φ = φp = , w.p.1 (A.149) pλE [Tn] λE [Tn]

∗ Therefore we have U = UA, w.p.1 when φ < U .

Case 2: If φ = U ∗, the Markov chain is null recurrent. So we have

PN(t) 1 (X = 1) lim n=0 n+1 = 0, w.p.1 (A.150) t→∞ N (t) PN(t) 1 (X > 1) lim n=0 n+1 = 1, w.p.1 (A.151) t→∞ N (t)

Then the deductions are similar Case 1, so we directly show the conclusions:

φ U = , w.p.1 (A.152) λE [Tn] φ UA = , w.p.1 (A.153) λE [Tn]

∗ Therefore we have U = UA, w.p.1 when φ = U .

Case 3: If φ > U ∗, the Markov chain is positive recurrent. So the Markov chain has a limit- ing distribution, or steady state probability π , k = 1, 2,... , where π P r lim →∞ X = k , k { k } k , { n n } ∀ ∈

176 APPENDIX A. APPENDIX OF CHAPTER 2

Z+. Then we have:

PN(t) 1 (X = 1,X = 1) ζ lim n=0 n n+1 n t→∞ N (t) ! PN(t) 1 (X = 1,X = 1) ζ PN(t) 1 (X = 1,X = 1) PN(t) 1 (X = 1) = lim n=0 n n+1 n n=0 n n+1 n=0 n t→∞ PN(t) PN(t) N (t) n=0 1 (Xn = 1,Xn+1 = 1) n=0 1 (Xn = 1) (A.154)

= E [ζ X = 1,X = 1] P r X = 1 X = 1 π , w.p.1 (A.155) n| n n+1 { n+1 | n } 1 ∞ X = π pφE [ζ X = 1,X = 1] , w.p.1 (A.156) 1 k n| n n+1 k=1

PN(t) n=0 1(Xn=1,Xn+1=1)ζn In (A.154), PN(t) is the average of ζn between two consecutive transitions where n=0 1(Xn=1,Xn+1=1) PN(t) n=0 1(Xn=1,Xn+1=1) Xn = 1,Xn+1 = 1, which converges to E [ζn Xn = 1,Xn+1 = 1]. PN(t) is the | n=0 1(Xn=1) fraction of next transition where Xn+1 = 1 given Xn = 1, which converges to transition probability

PN(t) 1 P r X = 1 X = 1 . n=0 (Xn=1) is the fraction of state 1, which converges to π in positive { n+1 | n } N(t) 1 recurrent case. Therefore we have (A.155) based on the Strong Law of Large Numbers. Following similar arguments, we have the following results:

PN(t) ∞ ! n=0 1 (Xn = k, Xn+1 = 1) ζn X φ lim = πk p E [ζn Xn = k, Xn+1 = 1] , w.p.1, k > 1 t→∞ N (t) i | ∀ i=k (A.157) PN(t) n=0 1 (Xn+1 > 1) ζn lim = (1 π1) φ, w.p.1 (A.158) t→∞ N (t) − PN(t) A ∞ n=0 1 (Xn = 1,Xn+1 = 1) ζn X φ  A  lim = π1 p E ζn Xn = 1,Xn+1 = 1 , w.p.1 (A.159) t→∞ N (t) i | i=1 PN(t) A ∞ ! n=0 1 (Xn = k, Xn+1 = 1) ζn X φ  A  lim = πk p E ζn Xn = k, Xn+1 = 1 , w.p.1, k > 1 t→∞ N (t) i | ∀ i=k (A.160)

PN(t) A n=0 1 (Xn+1 > 1) ζn lim = (1 π1) φp, w.p.1 (A.161) t→∞ N (t) −

177 APPENDIX A. APPENDIX OF CHAPTER 2 based on Lemma 3 and the Strong Law of Large Numbers. So we have:

PN(t) 1 (X = 1,X = 1) ζ PN(t) 1 (X = 1,X = 1) ζ U = lim n=0 n n+1 n + n=0 n n+1 n t→∞ N (t) N (t) ! PN(t) 1 (X > 1) ζ N (t) + n=0 n+1 n (A.162) N (t) I (t) P∞ φ P∞ P∞ π1 p E [ζn Xn = 1,Xn+1 = 1] π ( p ) E [ζ X = k, X = 1] = k=1 k | + k=2 k i=k i n| n n+1 λE [Tn] λE [Tn] (1 π ) φ + − 1 λE [Tn] PN(t) A PN(t) A n=0 1 (Xn = 1,Xn+1 = 1) ζn n=0 1 (Xn = 1,Xn+1 = 1) ζn UA = lim + t→∞ N (t) N (t) ! PN(t) 1 (X > 1) ζA N (t) + n=0 n+1 n (A.163) N (t) A (t) π P∞ pφE ζA X = 1,X = 1 P∞ π (P∞ p ) E ζA X = k, X = 1 = 1 k=1 k n | n n+1 + k=2 k i=k i n | n n+1 pλE [Tn] pλE [Tn] (A.164)

(1 π ) φp + − 1 pλE [Tn] P∞ φ P∞ P∞ π1 p E [ζn Xn = 1,Xn+1 = 1] π ( p ) E [ζ X = k, X = 1] < k=1 k | + k=2 k i=k i n| n n+1 λE [Tn] λE [Tn] (1 π ) φ + − 1 (A.165) λE [Tn]

= U (A.166)

the strict inequality in (A.165) is from (A.93) of Lemma 3. Therefore we have UA < U, w.p.1 if

φ > U ∗.

φ By summarizing Cases 1, 2 and 3, the threshold-based strategy ΨP satisfies Property 2 if and only if φ U ∗. ≤

178 APPENDIX A. APPENDIX OF CHAPTER 2

A.10 Proof of Corollary 3

The UNIFORM strategy satisfies both Property 1 and Property 2 by Theorem 2, therefore

Corollary 1 can be applied. So we have :

U = U ∗, w.p.1 (A.167)

Then we select such sample paths where U = U ∗ is satisfied. For every sample path of this set, we assume that  > 0, δ > 0 such that: ∀ ∃ I(t) 1 X ∗ lim 1 (Ui < U ) = δ (A.168) t→∞ I (t) − i=1

Define sets H− (t) = i : U < U ∗ , i I (t) , i Z+  { i − ≤ ∈ } and H+ (t) = i : U U ∗ , i I (t) , i Z+ , then we have:  { i ≥ − ≤ ∈ }

PI(t) U U = lim i=1 i t→∞ I (t) X U X U = lim i + lim i (A.169) t→∞ ( ) t→∞ ( ) − I t + I t i∈H (t) i∈H (t) X H− (t) U X H+ (t) U = lim |  | i + lim |  | i (A.170) t→∞ ( ) − t→∞ ( ) + − I t H (t) + I t H (t) i∈H (t) | | i∈H (t) | | X H− (t) U ∗  X H+ (t) U ∗ lim |  | − + lim |  | (A.171) t→∞ ( ) − t→∞ ( ) + ≤ − I t H (t) + I t H (t) i∈H (t) | | i∈H (t) | |

= δ (U ∗ ) + (1 δ) U ∗ (A.172) − −

= U ∗ δ −

< U ∗ (A.173)

The reason of Equation (A.169) is by grouping all U into two sets according to if U < U ∗  or i i − not. By replacing all U by U ∗  in the group of H+ (t) where U < U ∗  and replacing all U i −  i − i

179 APPENDIX A. APPENDIX OF CHAPTER 2

∗ + ∗ by U in the group of H (t) where Ui < U , we get (A.171) from (A.170). We get (A.172) from

(A.171) based on our assumption in (A.168).

However, (A.173) contradicts the way we selected the sample path. Therefore, for this set of sample paths, we have  > 0: ∀ I(t) 1 X ∗ lim 1 (Ui < U ) = 0 (A.174) t→∞ I (t) − i=1

Therefore we have our conclusions for all the possible sample paths:

I(t) 1 X ∗ lim 1 (Ui = U ) = 1, w.p.1 (A.175) t→∞ I (t) i=1

A.11 Proof of Corollary 4

We use the Pollaczek-Khinchine formula in the analysis of M/G/1 queue in [81] to conduct this analysis. The average unfinished work in number of bits in the system by time t, which is defined as v (t), can be formulated as:    P 1 Si + + i∈Z :Ri=1,i≤I(t) SiWi 2 Si µ v (t) = (A.176) t

The corresponding terms are as shown in Figure 2.3. Si is the reactive work of actual request i, and Wi is the waiting time of the reactive part of request i when it starts to be transmitted. Define v = lim →∞ v (t) and take the limit of t of (A.176): t → ∞    P 1 Si + + i∈Z :Ri=1,i≤I(t) SiWi 2 Si µ v = lim (A.177) t→∞ t P P 2 ∈Z+ ≤ SiWi ∈Z+ ≤ S = lim i :Ri=1,i I(t) + lim i :Ri=1,i I(t) i (A.178) t→∞ t t→∞ 2tµ

180 APPENDIX A. APPENDIX OF CHAPTER 2

P Consider the term + in Equation (A.178), we have: i∈Z :Ri=1,i≤I(t) SiWi

X SiWi + i∈Z :Ri=1,i≤I(t) X X = SiWi + SiWi (A.179) + ∗ + ∗ i∈Z :Ri=1,Si>S ,i≤I(t) i∈Z :Ri=1,Si=S ,i≤I(t) + ∗ Define sets H ∗ (t) = i Z : R = 1,S = S , i I (t) S { ∈ i i ≤ } + + ∗ and H ∗ (t) = i Z : R = 1,S > S , i I (t) , then divide both sides with I (t) : S { ∈ i i ≤ } P + i∈Z :Ri=1,i≤I(t) SiWi I (t) P + S W P i∈H ∗ (t) i i ∈ ∗ SiWi = S + i HS (t) (A.180) I (t) I (t)

+ P + P ∈ SiWi ∗ SiWi HS∗ (t) i HS∗ (t) HS (t) i∈HS∗ (t) = | | + + | | (A.181) I (t) H ∗ (t) I (t) HS∗ (t) | S | | | Take limit of t on both sides and we can get: → ∞ P + SiWi lim i∈Z :Ri=1,i≤I(t) t→∞ I (t)

+ P + P ∈ SiWi ∗ SiWi HS∗ (t) i HS∗ (t) HS (t) i∈HS∗ (t) = lim | | + + lim | | (A.182) t→∞ I (t) H ∗ (t) t→∞ I (t) HS∗ (t) | S | | | I(t) P I(t) P + SiWi S W 1 X ∗ i∈HS∗ (t) 1 X ∗ i∈HS∗ (t) i i = lim 1 (Si > S ) + + lim 1 (Si = S ) t→∞ I (t) H ∗ (t) t→∞ I (t) HS∗ (t) i=1 | S | i=1 | | (A.183)

Because the network scenario we are considering is λps < µ, so all the Wi are bounded w.p.1. By

Corollary 3 we have: P + SiWi lim i∈Z :Ri=1,i≤I(t) t→∞ I (t) P P + SiWi S W i∈HS∗ (t) i∈HS∗ (t) i i = lim 0 + + lim 1 , w.p.1 (A.184) t→∞ · H ∗ (t) t→∞ · H ∗ (t) S | S | P| | i∈H ∗ (t) Wi = (S∗) lim S , w.p.1 (A.185) t→∞ HS∗ (t) P| | i∈H ∗ (t) Wi = (S∗) lim S , w.p.1 (A.186) t→∞ A (t)

181 APPENDIX A. APPENDIX OF CHAPTER 2

A(t) |HS∗ (t)| Because of Corollary 3, we have (A.184), and we have limt→∞ t = limt→∞ t w.p.1 for

∗ (A.186). The reason for (A.185) is by the definition of HS∗ (t) so we can replace all Si with S . P + Wi i∈Z :Ri=1,i≤I(t) Define w , limt→∞ A(t) and we have: P + i∈Z :Ri=1,i≤I(t) Wi w , lim t→∞ A (t) P i∈H ∗ (t) Wi = lim S , w.p.1 (A.187) t→∞ A (t)

So (A.178) can be transformed as follow:

P P 2 i∈Z+:R =1,i≤I(t) SiWi A (t) i∈Z+:R =1,i≤I(t) Si A (t) v = lim i + lim i , w.p.1 (A.188) t→∞ A (t) t t→∞ A (t) 2tµ (S∗)2λp = (S∗) wλp + , w.p.1 (A.189) 2µ

Because of the important property of a Poisson process, namely the Poisson-Arrivals-See-Time-

v Averages (PASTA)[81], we have µ = w, w.p.1, and we have:

(S∗)2λp w = , w.p.1 (A.190) 2µ (µ (S∗) λp) −

And for limiting average delay we have:

(S∗)2λp S∗ D = + , w.p.1 (A.191) 2µ (µ (S∗) λp) µ −

∗ λs−µ The following calculations can be done by replacing S with λ(1−p) .

A.12 Proof of Corollary 5

Following Equation (A.191) and Corollary 3, the delay for UNIFORM strategy is:

S∗2λp S∗ D = + , w.p.1 (A.192) U 2µ (µ S∗λp) µ −

182 APPENDIX A. APPENDIX OF CHAPTER 2

With the EDF strategy, we need to consider Equation (A.178). According to the design of the EDF strategy, the actual requests in the same busy period have the following relationship. If an actual request i is proactively served, no matter partially or fully, the corresponding waiting time satisfies Wi = 0 because all the previous potential requests have either been realized or have been fully proactively served. So we have the following results:

S = s W 0; S < s W = 0, i Z+ (A.193) i ⇒ i ≥ i ⇒ i ∀ ∈

So by reorganizing Equation (A.178):

P P 2 i∈Z+:R =1,i≤I(t) SiWi i∈Z+:R =1,i≤I(t) Si v = lim i + lim i t→∞ t t→∞ 2tµ P + SiWi = lim i∈Z :Ri=1,Si=s,i≤I(t) t→∞ t P + SiWi + lim i∈Z :Ri=1,Si

1 X 2 + lim Si (A.197) t→∞ 2 µt + i∈Z :Ri=1,i≤I(t)

Equation (A.194) is by splitting the SiWi terms into two groups according to whether Si < s or not.

Then in (A.195), we use s to replace all the S where i i : R = 1,S = s, i = 1,...,I (t) be- i ∈ { i i } cause of (A.193). Also because of (A.193) we have W = 0, i i : R = 1,S < s, i = 1,...,I (t) . i ∀ ∈ { i i }

So it would not affect the results if we replace all Si with s in this group in (A.196). Combine the

183 APPENDIX A. APPENDIX OF CHAPTER 2 terms in (A.195) and (A.196) and we have:   s X 1 X 2 v = lim  Wi + lim Si (A.198) t→∞ t→∞ 2 t + µt + i∈Z :Ri=1,i≤I(t) i∈Z :Ri=1,i≤I(t) P  + W P 2 A (t) i∈Z :Ri=1,i≤I(t) i 1 A (t) i∈Z+:R =1,i≤I(t) Si = lim s + lim i (A.199) t→∞ t A (t) t→∞ 2µ t A (t) λp = λpsw + S2 , w.p.1 (A.200) E 2µ E

P + Wi i∈Z :Ri=1,i≤I(t) where wE , limt→∞ A(t) is the limiting average of waiting time for each actual request under the EDF strategy, and SE is the reactive work of requests under the EDF strategy.

Also due to PASTA, we have:

λpS2 w = E , w.p.1 (A.201) E 2µ (µ λps) −

∗ Notice here, s is the original object size without any proactive work. We have SE > S due to

2 Theorem 4 and Corollary 1, then S2 S  > S∗2, so we have: E ≥ E

λpS2 S D = E + E , w.p.1 E 2 (µ2 µλps) µ − λpS∗2 S∗ + , w.p.1 ≥ 2 (µ2 µλpS∗) µ −

= DU (A.202) where equality holds if and only if p = 0.

184 Appendix B

Appendix of Chapter 3

B.1 Relationship between the Genie-Aided system and Realistic Proac-

tive system

First, we have the following lemma which describes the fundamental relationship between the Genie-Aided system and the Realistic Proactive system.

Lemma 4. E[P (t) I (t)] p , t > 0 − ≤ 1−p ∀

Recall I (t) max i t < t, R = 1 . Therefore, P (t) I(t) equals the number of , { | i i } − consecutive unrealized requests from I (t) + 1 to P (t). The inequality in Lemma 4 is because the system starts from t = 0 so P (t) I(t) is upper bounded by P (t) P (0). Since the cal- − − culations are straight-forward, we omit the proof for this lemma to save space. The statement lim →∞ (P (t) I(t)) < , w.p.1 is also correct based on Lemma 4. t − ∞ We can verify that the results in Section 3.3 and 3.4 are still correct under Realistic

Proactive system with Lemma 4. Such results include the definitions of service model ΓP , proactive

185 APPENDIX B. APPENDIX OF CHAPTER 3

strategy ΨP , Property 1 and 2, and most importantly, Theorem 5 and Corollary 6. For example, U in (3.9) is the limiting fraction of potential requests that is finished proactively. If we consider the

Realistic Proactive system, we have:

PI(t) i=1 Ui U , lim t→∞ I (t) PP (t) PP (t) i=1 Ui Ui P (t) = lim − i=I(t) t→∞ P (t) I(t) P (t) PP (t) P U P (t) Ui = lim i=1 i lim i=I(t) t→∞ P (t) I(t) − t→∞ I(t) PP (t) U = lim i=1 i , w.p.1 t→∞ P (t)

In similar ways, the aforementioned definitions and conclusions can all be transformed using I(t), and we can get corresponding definitions and conclusions under the Realistic Proactive System.

B.2 Proof of Theorem 5

First we make the following definitions similar to (3.15), (3.16) and (3.17):

The amount of time that ΨR works in idle state (namely Reactive Idle) from 0 to t is:

T (t) τ (0, t]: N (τ) = 0 (B.1) RI , | { ∈ } |

The amount of time that ΨR works in busy state (namely Reactive Busy) from 0 to t is:

T (t) τ (0, t]: N (τ) > 0 (B.2) RB , | { ∈ } |

The limiting fraction of time that ΨR works in idle state is:

TRI (t) αRI , lim (B.3) t→∞ t

186 APPENDIX B. APPENDIX OF CHAPTER 3

The limiting fraction of time that ΨR works in busy state is:

TRB (t) αRB , lim (B.4) t→∞ t

In this proof, we first compare the reactive scheme with the Realistic Proactive scheme under the same sample path. Define the system state at time t as ”Proactive Served”, if ΨR works in busy state at time t, and ΨP works in proactive state at time t. The amount of time that ΨP works in

”Proactive Served” state is:

T (t) τ (0, t]: V (τ) = 0,V (τ) > 0 (B.5) PS , | { ∈ P R } | where VP (t) is the unfinished work in proactive scheme at t and VR (t) is the unfinished work in reactive scheme at t. One can check Figure 3.3 for a more straightforward understanding of these definitions.

Based on the definitions, we have:

TRB (t) + TRI (t) = TPR (t) + TPP (t) = t (B.6)

An important fact to be noticed here is that:

TPP (t) = TPS (t) + TRI (t) (B.7)

Then by (B.5):

X TPS (t) = Ui (t) di (B.8) + i∈Z :Ri=1,i≤I(t) P where the term + ( ) is the total service time of all the actual requests which i∈Z :Ri=1,i≤I(t) Ui t di has been proactively served before arrival in (0, t).

187 APPENDIX B. APPENDIX OF CHAPTER 3

Next, the total time that the server works proactively by t, defined as TPP (t), satisfies the follow equation:

P∞ U (t) lim i=1 i = µ, w.p.1 (B.9) t→∞ TPP (t)

P∞ i=1 Ui(t) The term lim →∞ can be interpreted as the limiting average number of proactive service t TPP (t) done per unit time. Therefore, (B.9) is achieved based on the property the service process where the intervals follow IID Exp(µ). Following (B.9), we have:

P∞ U (t)d lim i=1 i i = 1, w.p.1 (B.10) t→∞ TPP (t)

because the reactive service time d follows IID Exp(µ). Equivalently, we have: { i}

T (t) P∞ U (t)d lim PP = lim i=1 i i , w.p.1 (B.11) t→∞ t t→∞ t

Next we have:

P P + Ui(t)di A(t) i∈Z+:R =1,i≤I(t) Ui(t)di i∈Z :Ri=1,i≤I(t) i lim →∞ limt→∞ t t = t A(t) PI(t) PI(t) i=1 Ui(t)di I(t) i=1 Ui(t)di limt→∞ t limt→∞ t I(t) A(t) 1 limt→∞ UA = t µ (B.12) I(t) 1 limt→∞ t U µ λp , w.p.1 (B.13) ≤ λ

= p, w.p.1 (B.14)

with equality in (B.13) if and only if the strategy ΨP satisfies Property 2 based on Proposition 5.

1 Equation (B.12) is because the service time of each request is IID with expectation µ . Following

(B.14), we have:

P PI(t) i∈Z+:R =1,i≤I(t) Ui (t) di U (t) d lim i p lim i=1 i i , w.p.1 (B.15) t→∞ t ≤ t→∞ t

188 APPENDIX B. APPENDIX OF CHAPTER 3 with equality if and only if Property 2 is satisfied.

Then based on Equation (B.8) , (B.10), and (B.15), we have:   PI(t) ( ) + P∞ ( ) T (t) (B.10) i=1 Ui t di i=I(t)+1 Ui t di p lim PP p = lim t→∞ t t→∞ t PI(t) P∞ U (t) d i=I(t)+1 Ui (t) di = lim i=1 i i p + lim p (B.16) t→∞ t t→∞ t P (B.15) limt→∞ i∈Z+:R =1,i≤I(t) Ui (t) di i , w.p.1 (B.17) ≥ t (B.8) T (t) = lim PS , w.p.1 (B.18) t→∞ t with equality in (B.17) if and only if the strategy satisfies both Property 1 and Property 2. So if we put Equation (B.7) over t and take t , we have: → ∞ T (t) T (t) T (t) lim PP = lim PS + lim RI t→∞ t t→∞ t t→∞ t T (t) T (t) lim PP p + lim RI , w.p.1 (B.19) ≤ t→∞ t t→∞ t T (t) T (t) (1 p) lim PP lim RI , w.p.1 (B.20) − t→∞ t ≤ t→∞ t where (B.19) is from (B.18).

By replacing corresponding terms in Equation (B.20) with Equation (B.3) and (3.17), we have:

α α RI , w.p.1 (B.21) PP ≤ 1 p − and we know from fundamental queueing theory that:

λps α = 1 , w.p.1 (B.22) RI − µ

Then we have the result:

µ λp α − , w.p.1 (B.23) PP ≤ µ (1 p) −

189 APPENDIX B. APPENDIX OF CHAPTER 3

And it follows that:

λp µp α = 1 α − , w.p.1 (B.24) PR − PP ≥ µ (1 p) − with equality in (B.23) and (B.24) if and only if the strategy satisfies both Property 1 and Property 2.

The proof for the Genie-Aided proactive system is very similar, so we omit the details of the proof.

B.3 Proof of Corollary 6

From Equation (3.17), we know that:

TPP (t) αPP = lim t→∞ t P∞ U (t)d = lim i=1 i i (B.25) t→∞ t P (t) P∞ U (t)d = lim i=1 i i t→∞ t P (t) λ = U, w.p.1 (B.26) µ

P (t) 1 (B.25) is from (B.11). (B.26) is because limt→∞ t = λ, w.p.1 and the expected service time of µ .

Then from Theorem 5, we get:

µ µ λp U − , w.p.1 ≤ λ µ (1 p) − µ λp = − , w.p.1 (B.27) λ (1 p) − with equality if and only if both Property 1 and Property 2 are satisfied.

190 APPENDIX B. APPENDIX OF CHAPTER 3

B.4 Proof of Lemma 2

+ The event M = m, m Z means there are m requests in Π(τk) = (P (τ )+1,P (τ )+ k ∈ k k

2,...) that are proactive served at t = τk. Because we assumed that service time follow IID

+ Exp(µ), P M = m, m Z is independent of which m services in Π(τk) are proactively { k ∈ } served. Therefore, the probability that the next potential request, i.e., P (τk)+1, has been proactively  served depends purely on P CP (τk)+1 = 1 = φ.

B.5 Proof of Proposition 8

Recall that Property 1 is

P∞ i=P (t)+1 Ui (t) lim = 0, w.p.1 (B.28) t→∞ t

P∞ where i=P (t)+1 Ui (t) = M(t) by definition. Therefore, Property 1 can be rewritten as:

M(t) lim = 0, w.p.1 (B.29) t→∞ t

A relationship between the Markov process and the embedded Markov Chain is that:

M(t) M(τ ) M M M lim = lim K = lim K = lim K = lim K (λ + µ) , w.p.1 t→∞ →∞ →∞ PK−1 →∞ PK−1 ∆ →∞ t K τK K ∆k K k=0 k K K k=0 K K (B.30)

By combining (B.29) and (B.30), a sufficient and necessary for Property 1 is:

M lim k = 0, w.p.1 (B.31) k→∞ k

µ−λp When φ > λ(1−p) , the Markov Chain is positive recurrent so we have limk→∞ Mk <

, w.p.1. Therefore we have (B.29) satisfied with probability 1. ∞ 191 APPENDIX B. APPENDIX OF CHAPTER 3

PK  n µ−λp k=0 1(Nk=n) µ−λ2 λ2 When φ , we know that lim →∞ = , n = 0, 1, . . . , w.p.1 ≤ λ(1−p) K K µ µ ∀

Mk based on Proposition 7. Suppose we start with (M0,N0) = (0, 0), and we can rewrite limk→∞ k as:

M Pk−1 (M M ) lim k = lim i=0 i+1 − i (B.32) k→∞ k k→∞ k P P + (Mi+1 Mi) (Mi+1 Mi) = lim i

P M = M 1 M Z+ , w.p.1 (B.36) − i+1 i − | i ∈ µ λ = − 2 µ λ , w.p.1 (B.37) µ − 3

= µ λ λ , w.p.1 (B.38) − 2 − 3

(B.32) is by replacing Mk by the summation of one-step difference in relation to the transitions of the

2D Markov Chain. (B.33) is by grouping the one-step transitions into two groups based on whether the P (Mi+1−Mi) i

192 APPENDIX B. APPENDIX OF CHAPTER 3 probability, based on Strong Law of Large Numbers. Then by Proposition 7, we get (B.37).

Mk µ−λp Based on (B.38), we get the following conclusions: limk→∞ k = 0 if φ = λ(1−p)

Mk µ−λp and limk→∞ k > 0 if φ < λ(1−p) . Based on (B.29), FIXP(φ) satisfies Property 1 if and only if

φ µ−λp . ≥ λ(1−p)

B.6 Verification of Theorem 5 under FIXP strategies in Genie-Aided

system

Theorem 5 states that the limiting fraction of time the system works proactively, i.e., αPP , is maximized if and only if both Property 1 and Property 2 are satisfied. In the Genie-Aided system, we proved that both properties are satisfied if and only if φ µ−λp . In the Markov process model, ≥ λ(1−p) the system works proactively when N(t) = 0. Because the transition holding time follows the same distribution according to (3.23), the fraction of time that the system works proactively simply depends on the limiting fraction of occurrences of states with N(t) = 0.

µ−λp When φ > λ(1−p) :

PK−1 G k=0 (1(Nk = 0)(∆k)) αPP (φ) = lim (B.39) K→∞ τK PK−1 PK−1 k=0 (1(Nk=0)(∆k)) k=0 (1(Nk=0)) limK→∞ PK−1 K k=0 (1(Nk=0)) = PK−1 (B.40) k=0 ∆k limK→∞ K

PK−1 k=0 (1(Nk=0)(∆k)) The term limK→∞ PK−1 is the limiting average holding time between transitions k=0 (1(Nk=0)) conditioned on Nk = 0. Since the embedded Markov Chain is positive recurrent, the states of

Nk = 0 are visited infinitely often, so based on (3.24), we have:

PK−1 (1(N = 0)∆ ) PK−1 ∆ 1 lim k=0 k k = lim k=0 k = , w.p.1 (B.41) K→∞ PK−1 K→∞ K λ + µ k=0 (1(Nk = 0))

193 APPENDIX B. APPENDIX OF CHAPTER 3

PK−1 k=0 (1(Nk=0)) by Law of Large Numbers. The term limK→∞ K is the limiting fraction of transitions that starts from states with Nk = 0. Therefore, we have:

PK−1 ∞ k=0 (1(Nk = 0)) X lim = π(m,0), w.p.1 (B.42) K→∞ K m=0 according to the steady state distribution. By combining (B.41) and (B.42), we get:

∞ X (B.40) = π(m,0), w.p.1 (B.43) m=0 1 = π , w.p.1 (B.44) (0,0) 1 µ − λ2+λ3  µ  1 = 1 + , w.p.1 λ +λ µ λ λ +λ λ µ µλ2 λ +λ λ2 + λ3 µ 2 3 + 1 2 1 3 2 3 − λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3 (B.45)

(µ λp)(µ λp λφ + λpφ) λp λpφ + λφ = − − − − , w.p.1 (B.46) λµ(p 1)(p + φ pφ) λp λpφ + λφ µ − − − − µ λp = − , w.p.1 (B.47) µ(1 p) − µ−λp which verifies the conclusion for φ > λ(1−p) .

When φ µ−λp , we have similar derivations from (B.40) and (B.41). Therefore, we ≤ λ(1−p) G directly write the relationship between αPP and the limiting distribution of the embedded Markov

Chain as:

PK G k=0 1 (Nk = 0) αPP (φ) = lim (B.48) K→∞ K µ λ = − 2 (B.49) µ µ λp + λpφ = − (B.50) µ µ−λp µ λp + λp − − λ(1 p) (B.51) ≤ µ µ λp − (B.52) ≤ µ(1 p) −

194 APPENDIX B. APPENDIX OF CHAPTER 3

µ−λp with equality if and only if φ = λ(1−p) . Therefore, Theorem 5 and Theorem 6 is verified in the

Genie-Aided system.

B.7 Proof of Proposition 7

Consider a walk that starts at the first column in the 2D Markov Chain and ends right before it revisits the first column for the first time, which can be described as:

2D  h h  h h  h h  h h  Wh = M0 ,N0 , M1 ,N1 , M2 ,N2 ,..., ML(h),NL(h) ,

N h = 0,N h = 0,N h = 0, k = 1, 2,...,L(h), h = 0, 1,... 0 L(h)+1 k 6 where h is the index of a walk.

We consider a corresponding walk in the 1D Markov Chain starting from Xk = 0 with the following rules:

If the walk in the 2D Markov Chain walks left (or right), i.e., N = N 1 (or N = N +1), • k+1 k − k+1 k the walk in the 1D Markov Chain also walks left (or right), i.e., X = X 1 (or X = k+1 k − k+1

Xk + 1);

If the walk in the 2D Markov Chain walks up, down, or self-loop, i.e., N = N , the walk in • k+1 k

the 1D Markov Chain walks in a self-loop, i.e., Xk+1 = Xk.

In this way, all the possible walks in the 2D Markov Chain are mapped to 1D Markov Chain, and it forms a surjective relation. We can describe the walk in 1D Markov Chain as:

1D  h h h h  h h Wh = X0 ,X1 ,X2 ,...,XL(h) ,Xk = Nk , k = 0, 1,...,L(h), h = 0, 1,...

195 APPENDIX B. APPENDIX OF CHAPTER 3

One fact is that the expectation of the length of each walk is finite, i.e., E[L(h) + 1] < , given ∞

λ2 < λ1 < µ.

By comparing the walks in the two Markov Chains, we have the following observations.

1. If the walk in the 2D Markov Chain does not visit the first row, the transition probabilities of

the two walks are exactly the same:

P W 2D = P W 1D , if M h = 0, k = 0, 1,...,L(h) (B.53) h h k 6 ∀

2. If the walk in the 2D Markov Chain visits the first row, the horizontal transition probabilities

in the 2D Markov Chain is different from in the 1D Markov Chain, causing the walks to be

with different probabilities.

Next, we are going to show that a walk in the 2D Markov Chain satisfies condition (1) with probability 1 when φ µ−λp . Based on our definition of W , we can get a sequence of walks in ≤ λ(1−p) 2D 2D 2D 2D 2D the 2D Markov Chain starting from t = 0, denoted by W0 ,W1 ,W2 ,...,WH . We define the following set of walks:

n o Γ (H) = W 2D; M h = 0, k = 0, 1,...,L(h), h H (B.54) h k 6 ∀ ≤ which represents the set of walks in the 2D Markov Chain which do not visit the first row. Then we

first prove the following lemma:

Lemma 5. If φ µ−λp , ≤ λ(1−p)

PH 1 W 2D Γ (H) lim h=0 h ∈ = 1, w.p.1 (B.55) H→∞ H

196 APPENDIX B. APPENDIX OF CHAPTER 3

Proof. From Section 3.5.2.1, we know πr does not have a non-trivial solutions when φ µ−λp . { m} ≤ λ(1−p) As a result, we can see that:

PK 1 (M = 0) lim k=0 k = 0, w.p.1 (B.56) K→∞ K

A simple verification is that if the limiting fraction is positive, then by (3.43) and [59], the steady-state

r distribution exists and π0 is positive. As a result, all the states in this irreducible Markov Chain will be positive recurrent according to [59], which contradicts our results in Section 3.5.2.1. We can derive the relationship between (B.56) and (B.55) with the following derivation:

PK 1 (M = 0) PH PL(h) 1 (M = 0) lim k=0 k = lim h=0 k=0 k (B.57) K→∞ K H→∞ PH h=0 (L(h) + 1) PH 1 W 2D / Γ (H) lim h=0 h ∈ (B.58) ≥ H→∞ PH h=0 (L(h) + 1) PH 2D h=0 1(Wh ∈/Γ(H)) H = lim PH H→∞ h=0(L(h)+1) H PH 1 W 2D / Γ (H) 1 = lim h=0 h ∈ , w.p.1 (B.59) H→∞ H E [L(h) + 1] ! PH 1 W 2D Γ (H) 1 = 1 lim h=0 h ∈ , w.p.1 − H→∞ H E [L(h) + 1]

(B.58) is because one each path which revisit row 0, it has at least one state in row 0. (B.59) is by

PH h=0(L(h)+1) replacing limH→∞ H with E [L(h) + 1] according to Strong Law of Large Numbers. So we have:

! PH 1 W 2D Γ (H) 1 1 lim h=0 h ∈ = 0, w.p.1 − H→∞ H E [L(h) + 1] PH 1 W 2D Γ (H) lim h=0 h ∈ = 1, w.p.1 H→∞ H and we have our result.

197 APPENDIX B. APPENDIX OF CHAPTER 3

Lemma 5 means the walks in the 2D Markov Chain does not visit the first row with probability 1 if φ µ−λp . Based on (B.53), we can match almost all the walks in the 2D Markov ≤ λ(1−p) Chain with corresponding walks in the 1D Markov Chain with exactly the same probability. As a result, the limiting distribution of the 1D Markov Chain shows the limiting distribution of Nk in the

2D Markov Chain.

B.8 Proof of Theorem 7

In order to derive the closed-form expression for the limiting average delay, we first derive the average number of reactive services in the system, i.e., N(t), as follows:

PK−1 N ∆ N(t) = lim k=0 k k (B.60) K→∞ PK−1 k=0 ∆k P∞ PK−1  n=0 k=0 1 (Nk = n) n∆k = lim (B.61) K→∞ PK−1 k=0 ∆k P∞ PK−1 n=0( k=0 1(Nk=n)n∆k) limK→∞ K = PK−1 (B.62) k=0 ∆k limK→∞ K

 PK−1 PK−1  P∞ k=0 1(Nk=n) k=0 1(Nk=n)∆k n=0 limK→∞ K PK−1 n k=0 1(Nk=n) = PK−1 (B.63) k=0 ∆k limK→∞ K ∞ − ! X PK 1 1 (N = n) = lim k=0 k n (B.64) K→∞ K n=0

(B.60) is by the definition of the limiting average reactive services in the system. (B.61) is by categoriz- ing each transition in to groups based on how many services are in the system. (B.62) is valid because

PK−1 PK−1 k=0 ∆k k=0 ∆k the limit of limK→∞ K exists because of (3.24). In (B.63), the term limK→∞ K is the average holding time conditioned on Nk = n. Because of Proposition 6 and Proposition 7, we

PK−1 k=0 ∆k know the limiting average exists, and it equals to limK→∞ K because of (3.24). The result

198 APPENDIX B. APPENDIX OF CHAPTER 3 in (B.64) shows that the average number of reactive services in the system depends on the limiting distribution of the embedded Markov Chain of the Markov process.

Based on N(t), we can derive the limiting average delay using the Little’s Law.

µ−λp Delay Analysis of FIXP strategies when φ > λ(1−p) in Genie-Aided system When

µ−λp φ > λ(1−p) , the Markov Chain is positive recurrent according to Proposition 6. To continue from

(B.64), we have:

X X N(t) = nπ(m,n), w.p.1 (B.65) n=0,1,... m=0,1,...

Let D denote the average time a job spends in the system by applying Little’s Law and get:

P P N(t) nπ(m,n) D = = m=0,1,... n=0,1,... (B.66) λp λp

(B.67)

We omit the calculations and directly show the results:

λp3 µp2 λp3φ + µp2φ + λpφ µpφ µ λp D = − − − , φ > − (B.68) λφp(µ λp)(1 p) λ (1 p) − − − ∂D p µ λp = > 0, φ > − (B.69) ∂φ λφ2(1 p) λ (1 p) − −

Notice that we are considering the regime that 0 < p < µ 1, so D in (B.68) is always positive. λ ≤ µ−λp (B.69) shows that delay is monotonically increasing with φ when φ > λ(1−p) .

Delay Analysis of FIXP strategies when φ µ−λp in Genie-Aided system ≤ λ(1−p) Based on Proposition 7, we are able to analyze the average delay D based on the 1D

Markov Chain. Similarly, we can compute D from (B.64) and get:

P nπn D = n=0,1,... (B.70) λp

199 APPENDIX B. APPENDIX OF CHAPTER 3

where πn denotes the steady-state probability of state n in the 1D Markov Chain. Again, we directly show the results as follow:

1 φ µ λp D = − , φ − (B.71) µ λp (1 φ) ≤ λ (1 p) − − − ∂D µ µ λp = < 0, φ − (B.72) ∂φ −(µ λp (1 φ))2 ≤ λ (1 p) − − − (B.72) shows that D is monotonically decreasing when φ µ−λp . ≤ λ(1−p) By combining (B.68-B.69) and (B.71-B.72), we achieve the results in Theorem 7.

B.9 Proof of Theorem 8

We first discuss the influence of parameters ak, bk, and ck in the embedded chain. First of all, the states of (0, 0)A and (0, 0)B are not absorbing states because the transition rates from (0, 0)A and (0, 0)B to (0, 1) is λ1 > 0. Therefore, these states will not change the recurrence of the Markov

Chain. Next, we provide a more detailed discussion about the range of ak, bk, and ck as follow.

a : The transition from (0, 0) to (0, 0) only happens when an unrealized potential arrival • k A B happens, so we have a λ (1 p)λ. On the other hand, we know that J(τ ) > P (τ ) in state k ≤ − k k

(0, 0)A by definition. When J(τk) > P (τk + 1) + 1, ak = 0 because when a potential request

arrives, it transits to state (0, 1) if it is realized, and a self-loop happens if it is unrealized because

the ongoing proactive service is still for future. When J(τk) = P (τk + 1) + 1, the rate that an

unrealized potential arrival happens before the ongoing proactive service finishes is (1 p)λ. − Because P r J(τ ) = P (τ + 1) + 1 θ = (0, 0) = φ, we have a = 1 p with probability { k k | k A} k − φ.

200 APPENDIX B. APPENDIX OF CHAPTER 3

b : In state (0, 0) , J(τ −) P (τ −) before the transition to (0, 0) happens. Because E[P (t) • k B k ≤ k A − I (t)] p , t > 0, as proved in Appendix B.1, we know that P (τ −) J(τ −) < P (τ −) ≤ 1−p ∀ k − k k − I(τ −) < , w.p.1. As a result, there is always a positive probability that the next service to k ∞ + + proactively work on satisfies J(τk ) > P (τk ). So we have bk > 0.

c : It is sufficient to prove our desired results with c [0, 1], which is a trivial range. • k k ∈

As we mentioned before, the transitions not involving ak, bk, or ck are still Markovian.

Therefore, the limiting fractions of the states should satisfy relationships similar to (3.49-3.52) if the solution is non-trivial:

(λ + µ) f = λ f + µf + λ f , n Z+, w.p.1 1 (0,n) 1 (0,n−1) (0,n+1) 3 (1,n) ∈ (B.73)

(λ + λ + µ) f = µf + µf + λ f , w.p.1 2 3 (1,0) (0,0)A (1,1) 3 (2,0)

(B.74)

(λ2 + λ3 + µ) f(m,0) = µf(m−1,0) + µf(m,1) + λ3f(m+1,0), m = 2, 3, . . . , w.p.1

(B.75)

(λ + λ + µ) f = λ f + µf + λ f , m Z+, n Z+, w.p.1 2 3 (m,n) 2 (m,n−1) (m,n+1) 3 (m+1,n) ∈ ∈ (B.76) where (λ1 + µ) f(0,1) = λ1f(0,0) + µf(0,2) + λ3f(1,1) is from the fact (0, 1) has an average transition rate of µ to state (0, 0), and (0, 0) has an average transition rate of λ1 to (0, 1). Suppose we know

f f , f , and A0 = (0,0)A , we can express all the limiting fraction of time of each states in the (0,0) (0,0)A f(0,0)

201 APPENDIX B. APPENDIX OF CHAPTER 3 following form:

 µλ  λ n µλ  λ n f = 1 + A0 2 1 A0 2 2 f , (0,n) λ λ + λ λ µλ µ − λ λ + λ λ µλ λ + λ (0,0) 1 2 1 3 − 2 1 2 1 3 − 2 2 3 n = 0, 1, . . . , w.p.1 (B.77)

 µ m  λ n f = 2 f , m Z+, n = 0, 1, . . . , w.p.1 (B.78) (m,n) (0,0)A λ2 + λ3 λ2 + λ3 ∈ which are in similar formats as (3.55) and (3.56). We solve this set of equations with constraint

P µ−λp θ∈Θ fθ = 1, and we find out that if and only if φ > λ(1−p) , there is a set of non-trivial solutions:

1 f(0,0) =  , w.p.1 µ + A0 λ2+λ3 µ + µλ2 µ µλ2 λ2+λ3 µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ2 (B.79)

When φ µ−λp , we have f = 0, θ Θ. ≤ λ(1−p) θ ∀ ∈ 0 µ−λp 0 Next, we focus on the value of A when φ > λ(1−p) . We will show that f(0,0)B > given

f(0,0)A > 0 and vice versa.

On one hand, we consider the case that f(0,0)A > 0 and try to prove f(0,0)B > 0. Based on our discussions on a , we know that a = 1 p with probability φ. The scenario happens when 1) k k − the system is proactively working on the first future request, 2) the request arrives in the potential process before the proactive service finishes, and 3)it is unrealized. Therefore, we have:

λφ(1 p) f − f , w.p.1 (B.80) (0,0)B ≥ λ + µ (0,0)A

f based on Strong Law of Large Numbers. Then we have A0 = (0,0)A < 1 with probability 1. f(0,0)A +f(0,0)B

On the other hand, we consider the case when f(0,0)B > 0 and try to prove f(0,0)A > 0 by

contradiction. We assume f(0,0)A = 0 when f(0,0)B > 0. Our idea is to show that every time the state is in (0, 0)B, the transition probability into state (0, 0)A is lower bounded away from 0 by a positive constant.

202 APPENDIX B. APPENDIX OF CHAPTER 3

Given θ = (0, 0) , we know that J(τ +) P (τ +) by definition. With probability µ , k B k ≤ k µ+λ + the proactive service J(τk ) finishes before any potential request arrives. Under this circumstance,

+ the probability that the next proactive service J(τk+1) is for a future request satisfies:

P r J(τ + ) > P (τ +) J(τ +) P (τ +)& P (τ +) = P (τ + ) k+1 k | k ≤ k k k+1

P r J(τ + ) > P (τ +) J(τ +) = I(τ +) + 1 & P (τ +) = P (τ + ) (B.81) ≥ k+1 k | k k k k+1

P (τ +)−I(τ +) = (1 φ) k k (B.82) −

Based on the definition of J(t), we have J(τ +) I(τ +) + 1. Because of the orderliness of k ≥ k choosing the next proactive service, we have (B.81). Then the probability that the next proactive service is for a future request, i.e., J(τ + ) > P (τ +), equals the probability that C = 0, i = k+1 k i ∀ + + I(τk ) + 1,...,P (τk ). So we have (B.82). Then based on the definition of bk, we have:

b = P r J(τ + ) > P (τ +) J(τ +) P (τ +)& P (τ +) = P (τ + ) k k+1 k | k ≤ k k k+1

Then based on (B.82), we have:

+ + P (τ +)−I(τ +) b θ = (0, 0) ,P (τ ),I(τ ) (1 φ) k k (B.83) k| k B k k ≥ −

+ + which is conditioned on P (τk ) and I(τk ). Based on Lemma 4, we know that the unconditional distribution of P (t) I(t) has a finite expectation. Given that f > 0, P (τ +) I(τ +) θ = − (0,0)B k − k | k

(0, 0)B should also have a finite expectation, if it exists. Otherwise, a contradiction can directly be found by using total expectation theorem. Here we assume the expectation exists, and we have:

PK + + k=0(P (τk ) I(τk ))1(θk = (0, 0)B) lim − , G < (B.84) K→∞ PK ∞ k=0 1(θk = (0, 0)B)

+ + which represents the empirical gap between P (τk ) and I(τk ) in state (0, 0)B.

203 APPENDIX B. APPENDIX OF CHAPTER 3

Then the transition from (0, 0)B to (0, 0)A can be characterized by:

µ E [1(θ = (0, 0) ) θ = (0, 0) ] = (b θ = (0, 0) ) (B.85) k+1 A | k B µ + λ k| k B which means that each time the state is in (0, 0)B, the transition probability into state (0, 0)A depends on b , which can be lower bounded by the gap of P (τ +) I(τ +). Because we know the expected k k − k gap of P (τ +) I(τ +) is finite in state (0, 0) from (B.84), we have the following: k − k B PK PK k=0 1(θk = (0, 0)A) k=0 1(θk+1 = (0, 0)A, θk = (0, 0)B) f(0,0) = lim lim (B.86) A K→∞ K ≥ K→∞ K PK 1(θ = (0, 0) )E [1(θ = (0, 0) ) θ = (0, 0) )] = lim k=0 k B k+1 A | k B , w.p.1 (B.87) K→∞ K PK µ 1(θk = (0, 0)B) (bk θk = (0, 0)B) = lim k=0 µ+λ | (B.88) K→∞ K PK µ P (τ +)−I(τ +) 1(θk = (0, 0)B) ((1 φ) k k θk = (0, 0)B) lim k=0 µ+λ − | (B.89) ≥ K→∞ K PK µ P (τ +)−I(τ +) 1(θk = (0, 0)B) ((1 φ) k k θk = (0, 0)B) lim k=0 µ+λ − | ≥ K→∞ PK k=0 1(θk = (0, 0)B) PK 1(θ = (0, 0) ) lim k=0 k B (B.90) × K→∞ K PK (P (τ+)−I(τ+))1(θ =(0,0) ) k=0 k k k B limK→∞ µ PK 1(θ =(0,0) ) f (1 φ) k=0 k B (B.91) ≥ µ + λ (0,0)B − µ > f (1 φ)G (B.92) µ + λ (0,0)B −

The inequality in (B.86) is because there can be transitions into state (0, 0)A from other states like

(1, 0) and (0, 1). (B.87) is by describing the transitions from (0, 0)B to (0, 0)A. (B.88) is from (B.85), and (B.89) is from (B.83). (B.90) reformulate the equation into the conditional term in state (0, 0)B, and the limiting fraction of (0, 0)B in the embedded chain. The inequality in (B.91) is because the function f(x) = (1 φ)x is convex. By applying (B.84), we derive the lower bound for f in − (0,0)A the format of (0, 0)B multiplied by a positive constant factor, which contradicts our assumption of

0 f(0,0)A = 0. Therefore, we have f(0,0)A > 0 if f(0,0)B > 0. Consequently, A > 0.

204 APPENDIX B. APPENDIX OF CHAPTER 3

As a summary, we have A0 (0, 1). ∈

B.10 Proof of Proposition 10

Based on the embedded chain of the Realistic Proactive system, the system works proac- tively in the states of (0, 0)A, (0, 0)B, (1, 0), .... The transitions can be classified as follows:

Transitions originating from (0, 0) , (1, 0), (2, 0) ...: each time such transitions happen, one • A µ µ future request is proactively served with probability µ+λ , where µ+λ is the probability that service

finishes before a potential arrival happens. This is according to how we defined the transitions.

Transitions originating from (0, 0) : each time such transitions happen, one unrealized request • B µ which has arrived is proactively served with probability µ+λ by definition.

PK µ Therefore, the expected number of proactive service done by τK is k=0 1(Nk = 0) λ+µ , and the expected number of proactive service done for actual requests by τ is PK 1(N = 0, θ = K k=0 k k 6 µ (0, 0)B) λ+µ p.

If we pick t0 = t , we have P (t0) = J(t). Then we let K = max k τ < p0 and have J(t) { | k } the following:

PP (t0) Ui 0 0 i=0 0 PP (t ) U limt →∞ 0 A(t ) U = P (t ) = lim i=0 i (B.93) P U 0 0 P i∈Z+:R =1,t

205 APPENDIX B. APPENDIX OF CHAPTER 3

µ−λp 0 When φ > λ(1−p) , we know that f(0,0)B > and we have: P U θ ∈{(0,0) ,(0,0) ,(1,0),(2,0),...} πθk = k A B > 1 (B.95) P π UA θk∈{(0,0)A,(1,0),(2,0),...} θk

So FIXP strategy does not satisfy Property 2.

PK µ−λp k=0 1(θk=(0,0)B ) When φ , we know that lim →∞ = 0, so from (B.94) we ≤ λ(1−p) K K have:

U = 1, w.p.1 (B.96) UA

So FIXP strategy satisfies Property 2.

B.11 Verification of Theorem 5 under FIXP strategies in Realistic Proac-

tive system

Similar to the arguments we did for the Genie-Aided system in (B.40-B.42), we directly

R relates αPP with the limiting distribution of the embedded chain.

µ−λp When φ > λ(1−p) , we have:

∞ X αPP = f(m,0) (B.97) m=0

G by definition of the states in the Markov Chain. We rewrite αPP in the Genie-Aided system as:

 µ  αG (φ) = πG 1 + , w.p.1 (B.98) PP (0,0) λ + λ µ 2 3 −  µ  1 = 1 + λ +λ µ λ λ +λ λ µ µλ2 λ +λ λ2 + λ3 µ 2 3 + 1 2 1 3 2 3 − λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3 (B.99)

206 APPENDIX B. APPENDIX OF CHAPTER 3

which is simpler to compare with. With the same φ, the αPP in Realistic Proactive system is:

 µ  αR (φ) = f 1 + A0 , w.p.1 (B.100) PP (0,0) λ + λ µ 2 3 −  µ  1 = 1 + 0 A  0  λ2 + λ3 µ A0 λ2+λ3 µ + 1 + A µλ2 µ Aµλ2 λ2+λ3 − λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3 (B.101)

By comparing (B.101) with (B.99), we have:   0 µ µ + λ2+λ3 µ + µλ2 µ µλ2 λ2+λ3 ( 101) 1 + A µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 λ1λ2+λ1λ3−µλ2 λ2 B. λ2+λ3−µ − = µ   (B.99) 1 + µ 0 λ2+λ3 µ µλ2 µ µλ2 λ2+λ3 λ2+λ3−µ + A + µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ2 (B.102)

It is easy to verify of the numerator and denominator of (B.102) are both positive. The difference of the numerator and denominator can be transformed into the following form:

λ + λ µ   µ µλ  Difference = 1 A0 2 3 2 − λ − µ λ λ + λ µ − λ λ + λ λ µλ 3 − 1 2 3 − 1 2 1 3 − 2 (B.103) where 1 A0 > 0, λ2+λ3 µ < 0 and µ µλ2 > 0, so we know (B.101) < 1. − λ3 − µ−λ1 λ2+λ3−µ − λ1λ2+λ1λ3−µλ2 (B.99) R G µ−λp Therefore we have αPP < αPP when φ > λ(1−p) .

When φ µ−λp , the Realistic Proactive system can be analyzed in exactly the same way ≤ λ(1−p) as the Genie-Aided system. Proposition 7 is also true for Realistic Proactive system. So (B.48)-

R G (B.52) can be applied to Realistic Proactive system and we have αPP = αPP . Therefore we have

α µ−λp , with equality if and only if φ = µ−λp , which is the only strategy satisfying both PP ≤ µ(1−p) µ(1−p) properties.

207 APPENDIX B. APPENDIX OF CHAPTER 3

B.12 Proof of Theorem 10

First, the delay can be proved to be only dependent on the limiting distribution of the embedded chain in the Realistic Proactive system, similar to (B.64) in the Genie-Aided system.

Therefore, we directly use the conclusion without detailed proof here.

µ−λp When φ > λ(1−p) , we can calculate the limiting average delay of the Realistic Proactive system based on the limiting distribution in (3.59-3.58). To compare with delay in the Genie-Aided system with the same φ, we formulate the expressions in the following format which is easier to compare:

µλ1  µλ1 µλ2 λ2(λ2+λ3) µλ2 µ λ2(λ2+λ3)  2 + 2 2 + 2 (µ−λ ) (µ−λ ) λ1λ2+λ1λ3−µλ2 λ1λ2+λ1λ3−µλ2 λ2+λ3−µ 1 1 − λ3 λ3 DG =   µ + λ2+λ3 µ + λ1λ2+λ1λ3 µ µλ2 λ2+λ3 µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3 (B.104)   µλ1 0 µλ1 µλ2 λ2(λ2+λ3) µλ2 µ λ2(λ2+λ3) 2 + A 2 2 + 2 (µ−λ ) (µ−λ ) λ1λ2+λ1λ3−µλ2 λ1λ2+λ1λ3−µλ2 λ2+λ3−µ 1 1 − λ3 λ3 DR =   µ + A0 λ2+λ3 µ + λ1λ2+λ1λ3 µ µλ2 λ2+λ3 µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3 (B.105)

Similar to the way we compared αPP in both systems in (B.102), we consider:   µλ1 0 µλ1 µλ2 λ2(λ2+λ3) µλ2 µ λ2(λ2+λ3) 2 + A 2 2 + 2 ( 105) (µ−λ ) (µ−λ ) λ1λ2+λ1λ3−µλ2 λ1λ2+λ1λ3−µλ2 λ2+λ3−µ B. 1 1 − λ3 λ3 =   (B.104) µλ1 µλ1 µλ2 λ2(λ2+λ3) µλ2 µ λ2(λ2+λ3) 2 + 2 2 + 2 (µ−λ ) (µ−λ ) λ1λ2+λ1λ3−µλ2 λ1λ2+λ1λ3−µλ2 λ2+λ3−µ 1 1 − λ3 λ3   µ + λ2+λ3 µ + λ1λ2+λ1λ3 µ µλ2 λ2+λ3 µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3   × µ + A0 λ2+λ3 µ + λ1λ2+λ1λ3 µ µλ2 λ2+λ3 µ−λ1 λ3 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 µ−λ1 − λ1λ2+λ1λ3−µλ2 λ3 (B.106)

We consider the difference of the numerator and denominator, and we are able to transform the

208 APPENDIX B. APPENDIX OF CHAPTER 3 difference into the following form:

µ(λ + λ )(λ λ + λ λ µλ )  µ µλ  = 1 0 2 3 1 2 1 3 2 2 Difference A 2 2 − − (µ λ ) λ λ2 + λ3 µ − λ1λ2 + λ1λ3 µλ2 − 1 3 − − (B.107)

0 µ(λ2+λ3)(λ1λ2+λ1λ3−µλ2) µ µλ2 Because we have 1 A > 0, 2 > 0, and > 0. − 2 λ2+λ3−µ λ1λ2+λ1λ3−µλ2 − (µ λ1) λ3 −

So we have DR > DG.

When φ µ−λp , the limiting distribution in the Realistic Proactive system has an all-zero ≤ λ(1−p) solution. We can apply the same analysis of the 1D Markov Chain due to the same reason for the

Genie-Aided system, and we omit the detailed derivations due to the similarity. By applying (B.70), we have DR = DG.

R λ−µ Based on Theorem 7, it is straightforward to see that the optimal delay is Dmin = λ(µ−λp) ,

µ−λp which is achieved at φ = λ(1−p) .

209