<<

arXiv:1707.02531v7 [cs.IT] 17 Aug 2019 vracanlta smdlda uu.Teestimator The queue. a as the modeled of is estimate estimato that an remote a channel reconstructs to a forwarded samples over with process, Wiener ern,Mdl atTcnclUiest,Akr,080T 06800 Ankara, University, Technical [email protected]). East Middle neering, hntoeo g-pia apig eowi apig an sampling, smaller zero-wait much sampling, be age-optimal can sampling. policy of periodic sampling those optimal than the estimation by the that achieved show r reveals comparisons and This information Our error. sampl of information. estimation age a of problem the between to age connection sampling equivalent the interesting above is minimizing error independe for the estimation problem are process, the times Wiener minimizing sampling for observed the the process allowe if of maximum threshol Wiener Further, the the and optimal rate. time much the sampling service how random find the by during and determined varies policy, is threshold threshold samp This optimal a the that is prove We strategy su constraint. error rate estimation sampling square a to mean the optimal minimizes the that study strategy We samples. received causally S emi:[email protected]). (e-mail: Ca Technology, USA of Institute Massachusetts Sciences, yzs0078@a puter (e-mail: USA 36849 AL Auburn, University, Auburn inrpoes uuigsystem. queueing process, Wiener rsetsml vial ttedsiainwsgenerated was destination the at available sample freshest rn ubr1725 hspprwspeetdi ata IEE at part in presented 849 was fellowship paper BIDEB [1]. This through 117E215. TUBITAK number by grant N CC supported agreement the grant was by under Uysal (CSoI), and Information CCF-17-17842, of No Science for Grant p under in Foundation supported was Science Polyanskiy Yury N00014-17-1-2417. grant n sdlvrdt h etnto ttime at destination the to delivered is and bu eoesuc.Spoeta the samples that status Suppose of source. timeliness remote the a measure informa about of to age the paramount introduced [3], [2], of was In is trading. stock movements for for pric interest-rate importance imperative stock and about is news, information financial vehicles For fresh motor and making. driving, of autonomous decision acceleration orientat location, and and the speed, estimation about knowledge state real-time s example, for system the critical about updates are timely grid, etc.), robotics, smart trading, networks, sensor control, airplane/vehicular sgnrtda h orea time at source the at generated is .Usli ihteDprmn fEetia n Electronic and Electrical of Department the with is Uysal E. .Sni ihteDprmn fEetia n optrEngi Computer and Electrical of Department the with is Sun Y. .Plasi swt h eateto lcrclEngineer Electrical of Department the with is Polyanskiy Y. i u a upre npr yNFgatCF1-35 n ON and CCF-18-13050 grant NSF by part in supported was Sun Yin Abstract nmn eltm oto n ye-hsclsses(e.g. systems cyber-physical and control real-time many In Terms Index i Sun, Yin siainoe hne ihRno Delay Random with Channel a over Estimation apigo h inrPoesfrRemote for Process Wiener the of Sampling I hsppr ecnie rbe fsmln a sampling of problem a consider we paper, this —In Smln,rmt siain g finformation, of age estimation, remote —Sampling, ebr IEEE Member, .I I. NTRODUCTION uyPolyanskiy, Yury , real-time S i ( 0 ≤ D i inlvlefrom value signal t ttssample status -th S i big,M 02139 MA mbridge, ttime At . 1 online -9330 Elif F-09-39370. ≤ r yNational by art re (e-mail: urkey n n Com- and ing S ST2017 ISIT E uburn.edu). 2 sampling eirMme,IEEE Member, Senior FCenter SF 4 and 044 ≤ Engi- s neering, emote t stock error bject tatus the , tion ion, . . . ling ing an nt at d. e, R d d r ) , once yacanlta rnmt iesapdsamples time-stamped transmits form that the of channel a by connected osuymr eea inlmdl.Arcn eutaogth along result recent fu A [18]. important models. in An signal reported analysis. general our more used study to were that properties nice eltm inlvalue signal real-time ievrigsignal time-varying a [3]–[17]. e.g., resources, network limited on ntesga au.Ti oiae st obyn h g of age the beyond investigate samples go to and us concept motivated information This value. signal the in siao,woega st rvd h best-guess the provide to is goal whose estimator, seFg ) nosre aigsmlsfo continuous- a from samples taking observer signal time An 2): Fig. (see plctos ubro ttsudt oiishv been have policies age the update keep status to real-t of developed of number growth rapid A the recei applications. of has because concept attention, information significant of age the Recently, pattern. yaial ae.Hne h iedfeec ewe the between age the difference by time described destination, more stock. the and vary a source and Hence, of time chart some later. price at slowly dynamically the change and may hurricane, signals a These of speed wind the ieryoe ieadi ee oasalrvleoc new increases a age once value age the smaller the 1, a Hence, to Fig. received. reset is in is sample and plotted time As over linearly destination. the time at current the age and small sample a received freshest the of antflycaatrz h muto change of amount the characterize fully cannot hc stetm ifrnebtentegnrto time generation the between difference time the is which ipythe simply time i.1 vlto fteaeo information of age the of Evolution 1: Fig. 1 npatc,tesaeo ayssesi ntefr of form the in is systems many of state the practice, In e scnie ttsudt ytmwt w terminals two with system update status a consider us Let hspprfcsso inrpoessga oe,wihha which model, signal process Wiener a on focuses paper This U ( ∆( . t S max = ) ∆( 0 t ) age W ∆( ( t n lfUysal, Elif and , = ) S D t t i safnto ftime of function a is , W , hc smdlda inrpoes n an and process, Wiener a as modeled is which ) 0 mle htteeeit rs ttssample status fresh a exists there that implies t { − S S S i i ) 1 U W W : codn oafis-n rtot(FIFO) first-out first-in, a to according ( D D t t t = ) uha h oaino vehicle, a of location the as such , 1 taltime all at i ∆( ≤ t t t − ) } S eirMme,IEEE Member, Senior The . max j ml,sbett constraints to subject small, − 1 ∆( t ieyudtso signal of updates timely D . { t 1 S g finformation of age t j hti endas defined is that h w emnl are terminals two The − ) i 1 : xiisasawtooth a exhibits S D ∆( j ∆( i ≤ t D t ) = ) uedrcinis direction ture sdrcinwas direction is W j t vrtime. over } W t , ˆ t t − t Hence, . t − o the for W some s U U U ime or , ved ( (1) ( t ( t t ) ) ) 1 , 2

ACK ! distribution as Yi. The optimal threshold is determined by fmax and WY , where WY is a random variable that sampler  has the same distribution as the amount of signal variation (Si, W S ) (Wt+Y Wt) that occurs during the random service time W i ˆ − t estimator  Wt Y . The random variable WY indicates a tight coupling, in observer ! channel ! the optimal sampling policy, between the source process Wt and the service time Y . Fig. 2: System model. • Our threshold-based optimal sampling policy has an important difference from the previous threshold-based sampling policies studied in, e.g., [21]–[38]: We have order, where Si is the sampling time of the i-th sample and proven that it is better to not take any new sample when W is the value of the i-th sample. The samples are stored in a Si the server is busy. Consequently, the threshold should be queue while they wait to be served by the channel. We assume disabled when the server is busy and reactivated once the that the samples experience i.i.d. random transmission times server becomes available again. This is one of the reasons over the channel, which may be caused by fading, interference, that sampling policies that ignore the state of the server, collisions, retransmissions, and etc. As such, the channel is such as periodic sampling, can have a large estimation modeled as a FIFO queue with i.i.d. service time Yi satisfying 2 error. E[Y ] < , where Yi 0 is the transmission time of i • We show, perhaps surprisingly, even in the absence of a sample i. This∞ queueing model≥ is helpful to understand the sampling rate constraint (i.e., fmax = ), the optimal robustness of remote estimation and control systems under sampling strategy is not zero-wait sampling∞ in which a occasionally slow service. For example, a UAV flying by new sample is generated once the previous sample is a WiFi access point may run into a communication outage delivered; rather, it is optimal to wait for a certain amount caused by interference from the access point. The resulting of time after the previous sample is delivered, and then delay in packet reception may affect the stability of UAV flight take the next sample. control and navigation [19]. • Our study reveals a relationship between the age of Let Gi be the service starting time of sample i such that information and the estimation error of Wiener pro- Si Gi. The delivery time of sample i is Di = Gi + Yi. The ≤ cess: If the sampling times Si are independent of the initial value W0 =0 is known by the estimator for free, which observed Wiener process (i.e., the sampling times Si is represented by S0 = D0 =0. At any time t, the estimator are chosen without using any information about the ˆ forms an estimate Wt using the samples received up to time Wiener process), the MSE in (2) is exactly equal to t. Similar with [20], we assume that the estimator neglects the time-average expectation of the age of information the implied knowledge when no sample was delivered. The T lim sup 1 E[ ∆(t)dt]. Hence, the sampling prob- quality of remote estimation is evaluated via the time-average T →∞ T 0 lem for minimizing the MSE is equivalent to a sam- mean-square error (MSE) between and ˆ : Wt Wt pling problem forR minimizing the age, where the second T problem was solved recently in [9]–[11]. If the sampling mse 1 E ˆ 2 = lim sup (Wt Wt) dt . (2) times Si are chosen based on causal knowledge of the T →∞ T − "Z0 # Wiener process, the age-optimal sampling policy (i.e., The sampler is subject to a sampling rate constraint the sampling policy that minimizes the time-average ex- pected age of information) no longer minimizes the MSE: 1 E 1 lim inf [Sn] , (3) Specifically, in the age-optimal sampling policy, a new n→∞ n ≥ fmax sample is taken only when the age of information ∆(t), or where fmax is the maximum allowed sampling rate. In prac- 2 equivalently the expected estimation error E[(Wt Wˆ t) ], tice, the sampling rate constraint (3) is imposed when there is is no smaller than a threshold; while in the MSE-optimal− a need to reduce the cost (e.g., energy consumption) for the sampling policy, a new sample is taken only when the transmission, storage, and processing of the samples. instantaneous estimation error Wt Wˆ t is no smaller Our goal is to find an optimal online sampling strategy that than a threshold. The asymptotics| of− the| MSE-optimal minimizes the MSE in (2) by choosing the sampling times and age-optimal sampling policies at long/short service Si causally subject to the sampling rate constraint (3). The time or low/high sampling rates are also studied. contributions of this paper are summarized as follows: • Our theoretical and numerical comparisons show that the • We formulate the optimal sampling problem as a con- MSE of the optimal sampling policy can be much smaller strained continuous-time Markov decision problem with than those of age-optimal sampling, periodic sampling, a continuous state space, and solve it exactly. We prove and the zero-wait sampling policy described in (9) be- that the optimal online sampling strategy for the Wiener low. In particular, periodic sampling is far from optimal process is a threshold policy2, and find the optimal when the sampling rate is sufficiently low or sufficiently threshold. Let Y be a random variable with the same high; age-optimal sampling is far from optimal when the sampling rate is sufficiently low; periodic sampling, age- 2 A sampling policy is said to be a threshold policy if a new sample is taken when a threshold condition is satisfied. Examples of threshold policies optimal sampling, and zero-wait sampling policies are all can be found in Section III-B. 3

far from optimal if the service time distribution is heavy- before transmission. In [30], [31], it was shown that, for tailed. a class of symmetric probability distributions on the source The rest of this paper is organized as follows. In Section symbol Xt, if the transmission scheduling policy is threshold- based, i.e., a new coded packet is sent if Xt is no smaller II, we discuss some related work. In Section III, we describe | | the system model and the formulation of the optimal sampling than a threshold, then the optimal encoder and decoder are problem. In Section IV, we present the solution to this problem piecewise affine. In [32], it was shown that if (i) the encoder and compare it with some other sampling policies. In Section and decoder (i.e., estimator) are piecewise affine, and (ii) the V, we describe the proof of this optimal solution. Some transmission scheduler satisfies some technical assumption, simulation results are provided in Section VI. the optimal transmission scheduling policy is threshold-based. Some extensions of this research were reported in [33]–[35]. In [36]–[38], Chakravorty and Mahajan considered optimal II. RELATED WORK transmission scheduling and remote estimation over a few Lossy source coding and the rate-distortion function of the channel models, where it was proved that a threshold-based Wiener process was studied in, e.g., [39], [40], where the rate- transmission policy and a Kalman-like estimator are jointly distortion function represents the optimal tradeoff between optimal for minimizing the remote estimation error. the source coding rate and the distortion (i.e., MSE) for The closest study to this paper are [28], [29], where the recovering of the Wiener process. The goal of these studies is optimal sampler of the Wiener process was designed in the to reconstruct the realization of Wiener process during a past absence of queueing and random service time (i.e., Yi = 0). time interval with a small distortion, which can be regarded As we will see later, the queueing model affects the structure as an offline signal reconstruction problem. This differs from of the optimal sampler. Specifically, the sampler should disable our online signal tracking problem, where the real-time value the threshold when there is a packet in service and reactivate of the Wiener process is estimated at the destination from the threshold after all previous packets are delivered. A novel causally received samples. proof procedure is developed in the current paper to find the This paper is related to recent studies on the age of optimal sampler design. information, e.g., [3]–[17]. As mentioned above in Section I, a connection between the age of information and the estimation III. SYSTEM MODEL AND PROBLEM FORMULATION error of Wiener process is characterized in this paper. The A. MMSE Estimation Policy estimation error of Wiener process was also mentioned in [4], At time t, the information available to the estimator contains [5] as an illustration of the age of information, where age- two part: (i) Mt = (Si, WS ,Di): Di t , which contains based sampling was not studied and the condition that the { i ≤ } the sampling time Si, sample value WS , and delivery time sampling times are independent of the Wiener process was i Di of the samples delivered by time t and (ii) the facts that used implicitly. Recently, a relationship between a nonlinear no sample has been received after the latest sample delivery function of the age of information and the estimation error of time max Di : Di t . Similar with [20], we assume that the Ornstein-Uhlenbeck (OU) process was found in a follow- the estimator{ neglects≤ the} implied knowledge when no sample up study of the current paper [18]. was delivered. In this case, the minimum mean-square error This paper can also be considered as a contribution to (MMSE) estimation policy [41] is given by (see Appendix A the rich literature on remote estimation, e.g., [21]–[38], by for its derivation) including a queueing model. In [21], Astr¨om˚ and Bernhardsson ˆ showed that a threshold-based sampling method, in which a Wt =E[Wt Mt] | new sample is taken once the amount of signal variation since =WSi , if t [Di,Di+1), i =0, 1, 2,..., (4) the previous sample has reached a threshold, can achieve a ∈ smaller estimation error than the traditional periodic sampling which is illustrated in Fig. 3(b). method with the same sampling rate. Such a threshold-based sampler and a Kalman-like estimator have been proven to be B. Sampling Policies jointly optimal for minimizing the remote estimation error Let It 0, 1 denote the idle/busy state of the server at ∈ { } of several discrete-time signal processes in [22]–[27]. The time t. As shown in Fig. 2, the server state It is known by the sampling and remote estimation of continuous-time signal pro- sampler through acknowledgements (ACKs). We assume that cesses were considered in [28], [29], where it was shown that once a sample is delivered to the estimator, an ACK is fed a threshold-based sampling policy is optimal for minimizing back to the sampler with zero delay. Hence, the information the estimation error of the Wiener process, and the optimal that is available to the sampler at time t can be expressed as threshold was found. In [22]–[29], it was assumed that the Ws, Is :0 s t . { ≤ ≤ } samples are transmitted from the sampler to the estimator In online sampling policies, each sampling time Si is chosen over a perfect channel that is error and noise free. There causally using the information available at the sampler. To are some recent studies that used explicit channel models. characterize this statement precisely, we define In [30]–[32], Gao et. al. considered the optimal transmission + t = σ(Ws, Is :0 s t), = s>t s, (5) scheduling and remote estimation of an i.i.d. discrete-time N ≤ ≤ Nt ∩ N source process Xt over an additive noise channel. Because where σ(X1,X2,...,Xn) represents the σ-field generated by + of the noise, the transmitter needs to encode its message the random variables X ,X ,...,Xn. Then, ,t 0 is 1 2 {Nt ≥ } 4

− W − W Wt WSi t Si Wt

β β t t t · · · t 0) (Si, 0) Si ++1Yi =Si+1 (Si, 0) S0)i +Yi Si+1 +1 0 S1 S2 S3 S4 S5 · · · Si Si+1 Si+2 −β −β (a) Wiener process Wt and its samples. ˆ Wt (a) If WSi+Yi WSi √β, (b) If WSi+Yi WSi < √β, sample| i + 1 is− taken| at≥ time sample|i + 1 is− taken at| time t Si+1 = Si + Yi. that satisfies t Si + Yi and ≥ · · · Wt WS = √β. t | − i |

0 D1 D2D3 D4D5 · · · DiDi+1 Di+2 Fig. 4: Illustration of the threshold-based sampling policy (12), where no sample is taken during [Si,Si + Yi). (b) Estimate process Wˆ t using causally received samples. Fig. 3: Illustration of the MMSE estimation policy (4). 3. Threshold policy on expected estimation error [6], [9], [10]: The sampling times are given by a filtration (i.e., a non-decreasing and right-continuous family Si = inf t Si + Yi : t Si β (10) of σ-fields) of the information available at the sampler. Each +1 { ≥ − ≥ } 2 sampling time Si is a with respect to the = inf t Si + Yi : E[(Wt Wˆ t) ] β , (11) + ≥ − ≥ filtration t ,t 0 , i.e., {N ≥ } n ˆ o + where Si + Yi = Di and, according to (4), Wt = WSi . Si t , t 0. (6) { ≤ } ∈ Nt ∀ ≥ 4. Threshold policy on instantaneous estimation error: The Let π = (S1,S2,...) denote a sampling policy where sampling times are given by S1 S2 form an increasing sequence of sampling ≤ ≤ · · · Si = inf t Si + Yi : Wt Wˆ t β , (12) times. Let Π denote a set of online (also called causal) +1 ≥ | − |≥ sampling policies satisfying the following two conditions: (i) n o where Wˆ = W . The sampling policyp in (12) can Each sampling policy π Π satisfies (6) for all i = 0, 1,... t Si ∈ be understood as follows: As illustrated in Fig. 4, if (ii) The inter-sampling times Ti = Si+1 Si,i = 0, 1,... { − } WS Y WS √β, sample i +1 is generated at form a [42, Section 6.1]: There exist an | i+ i − i | ≥ the time Si+1 = Si + Yi when sample i is delivered; increasing sequence 0 k1 < k2 < . . . of almost surely finite ≤ otherwise, if WSi+Yi WSi < √β, sample i +1 is random integers such that the post-kj process Tkj +i,i = | − | { generated at the earliest time t such that t Si + Yi and 0, 1,... has the same distribution as the post-k1 process ≥ } Wt WSi reaches the threshold √β. It is worthwhile to Tk1+i,i =0, 1,... and is independent of the pre-kj process | − | { } E emphasize that even if there exists time t [Si,Si + Yi) Ti,i = 0, 1,...,kj 1 ; in addition, [kj+1 kj ] < , ∈ E{ 2 −E } 2 − ∞ such that Wt WSi √β, no sample is taken at such [S ] < , and 0 < [(Skj+1 Skj ) ] < for j =1, 2,... | − |≥ k1 time t, as depicted in both cases of Fig. 4. In other words, By Condition∞ (ii), we can obtain− that, almost∞ surely, the threshold-based control is disabled during [Si,Si+Yi) lim Si = , lim Di = . (7) and is reactivated at time Si + Yi. i→∞ ∞ i→∞ ∞ A sampling policy π Π is said to be signal-ignorant We analyze the MSE in (2), but operationally a nicer cri- ∈ D (signal-aware), if π is (not) independent of the Wiener process terion is lim sup E[ n (W Wˆ )2dt]/E[D ]. These n→∞ 0 t t n W ,t 0 . The sampling policies (8), (9), and (11) are two criteria are associated to two− definitions of “average t signal-ignorant,{ ≥ } and the sampling policy (12) is signal-aware. cost per unit time” used inR the literature of infinite-horizon undiscounted semi-Markov decision problems [43]–[47]. They are equivalent, if T1,T2,... is a regenerative process, or C. Optimal Sampling Problem { } more generally, if T1,T2,... has only one ergodic class { } We assume that the source process Wt,t 0 and the [43]–[45]. If no condition is imposed, however, these two { ≥ } service times Yi,i =1, 2,... are mutually independent and criteria are different. { } Some examples of the sampling policies in are: do not change according to the sampling policy. In addition, Π 2 we assume that the Yi’s are i.i.d. with E[Y ] < . The 1. Periodic sampling [48], [49]: The inter-sampling times i ∞ are constant, such that for some β 0, optimal sampling problem for minimizing the MSE subject ≥ to a sampling rate constraint is formulated as (8) Si+1 = Si + β. T mse 1 E ˆ 2 2. Zero-wait sampling [3], [6], [9], [10]: A new sample is opt , inf lim sup (Wt Wt) dt (13) π∈Π T →∞ T − generated once the previous sample is delivered, i.e., "Z0 # 1 1 s.t. lim inf E[Sn] , (14) Si+1 = Si + Yi. (9) n→∞ n ≥ fmax 5

where mseopt denotes the optimal value of (13). Later on in Wt,t 0 . For each π Πsignal-ignorant, the objective function the paper, the unconstrained problem with f = will also {in (13)≥ can} be rewritten∈ as (see Appendix B for the proof) max ∞ be studied. T 1 2 lim sup E (Wt Wˆ t) dt T − IV. OPTIMAL SAMPLING POLICIES T →∞ "Z0 # T A. Signal-aware Sampling 1 = lim sup E ∆(t)dt , (18) T Problem (13) is a constrained continuous-time Markov T →∞ "Z0 # decision problem with a continuous state space. Such problems where ∆(t) is the age of information defined in (1). In FIFO are often lack of closed-form or analytical solutions, however queueing systems, Di Di+1 holds for all i. Hence, the age we were able to solve (13) exactly: ∆(t) can be equivalently≤ expressed as E 2 Theorem 1. If the service times Yi’s are i.i.d. with [Yi ] < ∆(t)= t Si, t [Di,Di+1), i =0, 1, 2,... (19) , then there exists β 0 such that the sampling policy (12) − ∈ ∞is an optimal solution of≥(13), and the optimal β is determined If the set of feasible policies is restricted from Π to 3 by solving Πsignal-ignorant, (13) reduces to the following sampling problem E 2 4 for minimizing the time-average expectation of the age of E 2 1 [max(β , WY )] [max(β, WY )]=max , , (15) information [9]–[11]: fmax 2β   T where is a random variable with the same distribution as mse 1 E Y age-opt , inf lim sup ∆(t)dt (20) π∈Πsignal-ignorant T Yi. The optimal value of (13) is then given by T →∞ "Z0 # 1 1 E[max(β2, W 4 )] s.t. E mse = Y + E[Y ]. (16) lim inf [Sn] , opt E 2 n→∞ n ≥ fmax 6 [max(β, WY )] where mse denotes the optimal value of (20). Because Proof. See Section V. age-opt Πsignal-ignorant Π, ⊂ According to Theorem 1, in the optimal signal-aware sam- mseopt mseage-opt. (21) pling policy, the (i + 1)-th sample is taken at the earliest time ≤ t satisfying two conditions: (i) The i-th sample has already Note that problem (20) is simpler than (13) because the been delivered by time t, i.e., t Di = Si + Yi, and (ii) sampler does not use knowledge of W to make decisions. ≥ t the instantaneous estimation error Wt Wˆ t at time t is no To solve (13), stronger techniques than those in [9]–[11] are smaller than a threshold √β. In addition,| − the| threshold √β developed in Section V. is determined by the maximum allowed sampling rate f max Theorem 2. [11] If the service times Y ’s are i.i.d. with and W , where W is a random variable that has the same i Y Y E[Y 2] < , then there exists β 0 such that the sampling distribution with the amount of signal variation (W W ) i t+Y t policy (11)∞is an optimal solution≥ of (20), and the optimal β during the random service time Y for all starting time t−. This is determined by solving indicates a tight coupling between the source process Wt and the service time Y , in the optimal sampling policy. 1 E[max(β2, Y 2)] E[max(β, Y )]=max , , (22) Equation (15) can be solved by using the bisection method f 2β  max  with a low computational complexity. Hence, Problem (13) where Y is a random variable with the same distribution as does not suffer from the curse of dimensionality encountered Yi. The optimal value of (20) is then given by in most Markov decision problems with continuous state E 2 2 spaces. We note that the sampling policy in (12) and (15) is mse [max(β , Y )] E age-opt , + [Y ]. (23) quite general in the sense that it is optimal for any service time 2E[max(β, Y )] E 2 distribution satisfying [Y ] < . The optimal signal-aware Theorem 2 was proven in [9], [10] under an extra condition sampling policy in (12) and (15)∞ is also called the “MSE- that the time difference Si+1 Di is upper bounded by a optimal” sampling policy in the sequel. constant M > 0. In [11], Theorem− 2 was established without requiring this extra condition. B. Signal-ignorant Sampling and the Age of Information One can obtain some interesting observations by comparing Theorem 1 and Theorem 2: In the optimal signal-ignorant Let Π Π denote the set of signal-ignorant signal-ignorant sampling policy presented in Theorem 2, the (i+1)-th sample sampling policies, defined⊂ as is taken at the earliest time t satisfying two conditions: (i) Πsignal-ignorant = π Π: π is independent of Wt,t 0 . The i-th sample has already been delivered by time t, i.e., { ∈ { ≥ }} (17) t Di = Si + Yi, and (ii) the expected estimation error E[(≥W Wˆ )]2 at time t, which, by (10) and (19), is equal In these policies, the sampling decisions depend only on the t t to the age− ∆(t), is no smaller than a threshold β. The first service time Yi,i = 1, 2,... but not the source process { } condition is the same with that in Theorem 1, but the second 3If β → 0, the last terms in (15) and (22) are determined by L’Hospital’s condition is quite different: Because the sampler has no knowl- rule. edge about the Wiener process (except for its distribution), 6 it can only use expected estimation error to make decisions. achieves the maximum throughput and the minimum queueing Further, the threshold β in Theorem 2 is determined by the delay. However, this zero-wait sampling policy almost never maximum allowed sampling rate fmax and the random service minimizes the MSE in (13) and does not always minimize time Y , which is also different from the case in Theorem 1. the age of information in (20), as stated in the following two The optimal signal-ignorant sampling policy in (11) and (22) theorems: is also referred to as the “age-optimal” sampling policy. We note that the zero-wait sampling policy can be expressed In the following, the asymptotics of the MSE-optimal and as (12) with β = 0. By checking when β = 0 is satisfied in age-optimal sampling policies at low/high service time or (12) and (15), one can obtain low/high sampling frequencies are studied. Theorem 3. Suppose that fmax = . Then, the zero-wait sampling policy (9) is an optimal solution∞ to (13) if and only C. Short Service Time or Low Sampling Rate if the service time Y is equal to zero with probability one. Let Proof. See Appendix E. Yi = αXi (24) Hence, as long as the service time Y has a small probability represent the scaling of the service time Yi with α, where to be positive, the zero-waiting sampling policy is not an α 0 and the Xi’s are i.i.d. positive random variables. If ≥ optimal solution to (13). Similarly, the optimality of zero-wait α 0 or fmax 0, we can obtain from (15) that (see sampling policy for solving (20) is characterized as Appendix→ C for the→ proof) Theorem 4. [10] Suppose that f = . Then, the zero- 1 1 max ∞ β = + o , (25) wait sampling policy (9) is an optimal solution to (20) if and f f max  max  only if where f(x) = o(g(x)) as x a means that limx a → → E[Y 2] 2 ess inf Y E[Y ], (31) f(x)/g(x)=0. In this case, the MSE-optimal sampling policy ≤ in (12) and (15) becomes where ess inf Y = sup y [0, ) : Pr[Y

If α or fmax , as shown in Appendix D, the MSE-optimal→ ∞ sampling policy→ ∞ for solving (13) is given by (12) A. Simplification of Problem (13) where β is determined by solving The following lemma is useful for simplifying (13). E 2 E 2 4 2β [max(β, WY )] = [max(β , WY )]. (29) Lemma 1. Suppose that π is a feasible policy for Problem (13), in which at least one sample is taken when the server Similarly, if α or f , the age-optimal sampling max is busy processing an earlier generated sample. Then, there policy for solving→ ∞ (20) is given→ by ∞ (11) where β is determined exists another feasible policy for Problem (13) that has a by solving π1 smaller estimation error than policy π. Hence, it is suboptimal 2βE[max(β, Y )] = E[max(β2, Y 2)]. (30) in Problem (13) to take a new sample before the previous sample is delivered. In these limits, the ratio between mseopt and mseage-opt depends on the distribution of Y . Proof. Lemma 1 is proven by using the strong Markov prop- When the sampling rate is unbounded, i.e., fmax = , erty of the Wiener process and the orthogonality principle one logically reasonable policy is the zero-wait sampling∞ of MMSE estimation. The details are provided in Appendix policy in (9) [3], [6], [9], [10]. This zero-wait sampling policy F. 7

By Lemma 1, we only need to consider a sub-class of the following Markov decision problem: sampling policies Π Π such that each sample is generated 1 n−1 Di+1 2 ⊂ E (Wt WS ) dt and submitted to the server after the previous sample is mse i=0 Di − i opt , inf lim n−1 (35) delivered, i.e., π∈Π1 n→∞ h E i P iR=0 [Yi + Zi] n−1 Π1 = π Π: Si Di−1 for all i . (32) 1 P 1 { ∈ ≥ } s.t. lim E [Yi + Zi] , (36) n→∞ n ≥ f This completely eliminates the waiting time wasted in the i=0 max queue, and hence the queue is always kept empty. The in- X where mseopt is the optimal value of (35). formation that is available for determining includes the Si In order to solve (35), let us consider the following Markov history of signal values (Wt,t [0,Si]) and the service ∈ 4 decision problem with a parameter c 0: times (Y1,...,Yi−1) of previous samples. To characterize this ≥ n−1 Di+1 statement precisely, let us define the σ-fields t = σ(Ws : 1 2 + + E F p(c), inf lim (Wt WSi ) dt c(Yi +Zi) s [0,t]) and t = s>t s. Then, t ,t 0 is the π∈Π1 n→∞ n " Di − − # filtration∈ (i.e., a non-decreasingF ∩ F and right-continuous{F ≥ fa}mily of i=0 Z X (37) σ-fields) of the Wiener process Wt. Given the service times n−1 (Y1,...,Yi−1) of previous samples, Si is a stopping time with 1 E 1 + s.t. lim [Yi + Zi] , respect to the filtration ,t 0 of the Wiener process Wt, n→∞ n ≥ f t i=0 max that is {F ≥ } X where p(c) is the optimum value of (37). Similar with Dinkel- + [ Si t Y1,...,Yi−1] t , t 0. (33) bach’s method [52] for nonlinear fractional programming, we { ≤ }| ∈F ∀ ≥ can obtain the following lemma for our Markov decision Then, the policy space Π can be alternatively expressed as 1 problem: + Π = Si :[ Si t Y ,...,Yi ] , t 0, 1 { { ≤ }| 1 −1 ∈Ft ∀ ≥ Lemma 2. The following assertions are true: Si Di−1 for all i, ≥ mse Ti = Si Si is a regenerative process . (34) (a). opt T c if and only if p(c) T 0. +1 − } (b). If p(c)=0, the solutions to (35) and (37) are identical. Recall that any policy in Π satisfies “Ti = Si Si is a +1 − regenerative process”. Proof. See Appendix G.

Let Zi = Si Di 0 represent the waiting time between +1 − ≥ Hence, the solution to (35) can be obtained by solving (37) the delivery time Di of sample i and the generation time Si+1 mse i−1 and seeking a opt 0 such that of sample i +1. Then, Si = Z0 + j=1(Yj + Zj ) and Di = ≥ i−1 mse (Z + Y ). If (Y , Y ,...) is given, (S ,S ,...) is p( opt)=0. (38) j=0 j j+1 1 2 P 0 1 uniquely determined by (Z0,Z1,...). Hence, one can also use P π = (Z0,Z1,...) to represent a sampling policy. B. Lagrangian Dual Problem of (37) when c = mseopt Because Ti is a regenerative process, using the renewal Although (37) is a continuous-time Markov decision prob- theory [51] and [42, Section 6.1], one can show that in Problem lem with a continuous state space, rather than a convex 1 E (13), n [Sn] is a convergent sequence and optimization problem, it is possible to use the Lagrangian dual T approach to solve (37) and show that it admits no duality gap. 1 2 lim sup E (Wt Wˆ t) dt When c = mseopt, define the following Lagrangian T − T →∞ "Z0 # L(π; λ) E Dn ˆ 2 0 (Wt Wt) dt n−1 − 1 Di+1 = lim E E 2 mse n→∞ h [Dn] i = lim (Wt WSi ) dt ( opt + λ)(Yi +Zi) R n→∞ n " Di − − # n−1 Di+1 2 i=0 Z E (Wt WS ) dt X i=0 Di i λ = lim − , + , (39) n→∞ hn−1 E i P Ri=0 [Yi + Zi] fmax E E n−1 where λ 0 is the dual variable. Let where in the last step weP have used [Dn]= [ i=0 (Zi + ≥ E n−1 . Hence, (13) can be rewritten as Yi+1)] = [ i=0 (Yi + Zi)] e(λ) , inf L(π; λ). (40) P π∈Π1 P Then, the Lagrangian dual problem of (37) is defined by

d , max e(λ), (41) λ≥0 where d is the optimum value of (41). Weak duality [53], [54] 4 implies that d p(mseopt). In Section V-D, we will establish Note that the generation times (S1,...,Si−1) of previous samples are ≤ also included in this information. strong duality, i.e., d = p(mseopt). 8

X+ X X+ In the sequel, we solve (40). Using the stopping times and and t = v>t v , as well as the filtration t ,t 0 F ∩ F {F2 ≥ } martingale theory of the Wiener process, we can obtain the of the Xt. A random variable τ : R [0, ) →X+ ∞ following lemma: is said to be a stopping time of Xt if τ t t for all t 0. Let M be the set of square-integrable{ ≤ stopping}∈F times Lemma 3. Let τ 0 be a stopping time of the Wiener process ≥ 2 ≥ of Xt, i.e., Wt with E[τ ] < , then ∞ M X+ E 2 τ = τ 0 : τ t t , τ < . E 2 1E 4 { ≥ { ≤ }∈F ∞} Wt dt = Wτ . (42) 6 Our goal is to solve the following optimal stopping problem: Z0  Proof. See Appendix H.   sup Exg(Xτ ), (47) τ∈M By using Lemma 3 and the sufficient of (40), we can show that for every i =1, 2,..., where X0 = x is the initial state of the Markov chain Xt, the function g : R2 R is defined as Di+1 → 2 E (Wt WS ) dt 1 4 − i g(s,b)= βs b (48) "ZDi # − 2 1E 4 E E with parameter β 0. Notice that (44) is a special case of (47) = (WSi+Yi+Zi WSi ) + [Yi + Zi] [Yi] , (43) ≥ 6 − where the initial state is x = (Yi, WS Y WS ), and Wt is i+ i − i which is proven in Appendix I.  replaced by the time-shifted Wiener process WSi+Yi+t WSi . s − For any s 0, define the σ-fields t = σ(Ws+v R2 ≥ s+ Fs − Theorem 6. For all x = (s,b) and β 0, an optimal Ws : v [0,t]) and t = v>t v , as well as the ∈ ≥ ∈ s+ F ∩ F stopping time for solving (47) is filtration t ,t 0 of the time-shifted Wiener process {F ≥ } M ∗ Ws+t Ws,t [0, ) . Define s as the set of square- (49) { − ∈ ∞ } τ = inf t 0 : b + Wt β . integrable stopping times of Ws+t Ws,t [0, ) , i.e., ≥ | |≥ { − ∈ ∞ } In order to prove Theoremn 6, let us definep o the function s+ 2 Ms = τ 0 : τ t , E τ < . { ≥ { ≤ }∈Ft ∞} u(x)= Exg(Xτ ∗ ) (50) By substituting (43) into (40) and using again the sufficient statistics of (40), we can obtain and establish some properties of u(x). Theorem 5. An optimal solution (Z ,Z ,...) to (40) satisfies Lemma 4. u(x) g(x) for all x R2, and 0 1 ≥ ∈ 1 1 4 if 2 E 4 βs 2 b , b β; Zi = arg inf (WSi+Yi+τ WSi ) u(s,b)= − ≥ (51) τ∈MS +Y 2 − 1 2 2 2 i i ( βs + β βb , if b < β.  2 − β(Yi + τ) WS Y WS , Yi , (44) Proof. See Appendix K. − i+ i − i  where β is given by A function f(x) is said to be excessive for the process Xt if [50] β = 3(mseopt + λ E [Y ]) 0. (45) 2 − ≥ Exf(Xt) f(x), for all t 0, x R . (52) ≤ ≥ ∈ By using the Itˆo-Tanaka-Meyer formula [55, Theorem 7.14 Proof. See Appendix J. and Corollary 7.35] in calculus, we can obtain

Note that because the Yi’s are i.i.d. and the strong Markov Lemma 5. The function u(x) is excessive for the process Xt. property of the Wiener process, the Zi’s as solutions of (44) are also i.i.d. Proof. See Appendix L. Now, we are ready to prove Theorem 6. C. Per-Sample Optimal Stopping Solution to (44) We use optimal stopping theory [50] to solve (44). Let Proof of Theorem 6. In Lemma 4 and Lemma 5, we have E ∗ us first pose (44) in the language of optimal stopping. A shown that u(x) = xg(Xτ ) is an excessive function and P ∗ continuous-time two-dimensional Markov chain X on a prob- u(x) g(x). In addition, it is known that x(τ < )=1 t ≥ R2 ∞ ability space (R2, , P) is defined as follows: Given the initial for all x [56, Theorem 8.5.3]. These conditions, together F with the∈ Corollary to Theorem 1 in [50, Section 3.3.1], imply state X0 = x = (s,b), the state Xt at time t is that τ ∗ is an optimal stopping time of (47). This completes Xt = (s + t,b + Wt), (46) the proof. where Wt,t 0 is a standard Wiener process. Define { ≥ } A consequence of Theorem 6 is Px(A) = P(A X0 = x) and ExZ = E(Z X0 = x), respectively, as the| of event| A and the Corollary 1. An optimal solution to (44) is conditional expectation of random variable Z for given initial X Zi = inf t 0 : WSi+Yi+t WSi β . (53) state X0 = x. Define the σ-fields t = σ(Xv : v [0,t]) ≥ | − |≥ F ∈ n p o 9

In addition, this solution satisfies 55 50 periodic sampling E E 2 45 [Yi + Zi]= [max(β, WY )], (54) zero-wait sampling 4 2 4 40 age-optimal sampling E[(WS Y Z WS ) ]= E[max(β , W )]. (55) i+ i+ i − i Y 35 MSE-optimal sampling Proof. See Appendix M. 30

MSE 25 20 D. Zero Duality Gap between (37) and (41) 15 10 Strong duality is established in the following theorem: 5 0 Theorem 7. If c = mseopt, the following assertions are true: 0 0.5 1 1.5 f max (a). The duality gap between (37) and (41) is zero, i.e., d = Fig. 5: MSE vs. fmax tradeoff for i.i.d. exponential service p(mseopt). time and E[Yi]=1, where zero-wait sampling is not feasible (b). A common optimal solution to (13), (35), and (37) is when fmax < 1/E[Yi] and hence is not plotted in that regime. given by (12) and (15). The optimal value of (13) is

given by (16). 100 Proof Sketch of Theorem 7. 90 periodic sampling We use [53, Prop. 6.2.5] to find age-optimal sampling a geometric multiplier [53, Definition 6.1.1] for Problem 80 MSE-optimal sampling (37). This tells us that the duality gap between (37) and 70 (41) must be zero, because otherwise there is no geometric 60 5 50

multiplier [53, Prop. 6.2.3(b)]. This result holds not only MSE for convex optimization problem, but also for general non- 40 convex optimization and Markov decision problems like (37). 30 See Appendix N for the details. 20 10 Hence, Theorem 1 follows from Theorem 7. 0 0 0.5 1 1.5 2 2.5 3 3.5 σ VI. NUMERICAL RESULTS Fig. 6: MSE vs. the scale parameter σ of i.i.d. log-normal E In this section, we evaluate the estimation performance service time for fmax = 0.8 and [Yi]=1, where zero-wait E achieved by the following four sampling policies: sampling is not feasible because fmax < 1/ [Yi].

1. Periodic sampling: The policy in (8) with β = fmax. 2. Zero-wait sampling [3], [6], [9], [10]: The sampling periodic sampling, and hence mseage-opt and mseperiodic are of policy in (9), which is feasible when fmax 1/E[Yi]. 3. Age-optimal sampling [9], [10]: The sampling≥ policy in similar values. However, as fmax approaches the maximum mse (11) and (22), which is the optimal solution to (20). throughput 1, periodic blows up to infinity. This is because 4. MSE-optimal sampling: The sampling policy in (12) and the queue length in periodic sampling is large at high sampling (15), which is the optimal solution to (13). frequencies, and the samples become stale during their long waiting times in the queue. On the other hand, mse and Let mse , mse , mse , and mse , be the opt periodic zero-wait age-opt opt mse decrease with respect to f . The reason is that the MSEs of periodic sampling, zero-wait sampling, age-optimal age-opt max set of feasible policies satisfying the constraint in (13) and (20) sampling, MSE-optimal sampling, respectively. According to becomes larger as f grows, and hence the optimal values (21), as well as the facts that periodic sampling is feasible max of (13) and (20) are decreasing in f . Moreover, the gap for (20) and zero-wait sampling is feasible for (20) when max between mseopt and mseage-opt is large for small values of fmax. fmax 1/E[Yi], we can obtain ≥ The ratio mseopt/mseage-opt tends to 1/3 as fmax 0, which → mseopt mseage-opt mseperiodic, is in accordance with (28). As we expected, msezero-wait is ≤ ≤ larger than mse and mse when f 1. In summary, mse mse mse 1 opt age-opt max opt age-opt zero-wait, when fmax , periodic sampling is far from optimal if the≥ sampling rate is ≤ ≤ ≥ E[Yi] too low or sufficiently high; age-optimal sampling is far from which fit with our numerical results below. optimal if the sampling rate is too low. Figure 5 depicts the tradeoff between MSE and f for max Figure 6 and Figure 7 illustrate the MSE of i.i.d. log-normal i.i.d. exponential service time with mean E[Y ]=1/µ = 1. i service time for f = 0.8 and f = 1.5, respectively, Hence, the maximum throughput of the queue is µ = 1. In max max where Y = eσXi /E[eσXi ], σ > 0 is the scale parameter of this setting, mse is characterized by eq. (25) of [3], i periodic log-, and (X ,X ,...) are i.i.d. Gaussian which was obtained using a D/M/1 queueing model [57]. For 1 2 random variables with zero mean and unit . Because small values of fmax, age-optimal sampling is similar with E[Yi]=1, the maximum throughput of the queue is 1. In 5Note that geometric multiplier is different from the traditional Lagrangian Fig. 6, since fmax < 1, zero-wait sampling is not feasible multiplier. and hence is not plotted. As the scale parameter σ grows, 10

100 which is defined as 90 zero-wait sampling D ∧T age-optimal sampling i+1 80 ˆ E ˆ 2 MMSE-optimal sampling h(Wt)= (Wt Wt) dt (Sj , WSj ,Dj )j≤i 70 ( Di∧T − ) Z 60 (56)

50 MSE for any T > 0. By using Lemma 4 in [10], it is not hard to 40 show that h(Wˆ ) is a convex functional of the estimate Wˆ . 30 t t 20 In the sequent, we will find the optimal estimate that solves 10 min h(Wˆ t). (57) 0 Wˆ t 0 0.5 1 1.5 2 2.5 3 3.5 σ Let ft and gt be two estimates, which are functions of the Fig. 7: MSE vs. the scale parameter σ of i.i.d. log-normal information available at the estimator Si, WS ,Di : Di E { i ≤ service time for fmax =1.5 and [Yi]=1, where mseperiodic = t . Similar to the one-sided sub-gradient in finite dimensional due to queueing. space,} the one-sided Gˆateaux derivative of the functional h in ∞ the direction of g at a point f is given by the tail of the log-normal distribution becomes heavier and δh(f; g) heavier. We observe that mse grows quickly with respect h(ft + ǫgt) h(ft) periodic = lim − to σ, much faster than mseopt and mseage-opt. In addition, the ǫ→0+ ǫ gap between mseopt and mseage-opt increases as σ grows. In Di+1∧T 1E 2 2 Fig. 7, because f > 1, mse is infinite and hence is = lim (ft + ǫgt Wt) (ft Wt) dt max periodic ǫ→0+ ǫ ( Di∧T − − − not plotted. We can find that msezero-wait grows quickly with Z respect to σ and is much larger than mse and mse . In opt age-opt (S , W ,D ) . summary, periodic sampling, age-optimal sampling, and zero- j Sj j j≤i ) wait sampling policies are all far from optimal if the service Di+1∧T times follow a heavy-tail distribution. E 2 = lim 2( ft Wt)gt + ǫgt dt ǫ→0+ − (ZDi∧T VII. CONCLUSION (Sj , WSj ,Dj )j≤i ) In this paper, we have investigated optimal sampling of the Wiener process for remote estimation over a queue. The optimal sampling policy for minimizing the mean square Di+1∧T =E 2(f W )g dt (S , W ,D ) estimation error subject to an average sampling rate constraint t t t j Sj j j≤i ( Di∧T − ) has been obtained in a semi-closed form. We prove that a Z Di+1∧T threshold-based sampler is optimal and the optimal threshold =E 2 ft E Wt (S j , WS ,Dj )j≤i gtdt is found exactly. Analytical and numerical comparisons with − | j (ZDi∧T several important sampling policies, including age-optimal   sampling, zero-wait sampling, and traditional periodic sam- (Sj , WSj ,Dj)j≤i , (58) pling, have been provided. The results in this paper generalize ) recent research on age of information by adding a signal- where the last step follows from the iterated law of expecta- based control model, and generalize existing studies on remote tions. According to [61, p. 710], ft is an optimal solution to estimation by adding a queueing model with random service (57) if and only if times. δh(f; g) 0, g. ≥ ∀ ACKNOWLEDGEMENT By δh(f; g)= δh(f; g), we get − − The authors appreciate Ness B. Shroff and Roy D. Yates for δh(f; g)=0, g. (59) ∀ their careful reading of the conference version of this paper Since g is arbitrary, by (58) and (59), the optimal solution to and their valuable suggestions. The authors are also grateful to t (57) is Aditya Mahajan for pointing out an error in an earlier version of this paper. ft =E[Wt (Sj , WS ,Dj)j i], | j ≤ =WS + E[Wt WS (Sj , WS ,Dj)j i] i − i | j ≤ APPENDIX A t [Di T,Di T ). (60) ∈ ∧ +1 ∧ PROOF OF (4) Notice that under any online sampling policy π, Sj, WS ,Dj , { j We use the calculus of variations to prove (4). Define x y = j i are determined by the source (Wt,t [0,Si]) and the ∧ ≤ } ∈ min x, y . Let us consider a functional h of the estimate Wˆ t, service times (Y ,...,Yi). According to (i) the strong Markov { } 1 11

property of the Wiener process [55, Theorem 2.16 and Remark Case 1: If Di T Si, we can obtain ∧ ≥ 2.17] and (ii) the fact that the Yi’s are independent of the Di+1∧T 2 Wiener process Wt, we obtain that for any given realization E (Wt Wˆ t) dt − of (Sj , WS ,Dj )j i, Wt WS ,t Si is a Wiener process. ( Di∧T ) j ≤ { − i ≥ } Z Hence, Di+1∧T =E (W Wˆ )2dt E t Si [Wt WSi (Sj , WSj ,Dj)j≤i]=0 (61) ( Di∧T − ) − | Z Di+1∧T for all t Si. Therefore, the optimal solution to (57) is (a)E E 2 ≥ = (Wt WSi ) dt Si,Di,Di+1 ( ( Di∧T − )) ft = WSi , if t [Di T,Di+1 T ),i =1, 2,... (62) Z ∈ ∧ ∧ Di+1∧T (b) 2 =E E (Wt WS ) Si ,Di,Di dt Finally, we note that − i | +1 (ZDi∧T ) n Di+1∧T Di+1∧T  2 (c) lim E (Wt Wˆ t) dt =E (t Si)dt n→∞ − i=0 ( Di∧T ) ( Di∧T − ) X Z Z Dn∧T Di+1∧T 2 (d) = lim E (Wt Wˆ t) dt = E ∆(t)dt , (65) n→∞ − (Z0 ) (ZDi∧T ) Dn∧T (a) 2 where Step (a) is due to the law of iterated expectations, Step = E lim (Wt Wˆ t) dt n (b) is due to Fubini’s theorem, Step (c) is due to the strong ( →∞ 0 − ) Z of the Wiener process [55, Theorem 2.16] T (b) 2 and the fact that Si,Di,Di are independent of the Wiener =E (Wt Wˆ t) dt (63) +1 − process, and Step (d) is due to (19). (Z0 ) Case 2: If Di T 0. Let T in (62), we obtain that (4) is the MMSE estimator for→ solving ∞ (64). This completes the proof. APPENDIX C PROOFS OF (25) AND (27) If f 0, (15) tells us that max → E 2 1 [max(β, WY )] = , (67) fmax which implies

1 E 2 E APPENDIX B β β + [WY ]= β + [Y ]. (68) ≤ fmax ≤ PROOF OF (18) Hence, 1 1 E[Y ] β . (69) fmax − ≤ ≤ fmax

If fmax 0, (25) follows. If π is independent of Wt,t [0, ) , the Si’s and → { ∈ ∞ } Because Y is independent of the Wiener process, using the Di’s are independent of Wt,t [0, ) . Define x y = { ∈ ∞ } ∧ law of iterated expectations and the Gaussian distribution of min x, y . For any T > 0, let us consider the term E 4 E 2 { } the Wiener process, we can obtain [WY ] =3 [Y ] and E 2 E Di+1∧T [WY ]=3 [Y ]. Hence, 2 E (Wt Wˆ t) dt − β E[max(β, W 2 )] β + E[W 2 ]= β + E[Y ], (ZDi∧T ) ≤ Y ≤ Y β2 E[max(β2, W 4 )] β2 + E[W 4 ]= β2 +3E[Y 2]. in the following two cases: ≤ Y ≤ Y 12

Therefore, we can get β ess inf Y from (31) and hence ≤ β2 E[max(β2, W 4 )] β2 +3E[Y 2] Y . (70) β Y (75) β + E[Y ] ≤ E[max(β, W 2 )] ≤ β ≤ Y with probability one. According to (74) and (75), such a β is By combining (16), (25), and (70), (27) follows in the case of the solution to (30). Hence, the zero-wait policy expressed by fmax 0. (11) with β ess inf Y is the age-optimal policy. If → , then and with probability one. ≤ α 0 Y 0 WY 0 Case 2: E[Y ]=0 and hence Y =0 with probability one. In Hence, →E →2 and→E 2 4 2. [max(β, WY )] β [max(β , WY )] β this case, β = 0 is the solution to (30). Hence, the zero-wait Substituting these into (15)→ and (70), yields → policy expressed by (11) with β =0 is the age-optimal policy. 1 E[max(β2, W 4 )] 1 Combining these two cases, the proof is completed. lim β = , lim Y + E[Y ] = . α→0 f α→0 6E[max(β, W 2 )] 6f max  Y  max By this, (25) and (27) are proven in the case of α 0. This → completes the proof. APPENDIX F PROOF OF LEMMA 1 APPENDIX D

PROOF OF (29) Recall that Si and and Gi are the sampling time and the

If fmax , the sampling rate constraint in (13) can be service starting time of sample i, respectively. Suppose that in removed. By→ (15), ∞ the optimal β is determined by (29). the sampling policy π, sample i is generated when the server If α , let us consider the equation is busy sending another sample, and hence sample i needs → ∞ to wait for some time before being submitted to the server, E 2 4 E 2 [max(β , WY )] i.e., S < G . Let us consider a virtual sampling policy π′ = [max(β, WY )]= . (71) i i 2β S ,...,Si , Gi,Si ,... such that the generation time of { 0 −1 +1 } E 2 sample i is postponed from S to G . We call policy π′ a virtual If Y grows by α times, then β and [max(β, WY )] in (71) i i E 2 4 policy because it may happen that G >S . However, this both should grow by α times, and [max(β , WY )] in (71) i i+1 should grow by α2 times. Hence, if α , it holds in (15) will not affect our proof below. We will show that the MSE that → ∞ of the sampling policy π′ is smaller than that of the sampling E 2 4 policy π = S0,...,Si−1,Si,Si+1,... . 1 [max(β , WY )] { } (72) Note that the Wiener process does f ≤ 2β Wt : t [0, ) max not change according to the sampling{ policy,∈ and the∞ } sample and the solution to (15) is given by (29). This completes the delivery times D0,D1,D2,... remain the same in policy π proof. and policy π′.{ Hence, the only} difference between policies π and π′ is that the generation time of sample i is postponed APPENDIX E from Si to Gi. The MMSE estimator under policy π is given PROOFS OF THEOREMS 3 AND 4 by (4) and the MMSE estimator under policy π′ is given by Proof of Theorem 3. The zero-wait policy can be expressed Wˆ t =E[Wt (Sj , WS ,Dj )j≤i−1, (Gi, WG ,Di)] as (12) with β = 0. Because Y is independent of the | j i 0, t [0,D ); Wiener process, using the law of iterated expectations and the ∈ 1 = WG , t [Di,Di ); (76) Gaussian distribution of the Wiener process, we can obtain  i ∈ +1 E 4 E 2 WSj , t [Dj ,Dj+1), j = i, j 1. [WY ]=3 [Y ]. According to (29), β = 0 if and only if  ∈ 6 ≥ E[W 4 ]=3E[Y 2]=0 which is equivalent to Y = 0 with ′′ Y Next, we consider a third virtual sampling policy π in probability one. This completes the proof. which the samples (WGi , Gi) and (WSi ,Si) are both delivered Proof of Theorem 4. In the one direction, the zero-wait policy to the estimator at time Di. Clearly, the estimator under policy ′′ ′ can be expressed as (11) with β ess inf Y . If the zero- π has more information than those under policies π and π . wait policy is optimal, then the solution≤ to (30) must satisfy By following the arguments in Appendix A, one can show that ′′ β ess inf Y , which further implies β Y with probability the MMSE estimator under policy π is one.≤ From this, we can get ≤ Wˆ t =E[Wt (Sj , WS ,Dj )j i, (Gi, WG ,Di)] | j ≤ i 2 2ess inf Y E[Y ] 2βE[Y ]= E[Y ], (73) 0, t [0,D1); ≥ ∈ = WG , t [Di,Di ); (77) By this, (31) follows.  i ∈ +1 WS , t [Dj ,Dj+1), j = i, j 1. In the other direction, if (31) holds, we will show that the  j ∈ 6 ≥ zero-wait policy is age-optimal by considering the following Notice that, because of the strong Markov property of Wiener two cases. process, the estimator under policy π′′ uses the fresher sample ˆ Case 1: E[Y ] > 0. By choosing WGi , instead of the stale sample WSi , to construct Wt during [D ,D ). Because the estimator under policy π′′ has more E 2 i i+1 [Y ] information than that under policy π, one can imagine that β = E , (74) 2 [Y ] policy π′′ has a smaller estimation error than policy π, i.e., 13

for any T > 0 Because the inter-sampling times Ti = Yi + Zi are regenera- 2 tive, E[kj+1 kj ] < and 0 < E[(Sk Sk ) ] < Di+1∧T − ∞ j+1 − j ∞ E (W W )2dt for all j, the [51] tells us that the limit t Si n−1 D ∧T − 1 E (Z i ) limn→∞ n i=0 [Yi +Zi] exists and is positive. By this, Di+1∧T we get E 2 P (Wt WGi ) dt . (78) n−1 ≥ − Di+1 ( Di∧T ) 1 2 Z lim E (Wt WS ) dt c(Yi + Zi) 0. n→∞ n − i − ≤ To prove (78), we invoke the orthogonality principle of the i=0 "ZDi # MMSE estimator [41, Prop. V.C.2] under policy π′′ and obtain X (83)

Di+1∧T Therefore, p(c) 0. E (79) ≤ 2(Wt WGi )(WGi WSi )dt =0, On the reverse direction, if p(c) 0, then there ex- ( Di∧T − − ) Z ists a policy π = (Z ,Z ,...) Π≤ that is feasible for 0 1 ∈ 1 where we have used the fact that WGi and WSi are available both (35) and (37), which satisfies (83). Because the limit by the MMSE estimator under policy π′′. Next, from (79), we 1 n−1 E limn→∞ n i=0 [Yi +Zi] exists and is positive, from (83), can get we can derive (82) and (81). Hence, mseopt c. By this, we P mse ≤ Di+1∧T have proven that opt c if and only if p(c) 0. E 2 ≤ ≤ (Wt WSi ) dt Step 2: We needs to prove that mse < c if and only if − opt (ZDi∧T ) p(c) < 0. This statement can be proven by using the arguments Di+1∧T in Step 1, in which “ ” should be replaced by “<”. Finally, E 2 2 ≤ = (Wt WGi ) + (WGi WSi ) dt from the statement of Step 1, it immediately follows that ( Di∧T − − ) Z mseopt >c if and only if p(c) > 0. This completes the proof Di+1∧T E of part (a). + 2(Wt WGi )(WGi WSi )dt D ∧T − − Part (b): We first show that each optimal solution to (35) (Z i ) Di+1∧T is an optimal solution to (37). By the claim of part (a), E 2 2 mse = (Wt WGi ) + (WGi WSi ) dt p(c)=0 is equivalent to opt = c. Suppose that policy D ∧T − − (Z i ) π = (Z0,Z1,...) Π1 is an optimal solution to (35). Then, ∈ Di+1∧T mseπ = mseopt = c. Applying this in the arguments of (81)- 2 E (Wt WG ) dt . (80) ≥ − i (83), we can show that policy π satisfies (ZDi∧T ) n−1 Di+1 ′′ 1 2 In other words, the estimation error of policy π is no greater lim E (Wt WS ) dt c(Yi + Zi) =0. n→∞ n − i − than that of policy π. Furthermore, by comparing (76) and i=0 "ZDi # (77), we can see that the MMSE estimators under policies π′′ X This and p(c)=0 imply that policy π is an optimal solution and π′ are exact the same. Therefore, the estimation error of to (37). policy π′ is no greater than that of policy π. Similarly, we can prove that each optimal solution to (37) By repeating the above arguments for all samples i sat- is an optimal solution to (35). By this, part (b) is proven. isfying Si < Gi, one can show that the sampling policy π = S , G ,...,Gi , Gi, Gi ,... is better than the 1 { 0 1 −1 +1 } sampling policy π = S0,S1,...,Si−1,Si,Si+1,... . This completes the proof. { } APPENDIX H PROOF OF LEMMA 3

APPENDIX G t According to Theorem 2.51 of [55], W 4 6 W 2ds is an PROOF OF LEMMA 2 t − 0 s martingale of the Wiener process Wt,t [0, ) . Because Part (a) is proven in two steps: the minimum of two stopping times{ is a∈ stoppingR∞ } time and Step 1: We will prove that mseopt c if and only if p(c) constant times are stopping times [56], it follows that t τ is ≤ ≤ 0. a bounded stopping time for every t [0, ), where x ∧ y = ∈ ∞ ∧ If mseopt c, then there exists a policy π = (Z0,Z1,...) min x, y . Then, it follows from Theorem 8.5.1 of [56] that ≤ ∈ { } Π1 that is feasible for both (35) and (37), which satisfies for every t [0, ) ∈ ∞ n−1 Di+1 2 t∧τ E (Wt WS ) dt 2 1 4 i=0 Di − i E W ds = E W . (84) lim c. (81) s 6 t∧τ n→∞ h n−1 E i ≤ 0 P Ri=0 [Yi +Zi] Z  t τ   Notice that ∧ W 2ds is positive and increasing with respect Hence, P 0 s to t. By applying the monotone convergence theorem [56, 1 n−1 E Di+1 2 R n i=0 D (Wt WSi ) dt c(Yi + Zi) Theorem 1.5.5], we can obtain lim i − − 0. n→∞ h 1 n−1 E i ≤ t∧τ τ P R n i=0 [Yi +Zi] E 2 E 2 lim Ws ds = Ws ds . (85) (82) t→∞ P Z0  Z0  14

4 Hence, the limit limt E W exists. The remaining task According to [47, p. 252] and [62, Chapter 6], i is a sufficient →∞ t∧τ I is to show that statistics for determining Zi in (40). Therefore, there exists an   E 4 E 4 optimal policy (Z0,Z1,...) in which Zi is determined based lim Wt∧τ = Wτ . (86) t→∞ on only i, which is independent of (Wt : t [0,Si]). This I ∈ Towards this goal, let us consider   completes the proof. E 4 Wτ 4 =E [Wt τ (Wτ Wt τ )]  ∧ − − ∧ E 4 E 3 = Wt∧τ +4 Wt∧τ (Wτ Wt∧τ ) 2 2− +6E W (Wτ Wt τ )  t∧τ  − ∧  E 3 E 4 +4 Wt∧τ (Wτ Wt∧τ )  + (Wτ Wt∧τ ) Proof of (43). By using (34) and Lemma 6, we obtain that for 4 −3 − =E W +4E W E [Wτ Wt τ ] given Yi and Yi+1, Yi and Yi + Zi + Yi+1 are stopping times t∧τ t∧τ  −  ∧  E 2 E 2 of the time-shifted Wiener process WSi+t WSi ,t 0 . +6 Wt∧τ  (Wτ  Wt∧τ ) { − ≥ } − 3 4 Hence, +4E [Wt τ ] E (Wτ Wt τ ) +E (Wτ Wt τ ) ,  ∧   − ∧  − ∧ where in the last step we have used the strong Markov property of the Wiener process [55, Theorem 2.16]. By Wald’s lemma Di+1 =E (W W )2dt for Wiener process [55, Theorem 2.44 and Theorem 2.48], t Si Di − 2 (Z ) E [Wτ ]=0 and E W = E [τ] for any stopping time τ with τ Yi+Zi+Yi+1 E [τ] < . Hence, E 2 ∞   = (WSi+t WSi ) dt ( Yi − ) E [Wτ Wt∧τ ]=0, (87) Z Y +Z +Y +1 − (a) i i i E [Wt∧τ ]=0, (88) E E 2 = (WSi+t WSi ) dt Yi, Yi+1 ( ( Yi − )) which implies Z

(b) 1 4 4 2 2 E E 4 E W =E W +6E W E (Wτ Wt∧τ ) = (WSi+Yi+Zi+Yi+1 WSi ) Yi, Yi+1 τ t∧τ t∧τ − 6 ( ( − )) E 4   + (Wτ Wt∧τ )    − 1 E W 4 , (89) E E 4 t∧τ  (WSi+Yi WSi ) Yi, Yi +1 ≥ − 6 ( ( − )) and hence   (c) 1 1 = E (W W )4 E (W W )4 , E 4 E 4 Si+Yi+Zi+Yi+1 Si Si+Yi Si Wτ lim Wt∧τ . (90) 6 − − 6 − ≥ t→∞    (93) On the other hand, by Fatou’s lemma [56, Theorem 1.5.4], where Step (a) and Step (c) are due to the law of iterated E 4 E 4 E 4 expectations, and Step (b) is due to Lemma 3. Because Si+1 = Wτ = lim inf Wt∧τ lim Wt∧τ . (91) t→∞ ≤ t→∞ Si + Yi + Zi, we have Combining  (90) andh (91), yieldsi (86). This completes the E (W W )4 proof. Si+Yi+Zi+Yi+1 Si − 4 =E [(WS Y Z WS ) + (WS Y WS )]  i+ i+ i − i  i+1+ i+1 − i+1 E 4 APPENDIX I = (WSi+Yi+Zi WSi ) − 3 PROOF OF (43) +4E (WS Y Z WS ) (WS Y WS )  i+ i+ i −  i i+1+ i+1 − i+1 E 2 2 The following lemma is needed in the proof of (43): +6 (WSi+Yi+Zi WSi ) (WSi+1+Yi+1 WSi+1 ) − − 3 +4E (WS Y Z WS )(WS Y WS ) Lemma 6. For any λ 0, there exists an optimal solution  i+ i+ i − i i+1+ i+1 − i+1  ≥ E 4 (Z0,Z1,...) to (40) in which Zi is independent of (Wt,t + (WSi+1+Yi+1 WSi+1 )  ∈ − 4 [0,Si]) for all i =1, 2,... =E (WS Y Z WS )  i+ i+ i − i  E 3 E Proof. Because the Yi’s are i.i.d., Zi is independent of +4 (WSi+Yi+Zi WSi ) (WSi+1+Yi+1 WSi+1 ) − 2 − 2 Yi+1, Yi+2,..., and the strong Markov property of the Wiener +6E (WS Y Z WS ) E (WS Y WS )  i+ i+ i − i   i+1+ i+1 − i+1  process [55, Theorem 2.16], in the Lagrangian L(π; λ) the E E 3 +4 [(WSi+Yi+Zi WSi )] (WSi+1+Yi+1 WSi+1 ) term related to Zi is  −   −  E 4 + (WSi+1+Yi+1 WSi+1 ) , (94) Si+Yi+Zi+Yi+1 −   E 2 mse (Wt WSi ) dt ( opt + λ)(Yi + Zi) , where in the last equation we have used the fact that Y − −   i+1 "ZSi+Yi # is independent of Y and Z , and the strong Markov property (92) i i of the Wiener process [55, Theorem 2.16]. By Wald’s lemma which is determined by the control decision Zi and the recent for Wiener process [55, Theorem 2.44 and Theorem 2.48], 2 information of the system i = (Yi, (WS t WS , t 0)). E [Wτ ]=0 and E W = E [τ] for any stopping time τ with I i+ − i ≥ τ   15

E [τ] < . Hence, and ∞ E 1 4 (WSi+1+Yi+1 WSi+1 ) =0, (95) u(x)= E[g(X ) X = x]= g(x)= βs b . (102) − 0 0 E | − 2 [(WSi+Yi+Zi WSi )]=0, (96)   2 ∗ ∗ 2 − 2 2 Case 2: If b <β, then τ > 0 and (b + Wτ ) = β. Invoking E (WS Y Z WS ) E (WS Y WS ) i+ i+ i − i i+1+ i+1 − i+1 Theorem 8.5.5 in [56], yields = E[Y + Z ]E[Y ]. (97)  i i i+1    ∗ 2 Exτ = ( β b)( β b)= β b . (103) Therefore, we have − − − − − Using this, we canp obtain p E 4 (WSi+Yi+Zi+Yi+1 WSi ) E ∗ − 4 u(x)= xg(Xτ ) =E (WS Y Z WS ) +6E [Yi +Zi] E [Yi ]  i+ i+ i − i  +1 1 4 E ∗ E ∗ 4 +E (WS Y WS ) . (98) = β(s + xτ ) x (b + Wτ )  i+1+ i+1 − i+1 − 2 1 Finally, because (WS t WS ) and (WS t WS ) are 2 2  i+ − i i+1+ − i+1 = β(s + β b ) β both Wiener processes, and the Yi’s are i.i.d., − − 2 1 2 2 E 4 E 4 = βs + β b β. (104) (WSi+Yi WSi ) = (WSi+1+Yi+1 WSi+1 ) . (99) 2 − − − Combining (93)-(99), yields (43).  Hence, in Case 2, 1 1 1 u(x) g(x)= β2 b2β + b4 = (b2 β)2 0. (105) − 2 − 2 2 − ≥ APPENDIX J By combining these two cases, Lemma 4 is proven. PROOF OF THEOREM 5

By (43), (92) can be rewritten as APPENDIX L PROOF OF LEMMA 5 Si+Yi+Zi+Yi+1 The function u(s,b) is continuous differentiable in (s,b). E 2 mse 2 (Wt WSi ) dt ( opt + λ)(Yi + Zi) ∂ In addition, 2 u(s,b) is continuous everywhere but at b = " Si+Yi − − # ∂ b Z √β. By the Itˆo-Tanaka-Meyer formula [55, Theorem 7.14 1 4 ± =E (WS Y Z WS ) (mseopt + λ E[Y ])(Yi +Zi) and Corollary 7.35], we obtain that almost surely 6 i+ i+ i − i − −   u(s + t,b + Wt) u(s,b) 1 4 β =E (WS Y Z WS ) (Yi +Zi) t − 6 i+ i+ i − i − 3 ∂   = u(s + r, b + Wr)dWr 1 4 0 ∂b =E [(WS Y WS )+(WS Y Z WS Y )] Z 6 i+ i − i i+ i+ i − i+ i t ∂  + u(s + r, b + Wr)dr β 0 ∂s (Yi +Zi) . (100) Z − 3 1 ∞ ∂2  + La(t) u(s + r, b + a)da, (106) 2 ∂b2 Because the Yi’s are i.i.d. and the strong Markov property Z−∞ of the Wiener process [55, Theorem 2.16], the expectation where La(t) is the that the Wiener process spends in (100) is determined by the control decision Zi and the at the level a, i.e., ′ information i = (WSi+Yi WSi , Yi, (WSi+Yi+t WSi+Yi , t I − − ′ a 1 t 0)). According to [47, p. 252] and [62, Chapter 6], i is a L (t) = lim 1{|W −a|≤ǫ}ds, (107) ≥ I ǫ↓0 2ǫ s sufficient statistics for determining the waiting time Zi in (40). Z0 Therefore, there exists an optimal policy (Z0,Z1,...) in which and 1A is the indicator function of event A. By the property ′ Zi is determined based on only . By this, (40) is decom- of local times of the Wiener process [55, Theorem 6.18], we Ii posed into a sequence of per-sample control problems (44). obtain that almost surely Combining (35), (43), and Lemma 2, yields mseopt E[Y ]. ≥ u(s + t,b + Wt) u(s,b) Hence, β 0. − ≥ t We note that, because the Yi’s are i.i.d. and the strong ∂ = u(s + r, b + Wr)dWr Markov property of the Wiener process, the Zi’s in this 0 ∂b Z t optimal policy are i.i.d. Similarly, the (WSi+Yi+Zi WSi )’s ∂ − + u(s + r, b + W )dr in this optimal policy are i.i.d. ∂s r Z0 1 t ∂2 + u(s + r, b + Wr)dr. (108) APPENDIX K 2 ∂b2 Z0 PROOF OF LEMMA 4 Because 2 Case 1: If b β, then (49) tells us that ∂ 2b3, if b2 β; ≥ u(s,b)= − ≥ (109) τ ∗ =0 (101) ∂b ( 2βb, if b2 < β, − 16 we can obtain that for all t 0 and all x = (s,b) R2 Using the law of iterated expectations, the strong Markov ≥ ∈ t 2 property of the Wiener process, and Wald’s identity ∂ 2 E E[(WS Y WS ) ]= E[Yi], yields x u(s + r, b + Wr) dr < . (110) i+ i − i ( 0 ∂b ) ∞ Z   E[Zi + Yi] t ∂ This and Theorem 7.11 of [55] imply that u(s + r, b + =E[E[Zi Yi]+ Yi] 0 ∂b | Wr)dWr is a martingale and E 2 R = [max(β (WSi+Yi WSi ) , 0)+ Yi] − − t ∂ E 2 2 E = [max(β (WSi+Yi WSi ) , 0)+(WSi+Yi WSi ) ] x u(s + r, b + Wr)dWr =0, t 0. (111) − − − 0 ∂b ∀ ≥ E 2 Z  = [max(β, (WSi+Yi WSi ) )]. (121) By combining (46), (108), and (111), we get − Finally, because Wt and WSi+t WSi are of the same distribu- t 2 − E E ∂ 1 ∂ tion, (54) and (55) follow from (121) and (117), respectively. x [u(Xt)] u(x)= x u(Xr)+ 2 u(Xr) dr . − 0 ∂s 2 ∂b This completes the proof. Z   (112) It is easy to compute that if b2 >β, ∂ 1 ∂2 2 (113) u(s,b)+ 2 u(s,b)= β 3b 0; ∂s 2 ∂b − ≤ APPENDIX N 2 and if b <β, PROOF OF THEOREM 7 ∂ 1 ∂2 u(s,b)+ u(s,b)= β β =0. (114) ∂s 2 ∂b2 − Hence, According to [53, Prop. 6.2.5], if we can find π⋆ = ⋆ ∂ 1 ∂2 (Z0,Z1,...) and λ satisfying the following conditions: u(s,b)+ u(s,b) 0 (115) ∂s 2 ∂b2 ≤ n−1 ⋆ 1 1 2 π Π1, lim E [Yi + Zi] 0, (122) for all (s,b) R except for b = √β. Since the Lebesgue ∈ n→∞ n − f ≥ ∈ ± i=0 max measure of those r for which b + Wr = √β is zero, we get ⋆ X ± 2 λ 0, (123) from (112) and (115) that Ex [u(Xt)] u(x) for all x R ≥⋆ ⋆ ⋆ and t 0. This completes the proof. ≤ ∈ L(π ; λ ) = inf L(π; λ ), (124) ≥ π∈Π1 n−1 ⋆ 1 1 APPENDIX M λ lim E [Yi + Zi] =0, (125) n→∞ n − f PROOF OF COROLLARY 1 ( i=0 max ) X Because (49) is the optimal solution to (47), by choosing then π⋆ is an optimal solution to the primal problem (37) and ⋆ s = Yi, b = WS +Y WS , and using WS +Y +t WS to λ is a geometric multiplier [53] for the primal problem (37). i i − i i i − i ⋆ ⋆ replace Wt, it is immediate that (53) is the optimal solution Further, if we can find such π and λ , then the duality gap to (44). between (37) and (41) must be zero, because otherwise there The remaining task is to prove (54) and (55). According to is no geometric multiplier [53, Prop. 6.2.3(b)]. We note that (53) with β 0, we have (122)-(125) are different from the Karush-Kuhn-Tucker (KKT) ≥ conditions because of (124). WS +Y +Z WS i i i − i The remaining task is to find π⋆ and λ⋆ that satisfies (122)- WS Y WS , if WS Y WS √β; = i+ i − i | i+ i − i |≥ (116) (125). According to Theorem 5 and Corollary 1, the solution √β, if WS +Y WS < √β. ⋆ ⋆ i i i π to (124) is given by (53) where β = 3(mseopt +λ E [Y ]).  | − | − Hence, In addition, as shown in the proof of Theorem 5, the Zi’s in policy π⋆ are i.i.d. Using (122), (123), and (125), the value E[(W W )4]= E[max(β2, (W W )4)]. Si +Yi+Zi Si Si+Yi Si of λ⋆ can be obtained by considering two cases: If λ⋆ > 0, − − (117) because the Zi’s are i.i.d., we have

In addition, from (101) and (103) we know that if WSi+Yi n−1 | − 1 E E 1 WSi √β, then (101) implies lim [Yi + Zi]= [Yi + Zi]= . (126) |≥ n→∞ n f i=0 max E[Zi Yi] = 0; (118) X | If λ⋆ =0, then otherwise, if WS Y WS < √β, then (103) implies | i+ i − i | n−1 1 E E 1 E[Z Y ]= β (W W )2. (119) lim [Yi + Zi]= [Yi + Zi] . (127) i i Si+Yi Si n→∞ n ≥ f | − − i=0 max By combining these two cases, we get X Next, we use (126), (127), and β = 3(mse + λ⋆ E [Y ]) E 2 opt [Zi Yi] = max[β (WSi+Yi WSi ) , 0]. (120) ⋆ mse − ⋆ | − − to determine λ . To compute opt, we substitute policy π 17

and (43) into (35), which yields [8] B. T. Bacinoglu, E. T. Ceran, and E. Uysal-Biyikoglu, “Age of infor- mation under energy replenishment constraints,” in Information Theory mseopt and Applications Workshop (ITA), 2015. [9] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, n−1 4 E (WS +Y +Z WS ) +(Yi + Zi)E[Y ] “Update or wait: How to keep your data fresh,” in IEEE INFOCOM, = lim i=0 i i i − i n→∞ n−1 E 2016. P  6 i=0 [Yi +Zi]  [10] ——, “Update or wait: How to keep your data fresh,” IEEE Trans. Inf. E (W W )4 Theory, vol. 63, no. 11, pp. 7492 – 7508, Nov. 2017. Si+Yi+Zi SiP E = E − + [Y ], (128) [11] Y. Sun and B. Cyr, “Sampling for data freshness optimization: Non-  6 [Yi +Zi]  linear age functions,” Journal of Communications and Networks (JCN) where in the last equation we have used that the ’s are i.i.d. - special issue on Age of Information, in press, 2019. Zi [12] M. Costa, M. Codreanu, and A. Ephremides, “On the age of information and the (WSi+Yi+Zi WSi )’s are i.i.d., which were shown in in status update systems with packet management,” IEEE Trans. Inf. − the proof of Theorem 5. Hence, the value of β = 3(mseopt + Theory, vol. 62, no. 4, pp. 1897–1910, April 2016. λ⋆ E [Y ]) can be obtained by considering the following two [13] C. Kam, S. Kompella, G. D. Nguyen, and A. Ephremides, “Effect of − message transmission path diversity on status age,” IEEE Trans. Inf. cases: Theory, vol. 62, no. 3, pp. 1360–1374, March 2016. Case 1: If λ⋆ > 0, then (128) and (126) imply that [14] A. M. Bedewy, Y. Sun, and N. B. Shroff, “Optimizing data freshness, throughput, and delay in multi-server information-update systems,” in 1 E (129) IEEE ISIT, 2016. [Yi + Zi]= , [15] ——, “Age-optimal information updates in multihop networks,” in IEEE fmax 4 ISIT, 2017. E (WS Y Z WS ) mse E i+ i+ i − i (130) [16] I. Kadota, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Minimizing β > 3( opt [Y ]) = . the age of information in broadcast wireless networks,” in Allerton − 2E [Yi +Zi]   Conference, 2016. Case 2: If λ⋆ =0, then (128) and (127) imply that [17] B. T. Bacinoglu and E. Uysal-Biyikoglu, “Scheduling status updates to minimize age of information with an energy harvesting sensor,” in IEEE 1 ISIT, 2017. E [Yi + Zi] , (131) ≥ fmax [18] T. Z. Ornee and Y. Sun, “Sampling for remote estimation through E 4 queues: Age of information and beyond,” in IEEE WiOpt, 2019. (WSi+Yi+Zi WSi ) [19] E. Vinogradov, H. Sallouha, S. D. Bast, M. M. Azari, and S. Pollin, β = 3(mseopt E[Y ]) = − . (132) − 2E [Yi +Zi] “Tutorial on UAV: A blue sky view on wireless communication,” 2019,   coRR, abs/1901.02306. Combining (129)-(132), yields that β is the root of [20] T. Soleymani, S. Hirche, and J. S. Baras, “Optimal information control in cyber-physical systems,” IFAC-PapersOnLine, vol. 49, no. 22, pp. 1– 4 1 E[(WS Y Z WS ) ] 6, 2016, 6th IFAC Workshop on Distributed Estimation and Control in E[Y + Z ]=max , i + i+ i − i . i i f 2β Networked Systems NECSYS 2016.  max  [21] K. J. Astr¨om˚ and B. M. Bernhardsson, “Comparison of Riemann and (133) Lebesgue sampling for first order stochastic systems,” in IEEE CDC, Substituting (54) and (55) into (133), we obtain that is the 2002. β [22] B. Hajek, “Jointly optimal paging and registration for a symmetric root of (15). Further, (53) can be rewritten as (12). Hence, ,” in IEEE ITW, Oct 2002, pp. 20–23. if we choose π⋆ as the sampling policy in (12) and choose [23] B. Hajek, K. Mitzel, and S. Yang, “Paging and registration in cellular ⋆ networks: Jointly optimal policies and an iterative algorithm,” IEEE λ = β/3 mseopt + E [Y ] where β is the root of (15), then ⋆ ⋆− Trans. Inf. Theory, vol. 54, no. 2, pp. 608–622, Feb 2008. π and λ satisfies (122)-(125). By using the properties of [24] G. M. Lipsa and N. C. Martins, “Optimal state estimation in the presence geometric multiplier mentioned above, (12) and (15) is an of communication costs and packet drops,” in 47th Annual Allerton optimal solution to the primal problem (37). Conference on Communication, Control, and Computing (Allerton), Sept 2009, pp. 160–169. Because the problems (13), (35), and (37) are equivalent, [25] ——, “Remote state estimation with communication costs for first-order (12) and (15) is also an optimal solution to (13) and (35). LTI systems,” IEEE Trans. Auto. Control, vol. 56, no. 9, pp. 2013–2025, Sept. 2011. The optimal objective value mseopt is given by (128). Sub- [26] A. Molin and S. Hirche, “An iterative algorithm for optimal event- stituting (54) and (55) into (128), (16) follows. This completes triggered estimation,” IFAC Proceedings Volumes, vol. 45, no. 9, pp. the proof. 64 – 69, 2012, 4th IFAC Conference on Analysis and Design of Hybrid Systems. [27] A. Nayyar, T. Bas¸ar, D. Teneketzis, and V. V. Veeravalli, “Optimal REFERENCES strategies for communication and remote estimation with an energy [1] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of harvesting sensor,” IEEE Trans. Auto. Control, vol. 58, no. 9, 2013. the Wiener process over a channel with random delay,” in IEEE ISIT, [28] M. Rabi, G. V. Moustakides, and J. S. Baras, “Adaptive sampling for 2017. linear state estimation,” SIAM Journal on Control and Optimization, [2] X. Song and J. W. S. Liu, “Performance of multiversion concurrency vol. 50, no. 2, pp. 672–702, 2012. control algorithms in maintaining temporal consistency,” in Fourteenth [29] K. Nar and T. Bas¸ar, “Sampling multidimensional Wiener processes,” Annual International Computer Software and Applications Conference, in IEEE CDC, Dec. 2014, pp. 3426–3431. Oct 1990, pp. 132–139. [30] X. Gao, E. Akyol, and T. Bas¸ar, “Optimal sensor scheduling and [3] S. Kaul, R. D. Yates, and M. Gruteser, “Real-time status: How often remote estimation over an additive noise channel,” in American Control should one update?” in IEEE INFOCOM, 2012. Conference (ACC), July 2015, pp. 2723–2728. [4] R. D. Yates and S. K. Kaul, “Real-time status updating: Multiple [31] ——, “Optimal estimation with limited measurements and noisy com- sources,” in IEEE ISIT, Jul. 2012. munication,” in IEEE CDC, 2015. [5] ——, “The age of information: Real-time status updating by multiple [32] ——, “Optimal communication scheduling and remote estimation over sources,” CoRR, abs/1608.08622, submitted to IEEE Trans. Inf. Theory, an additive noise channel,” 2016, https://arxiv.org/abs/1610.05471. 2016. [33] ——, “On remote estimation with multiple communication channels,” [6] R. D. Yates, “Lazy is timely: Status updates by an energy harvesting in American Control Conference (ACC), July 2016, pp. 5425–5430. source,” in IEEE ISIT, 2015. [34] ——, “Joint optimization of communication scheduling and online [7] C. Kam, S. Kompella, and A. Ephremides, “Age of information under power allocation in remote estimation,” in 50th Asilomar Conference random updates,” in IEEE ISIT, 2013. on Signals, Systems and Computers, Nov 2016, pp. 714–718. 18

[35] ——, “On remote estimation with communication scheduling and power conjunction with the IEEE INFOCOM 2018 and 2019. The papers he co- allocation,” in IEEE 55th Conference on Decision and Control (CDC), authored received the best student paper award at IEEE/IFIP WiOpt 2013 and Dec 2016, pp. 5900–5905. the best paper award at IEEE/IFIP WiOpt 2019. [36] J. Chakravorty and A. Mahajan, “Remote-state estimation with packet drop,” 6th IFAC Workshop on Distributed Estimation and Control in Networked Systems, 2016. [37] ——, “Structure of optimal strategies for remote estimation over Gilbert- Elliott channel with feedback,” in IEEE ISIT, 2017. [38] ——, “Remote estimation over a packet-drop channel with Markovian state,” CoRR, abs/1807.09706, 2018. [39] T. Berger, “Information rates of Wiener processes,” IEEE Trans. Inf. Theory, vol. 16, no. 2, pp. 134–139, March 1970. [40] A. Kipnis, A. Goldsmith, and Y. Eldar, “The distortion- of sampled Wiener processes,” 2016, https://arxiv.org/abs/1608.04679. [41] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. New York, NY, USA: Springer-Verlag New York, Inc., 1994. [42] P. J. Haas, Stochastic Petri Nets: Modelling, Stability, Simulation. New York, NY: Springer New York, 2002. [43] S. M. Ross, Applies Probability Models with Optimization Applications. San Francisco, CA: Holden-Day, 1970. Yury Polyanskiy (S’08-M’10-SM’14) is an Associate Professor of Electrical [44] H. Mine and S. Osaki, Markovian Decision Processes. New York: Engineering and Computer Science and a member of the Laboratory for Elsevier, 1970. Information and Decision Systems at the Massachusetts Institute of Tech- [45] D. Hayman and M. Sobel, Stochastic models in Operations Research, nology, Cambridge, MA, USA. He received the B.S. and M.S. degrees Volume II: Stochastic Optimizations. New York: McGraw-Hill, 1984. in applied and physics from the Moscow Institute of Physics [46] A. Arapostathis, V. S. Borkar, E. Fernndez-Gaucherand, M. K. Ghosh, and Technology, Moscow, Russia, in 2003 and 2005, respectively, and the and S. I. Marcus, “Discrete-time controlled Markov processes with Ph.D. degree in electrical engineering from Princeton University, Princeton, average cost criterion: A survey,” SIAM Journal on Control and Op- NJ, USA, in 2010. Currently, his research focuses on basic questions in timization, vol. 31, no. 2, pp. 282–344, 1993. information theory, error-correcting codes, wireless communication, and fault- [47] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. tolerant computation. Dr. Polyanskiy won the 2013 NSF CAREER award and Belmont, MA: Athena Scientific, 2005, vol. 1. the 2011 IEEE Information Theory Society Paper Award. [48] H. Nyquist, “Certain topics in telegraph transmission theory,” Transac- tions of the American Institute of Electrical Engineers, vol. 47, no. 2, pp. 617–644, April 1928. [49] C. E. Shannon, “Communication in the presence of noise,” Proceedings of the IRE, vol. 37, no. 1, pp. 10–21, Jan 1949. [50] A. N. Shiryaev, Optimal Stopping Rules. New York: Springer-Verlag, 1978. [51] S. M. Ross, Stochastic Processes, 2nd ed. John Wiley& Sons, 1996. [52] W. Dinkelbach, “On nonlinear fractional programming,” Management Science, vol. 13, no. 7, pp. 492–498, 1967. [53] D. P. Bertsekas, A. Nedi´c, and A. E. Ozdaglar, Convex Analysis and Optimization. Belmont, MA: Athena Scientific, 2003. [54] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univerisity Press, 2004. [55] P. Morters and Y. Peres, . Cambridge Univerisity Press, 2010. [56] R. Durrett, Probability: Theory and Examples, 4th ed. Cambridge Elif Uysal (S’95-M’03-SM’13) is a Professor in the Department of Electrical Univerisity Press, 2010. and Electronics Engineering at the Middle East Technical University (METU), [57] L. Kleinrock, Queueing Systems. New York: John Wiley & Sons, 1976. in Ankara, Turkey. She received the Ph.D. degree in EE from Stanford [58] L. Evans, S. Keef, and J. Okunev, “Modelling real interest rates,” Journal University in 2003, the S.M. degree in EECS from the Massachusetts Institute of Banking and Finance, vol. 18, no. 1, pp. 153 – 165, 1994. of Technology (MIT) in 1999 and the B.S. degree from METU in 1997. [59] A. Cika, M.-A. Badiu, and J. P. Coon, “Quantifying link stability in ad From 2003-05 she was a lecturer at MIT, and from 2005-06 she was an hoc wireless networks subject to Ornstein-Uhlenbeck mobility,” 2018, Assistant Professor at the Ohio State University (OSU). Since 2006, she coRR, abs/1811.03410. has been with METU, and held visiting positions at OSU and MIT during [60] H. Kim, J. Park, M. Bennis, and S. Kim, “Massive UAV-to-ground com- 2014-2016. Her research interests are at the junction of communication and munication and its stable movement control: A mean-field approach,” in networking theories, with particular application to energy-efficient wireless IEEE SPAWC, June 2018, pp. 1–5. networking. Dr. Uysal is a recipient of the 2014 Young Scientist Award [61] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA: from the Science Academy of Turkey, an IBM Faculty Award (2010), Athena Scientific, 1999. the Turkish National Science Foundation Career Award (2006), an NSF [62] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identifica- Foundations on Communication research grant (2006-2010), the MIT Vinton tion, and Adaptive Control. Englewood Cliffs, NY: Prentice-Hall, Inc., Hayes Fellowship, and the Stanford Graduate Fellowship. She is an editor for 1986. the IEEE/ACM Transactions on Networking, and served as associate editor for the IEEE Transactions on Wireless Communication (2014-2018) and track 3 co-chair for IEEE PIMRC 2019. Her current research is supported by TUBITAK, Huawei, and Turk Telekom.

Yin Sun (S’08-M’11) is an assistant professor in the Department of Electrical and Computer Engineering at Auburn University, Alabama. He received his B.Eng. and Ph.D. degrees in from Tsinghua University, in 2006 and 2011, respectively. He was a postdoctoral scholar and research associate at the Ohio State University during 2011-2017. His research interests include wireless communications, communication networks, information freshness, information theory, and . He is the founding co-chair of the first and second Age of Information Workshops, in