Renewal Monte Carlo: Renewal Theory Based Reinforcement Learning Jayakumar Subramanian and Aditya Mahajan


arXiv:1804.01116v1 [cs.LG] 3 Apr 2018

Abstract—In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm and retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation. At the same time, it circumvents their key drawbacks of high variance and delayed (end of episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic. So, by renewal theory, the performance of a policy is equal to the ratio of expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle and their gradients. We propose two unbiased estimators for evaluating performance gradients (a likelihood ratio based estimator and a simultaneous perturbation based estimator) and show that for both estimators, RMC converges to a locally optimal policy. We generalize the RMC algorithm to post-decision state models and also present a variant that converges faster to an approximately optimal policy. We conclude by presenting numerical experiments on a randomly generated MDP, event-triggered communication, and inventory management.

Index Terms—Reinforcement learning, Markov decision processes, renewal theory, Monte Carlo methods, policy gradient, stochastic approximation

I. INTRODUCTION

In recent years, reinforcement learning [1]–[4] has emerged as a leading framework to learn how to act optimally in unknown environments. Policy gradient methods [5]–[10] have played a prominent role in the success of reinforcement learning. Such methods have two critical components: policy evaluation and policy improvement. In the policy evaluation step, the performance of a parameterized policy is evaluated, while in the policy improvement step, the policy parameters are updated using stochastic gradient ascent.

Policy gradient methods may be broadly classified as Monte Carlo methods and temporal difference methods. In Monte Carlo methods, the performance of a policy is estimated using the discounted return of a single sample path; in temporal difference methods, the value (or action-value) function is guessed and this guess is iteratively improved using temporal differences. Monte Carlo methods are attractive because they have zero bias, are simple and easy to implement, and work for both discounted and average reward setups as well as for models with continuous state and action spaces. However, they suffer from various drawbacks. First, they have high variance because a single sample path is used to estimate performance. Second, they are not asymptotically optimal for infinite horizon models because it is effectively assumed that the model is episodic; in infinite horizon models, the trajectory is arbitrarily truncated to treat the model as an episodic model. Third, the policy improvement step cannot be carried out in tandem with policy evaluation: one must wait until the end of the episode to estimate the performance, and only then can the policy parameters be updated. It is for these reasons that Monte Carlo methods are largely ignored in the literature on policy gradient methods, which almost exclusively focuses on temporal difference methods such as actor-critic with eligibility traces [3].

In this paper, we propose a Monte Carlo method, which we call Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. Like Monte Carlo, RMC has low bias, is simple and easy to implement, and works for models with continuous state and action spaces. At the same time, it does not suffer from the drawbacks of typical Monte Carlo methods. RMC is a low-variance online algorithm that works for infinite horizon discounted and average reward setups. One does not have to wait until the end of the episode to carry out the policy improvement step; it can be carried out whenever the system visits the start state (or a neighborhood of it).

Although renewal theory is commonly used to estimate the performance of stochastic systems in the simulation optimization community [11], [12], those methods assume that the probability law of the primitive random variables and its weak derivative are known, which is not the case in reinforcement learning. Renewal theory is also commonly used in the engineering literature on queueing theory and on systems and control for Markov decision processes (MDPs) with average reward criteria and a known system model. There is some prior work on using renewal theory for reinforcement learning [13], [14], where renewal theory based estimators for the average return and the differential value function of average reward MDPs are developed. In RMC, renewal theory is used in a different manner for discounted reward MDPs (and the results generalize to average cost MDPs).

This work was supported by the Natural Sciences and Engineering Research Council of Canada through NSERC Discovery Accelerator RGPAS 493011-16. The authors are with the Electrical and Computer Engineering Department, McGill University, Montreal, QC H3A 0E9, Canada.

II. RMC ALGORITHM

Consider a Markov decision process (MDP) with state $S_t \in \mathcal{S}$ and action $A_t \in \mathcal{A}$. The system starts in an initial state $s_0 \in \mathcal{S}$ and at time $t$:
1) there is a controlled transition from $S_t$ to $S_{t+1}$ according to a transition kernel $P(A_t)$;
2) a per-step reward $R_t = r(S_t, A_t, S_{t+1})$ is received.
Future rewards are discounted at a rate $\gamma \in (0, 1)$.

A (time-homogeneous and Markov) policy $\pi$ maps the current state to a distribution on actions, i.e., $A_t \sim \pi(S_t)$. We use $\pi(a \mid s)$ to denote $\mathbb{P}(A_t = a \mid S_t = s)$. The performance of a policy $\pi$ is given by

$$J_\pi = \mathbb{E}_{A_t \sim \pi(S_t)}\Big[\sum_{t=0}^{\infty} \gamma^t R_t \,\Big|\, S_0 = s_0\Big]. \qquad (1)$$

We are interested in identifying an optimal policy, i.e., a policy that maximizes the performance. When $\mathcal{S}$ and $\mathcal{A}$ are Borel spaces, we assume that the model satisfies the standard conditions under which time-homogeneous Markov policies are optimal [15]. In the sequel, we present a sample path based online learning algorithm, which we call Renewal Monte Carlo (RMC), that identifies a locally optimal policy within the class of parameterized policies.

Suppose policies are parameterized by a closed and convex subset $\Theta$ of Euclidean space. For example, $\Theta$ could be the weight vector in a Gibbs soft-max policy, the weights of a deep neural network, or the thresholds in a control limit policy. Given $\theta \in \Theta$, we use $\pi_\theta$ to denote the policy parameterized by $\theta$ and $J_\theta$ to denote $J_{\pi_\theta}$. We assume that for all policies $\pi_\theta$, $\theta \in \Theta$, the designated start state $s_0$ is positive recurrent.

The typical approach for policy gradient based reinforcement learning is to start with an initial guess $\theta_0 \in \Theta$ and iteratively update it using stochastic gradient ascent. In particular, let $\widehat{\nabla J}_{\theta_m}$ be an unbiased estimator of $\nabla_\theta J_\theta \big|_{\theta = \theta_m}$; then update

$$\theta_{m+1} = \big[\theta_m + \alpha_m \widehat{\nabla J}_{\theta_m}\big]_\Theta, \qquad (2)$$

where $[\theta]_\Theta$ denotes the projection of $\theta$ onto $\Theta$ and $\{\alpha_m\}_{m \ge 1}$ is the sequence of learning rates that satisfies the standard assumptions

$$\sum_{m=1}^{\infty} \alpha_m = \infty \quad \text{and} \quad \sum_{m=1}^{\infty} \alpha_m^2 < \infty. \qquad (3)$$

Under mild technical conditions [16], the above iteration converges to a $\theta^*$ that is locally optimal, i.e., $\nabla_\theta J_\theta \big|_{\theta = \theta^*} = 0$. In RMC, we approximate $\nabla_\theta J_\theta$ by a renewal theory based estimator, as explained below.

Let $\tau^{(n)}$ denote the stopping time when the system returns to the start state $s_0$ for the $n$-th time. In particular, let $\tau^{(0)} = 0$ and for $n \ge 1$ define

$$\tau^{(n)} = \inf\{t > \tau^{(n-1)} : S_t = s_0\}.$$

We call the times from $\tau^{(n-1)}$ to $\tau^{(n)} - 1$ the $n$-th regenerative cycle. Let $R^{(n)}$ and $T^{(n)}$ denote the total discounted reward and the total discounted time of the $n$-th regenerative cycle, i.e.,

$$R^{(n)} = \Gamma^{(n)} \sum_{t=\tau^{(n-1)}}^{\tau^{(n)}-1} \gamma^t R_t \quad \text{and} \quad T^{(n)} = \Gamma^{(n)} \sum_{t=\tau^{(n-1)}}^{\tau^{(n)}-1} \gamma^t, \qquad (4)$$

where $\Gamma^{(n)} = \gamma^{-\tau^{(n-1)}}$. By the strong Markov property, $\{R^{(n)}\}_{n \ge 1}$ and $\{T^{(n)}\}_{n \ge 1}$ are i.i.d. sequences. Let $R_\theta$ and $T_\theta$ denote $\mathbb{E}[R^{(n)}]$ and $\mathbb{E}[T^{(n)}]$, respectively. Define

$$\hat R = \frac{1}{N} \sum_{n=1}^{N} R^{(n)} \quad \text{and} \quad \hat T = \frac{1}{N} \sum_{n=1}^{N} T^{(n)}, \qquad (5)$$

where $N$ is a large number. Then $\hat R$ and $\hat T$ are unbiased and asymptotically consistent estimators of $R_\theta$ and $T_\theta$.

From ideas similar to standard renewal theory [17], we have the following.

Proposition 1 (Renewal Relationship). The performance of policy $\pi_\theta$ is given by

$$J_\theta = \frac{R_\theta}{(1-\gamma)\, T_\theta}. \qquad (6)$$

PROOF. For ease of notation, define

$$\bar T_\theta = \mathbb{E}_{A_t \sim \pi_\theta(S_t)}\big[\gamma^{\tau^{(n)} - \tau^{(n-1)}}\big].$$

Using the formula for a geometric series, we get that $T_\theta = (1 - \bar T_\theta)/(1 - \gamma)$. Hence,

$$\bar T_\theta = 1 - (1-\gamma)\, T_\theta. \qquad (7)$$

Now, consider the performance:

$$\begin{aligned} J_\theta &= \mathbb{E}_{A_t \sim \pi_\theta(S_t)}\Big[\sum_{t=0}^{\tau^{(1)}-1} \gamma^t R_t + \gamma^{\tau^{(1)}} \sum_{t=\tau^{(1)}}^{\infty} \gamma^{t-\tau^{(1)}} R_t \,\Big|\, S_0 = s_0\Big] \\ &\overset{(a)}{=} R_\theta + \mathbb{E}_{A_t \sim \pi_\theta(S_t)}\big[\gamma^{\tau^{(1)}}\big]\, J_\theta \\ &= R_\theta + \bar T_\theta\, J_\theta, \end{aligned} \qquad (8)$$

where the second term in (a) uses the independence of the random variables from times $0$ to $\tau^{(1)} - 1$ from those from $\tau^{(1)}$ onwards, due to the strong Markov property. Substituting (7) into (8) and rearranging terms, we get the result of the proposition. ∎

Differentiating both sides of (6) with respect to $\theta$, we get

$$\nabla_\theta J_\theta = \frac{H_\theta}{T_\theta^2 (1-\gamma)}, \quad \text{where } H_\theta = T_\theta \nabla_\theta R_\theta - R_\theta \nabla_\theta T_\theta. \qquad (9)$$

Since $T_\theta \ge 1 > 0$ (the first step of every cycle contributes $\gamma^0 = 1$), $\nabla_\theta J_\theta$ is a positive multiple of $H_\theta$, so the two vanish at the same points. Therefore, instead of using stochastic gradient ascent to find the maximum of $J_\theta$, we can use stochastic approximation to find the root of $H_\theta$.
Recommended publications
  • Poisson Processes Stochastic Processes
    Poisson Processes. Stochastic Processes, UC3M, Feb. 2012. Exponential random variables: A random variable $T$ has exponential distribution with rate $\lambda > 0$ if its probability density function can be written as $f(t) = \lambda e^{-\lambda t}\mathbf{1}_{(0,+\infty)}(t)$. We summarize the above by $T \sim \exp(\lambda)$. The cumulative distribution function of an exponential random variable is $F(t) = P(T \le t) = (1 - e^{-\lambda t})\mathbf{1}_{(0,+\infty)}(t)$, and the tail, expectation and variance are $P(T > t) = e^{-\lambda t}$, $E[T] = \lambda^{-1}$ and $\mathrm{Var}(T) = E[T]^2 = \lambda^{-2}$. The exponential random variable has the lack-of-memory property $P(T > t + s \mid T > t) = P(T > s)$. Exponential races: in what follows, $T_1, \dots, T_n$ are independent random variables with $T_i \sim \exp(\lambda_i)$. P1: $\min(T_1, \dots, T_n) \sim \exp(\lambda_1 + \dots + \lambda_n)$. P2: $P(T_1 < T_2) = \frac{\lambda_1}{\lambda_1 + \lambda_2}$. P3: $P(T_i = \min(T_1, \dots, T_n)) = \frac{\lambda_i}{\lambda_1 + \dots + \lambda_n}$. P4: If $\lambda_i = \lambda$ for all $i$, then $S_n = T_1 + \dots + T_n \sim \Gamma(n, \lambda)$; that is, $S_n$ has probability density function $f_{S_n}(s) = \lambda e^{-\lambda s}\frac{(\lambda s)^{n-1}}{(n-1)!}\mathbf{1}_{(0,+\infty)}(s)$. The Poisson process as a renewal process: let $T_1, T_2, \dots$ be a sequence of i.i.d. nonnegative random variables (interarrival times). Define the arrival times $S_n = T_1 + \dots + T_n$ if $n \ge 1$ and $S_0 = 0$. The process $N(t) = \max\{n : S_n \le t\}$ is called a renewal process. If the common distribution of the times is the exponential distribution with rate $\lambda$, then the process is called a Poisson process with rate $\lambda$. Lemma. $N(t) \sim \mathrm{Poisson}(\lambda t)$ and $N(t+s) - N(s)$, $t \ge 0$, is a Poisson process independent of $N(s)$, $t \ge 0$. The Poisson process as a Lévy process: a stochastic process $\{X(t), t \ge 0\}$ is a Lévy process if it verifies the following properties: 1.
  • The Maximum of a Random Walk Reflected at a General Barrier
    The Annals of Applied Probability 2006, Vol. 16, No. 1, 15–29. DOI: 10.1214/105051605000000610. © Institute of Mathematical Statistics, 2006. THE MAXIMUM OF A RANDOM WALK REFLECTED AT A GENERAL BARRIER. By Niels Richard Hansen, University of Copenhagen. We define the reflection of a random walk at a general barrier and derive, in case the increments are light tailed and have negative mean, a necessary and sufficient criterion for the global maximum of the reflected process to be finite a.s. If it is finite a.s., we show that the tail of the distribution of the global maximum decays exponentially fast and derive the precise rate of decay. Finally, we discuss an example from structural biology that motivated the interest in the reflection at a general barrier. 1. Introduction. The reflection of a random walk at zero is a well-studied process with several applications. We mention the interpretation from queueing theory (for a suitably defined random walk) as the waiting time until service for a customer at the time of arrival; see, for example, [1]. Another important application arises in molecular biology in the context of local comparison of two finite sequences. To evaluate the significance of the findings from such a comparison, one needs to study the distribution of the locally highest scoring segment from two independent i.i.d. sequences, as shown in [8], which equals the distribution of the maximum of a random walk reflected at zero. The global maximum of a random walk with negative drift and, in particular, the probability that the maximum exceeds a high value have also been studied in detail.
  • Renewal Theory for Uniform Random Variables
    California State University, San Bernardino, CSUSB ScholarWorks, Theses Digitization Project, John M. Pfau Library, 2002. Renewal theory for uniform random variables, Steven Robert Spencer. Recommended Citation: Spencer, Steven Robert, "Renewal theory for uniform random variables" (2002). Theses Digitization Project. 2248. https://scholarworks.lib.csusb.edu/etd-project/2248. RENEWAL THEORY FOR UNIFORM RANDOM VARIABLES. A Thesis Presented to the Faculty of California State University, San Bernardino, in Partial Fulfillment of the Requirements for the Degree Master of Arts in Mathematics, by Steven Robert Spencer, March 2002. Approved by: Charles Stanton, Advisor, Mathematics; Yuichiro Kakihara; Terry Hallett; J. Peter Williams, Chair, Department of Mathematics; Terry Hallett, Graduate Coordinator, Department of Mathematics. ABSTRACT: The thesis answers the question, "How many times must you change a light bulb in a month if the lifetime of any one light bulb is anywhere from zero to one month in length?" This involves uniform random variables on the interval [0,1], which must be summed to give expected values for the problem. The results of convolution calculations for both the uniform and exponential distributions of random variables give expected values that are in accordance with the Elementary Renewal Theorem and the renewal function.
  • Central Limit Theorem for Supercritical Binary Homogeneous Crump-Mode
    Central limit theorem for supercritical binary homogeneous Crump-Mode-Jagers processes. Benoit Henry. arXiv:1509.06583v2 [math.PR] 18 Nov 2016. Abstract: We consider a supercritical general branching population where the lifetimes of individuals are i.i.d. with arbitrary distribution and each individual gives birth to new individuals at Poisson times independently from each other. The population counting process of such a population is known as a binary homogeneous Crump-Mode-Jagers process. It is known that such processes converge almost surely when correctly renormalized. In this paper, we study the error of this convergence. To this end, we use classical renewal theory and recent works [17, 6, 5] on this model to obtain the moments of the error. Then, we can precisely study the asymptotic behaviour of these moments thanks to Lévy processes theory. These results, in conjunction with a new decomposition of the splitting trees, allow us to obtain a central limit theorem. MSC 2000 subject classifications: Primary 60J80; secondary 92D10, 60J85, 60G51, 60K15, 60F05. Key words and phrases: branching process, splitting tree, Crump–Mode–Jagers process, linear birth–death process, Lévy processes, scale function, Central Limit Theorem. 1 Introduction. In this work, we consider a general branching population where individuals live and reproduce independently from each other. Their lifetimes follow an arbitrary distribution $P_V$ and the births occur at Poisson times with constant rate $b$. The genealogical tree induced by this population is called a splitting tree [11, 10, 17] and is of main importance in the study of the model. The population counting process $N_t$ (giving the number of living individuals at time $t$) is a binary homogeneous Crump-Mode-Jagers (CMJ) process.
  • Applications of Renewal Theory to Pattern Analysis
    Rochester Institute of Technology, RIT Scholar Works, Theses, 4-18-2016. Applications of Renewal Theory to Pattern Analysis, Hallie L. Kleiner. Recommended Citation: Kleiner, Hallie L., "Applications of Renewal Theory to Pattern Analysis" (2016). Thesis. Rochester Institute of Technology. APPLICATIONS OF RENEWAL THEORY TO PATTERN ANALYSIS, by Hallie L. Kleiner. A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Applied Mathematics, School of Mathematical Sciences, College of Science, Rochester Institute of Technology, Rochester, NY, April 18, 2016. Committee Approval: Dr. James Marengo, Thesis Advisor; Dr. Bernard Brooks, Committee Member; Dr. Manuel Lopez, Committee Member; Dr. Elizabeth Cherry, Director of Graduate Programs (all School of Mathematical Sciences). Abstract: In this thesis, renewal theory is used to analyze patterns of outcomes for discrete random variables. We start by introducing the concept of a renewal process and by giving some examples. It is then shown that we may obtain the distribution of the number of renewals by time t and compute its expectation, the renewal function. We then proceed to state and illustrate the basic limit theorems for renewal processes and make use of Wald's equation to give a proof of the Elementary Renewal Theorem.
  • Regenerative Random Permutations of Integers
    The Annals of Probability 2019, Vol. 47, No. 3, 1378–1416. https://doi.org/10.1214/18-AOP1286. © Institute of Mathematical Statistics, 2019. REGENERATIVE RANDOM PERMUTATIONS OF INTEGERS. By Jim Pitman and Wenpin Tang, University of California, Berkeley. Motivated by recent studies of large Mallows(q) permutations, we propose a class of random permutations of $\mathbb{N}_+$ and of $\mathbb{Z}$, called regenerative permutations. Many previous results of the limiting Mallows(q) permutations are recovered and extended. Three special examples: blocked permutations, p-shifted permutations and p-biased permutations are studied. 1. Introduction and main results. Random permutations have been extensively studied in combinatorics and probability theory. They have a variety of applications including: • statistical theory, for example, Fisher–Pitman permutation test [36, 81], ranked data analysis [23, 24]; • population genetics, for example, Ewens' sampling formula [32] for the distribution of allele frequencies in a population with neutral selection; • quantum physics, for example, spatial random permutations [12, 104] arising from the Feynman representation of interacting Bose gas; • computer science, for example, data streaming algorithms [53, 79], interleaver designs for channel coding [11, 27]. Interesting mathematical problems are (i) understanding the asymptotic behavior of large random permutations, and (ii) generating a sequence of consistent random permutations. Over the past few decades, considerable progress has been made in these two directions: (i) Shepp and Lloyd [97], Vershik and Shmidt [105, 106] studied the distribution of cycles in a large uniform random permutation. The study was extended by Diaconis, McGrath and Pitman [25], Lalley [72] for a class of large nonuniform permutations.
  • Nonlinear Renewal Theory for Markov Random Walks
    Stochastic Processes and their Applications. Nonlinear renewal theory for Markov random walks. Vincent F. Melfi, Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824-1027, USA. Received 27 November 1991; revised 11 January 1994. Abstract: Let $\{S_n\}$ be a Markov random walk satisfying the conditions of Kesten's Markov renewal theorem. It is shown that if $\{Z_n\}$ is a stochastic process whose finite-dimensional, conditional distributions are asymptotically close to those of $\{S_n\}$ (in the sense of weak convergence), then the overshoot of $\{Z_n\}$ has the same limiting distribution as that of $\{S_n\}$. In the case where $\{Z_n\}$ can be represented as a perturbed Markov random walk, this allows substantial weakening of the slow change condition on the perturbation process; more importantly, no such representation is required. An application to machine breakdown times is given. Keywords: Markov random walk; Nonlinear renewal theory; Prokhorov metric; Markov renewal theory. 1. Introduction. Let $S_0, S_1, \dots$ be a stochastic process for which a renewal theorem is known, i.e., for which it is known that the overshoot $\{S_{\tau_a} - a : a \ge 0\}$ converges in distribution (to a known limiting distribution) as $a \to \infty$. Here $\tau_a = \inf\{n \ge 1 : S_n > a\}$. In many applications, especially in statistics, what is needed is a renewal theorem for a process $Z_0, Z_1, \dots$ which is asymptotically close to $\{S_n\}$ in some sense. This has spurred the development of such renewal theorems, usually called nonlinear renewal theorems, during the past 15 years. (The adjective "nonlinear" is used because such theorems may be used to obtain the limiting distribution of the overshoot of the original process $\{S_n\}$ over a nonlinear boundary.) When $S_n$ is the $n$th partial sum of i.i.d.
  • A Markov Renewal Approach to M/G/1 Type Queues with Countably Many Background States
    A Markov renewal approach to M/G/1 type queues with countably many background states. Masakiyo Miyazawa, Tokyo University of Science. Abstract: We consider the stationary distribution of the M/GI/1 type queue when background states are countable. We are interested in its tail behavior. To this end, we derive a Markov renewal equation for characterizing the stationary distribution using a Markov additive process that describes the number of customers in system when the system is not empty. Variants of this Markov renewal equation are also derived. It is shown that the transition kernels of these renewal equations can be expressed by the ladder height and the associated background state of a dual Markov additive process. Usually, matrix analysis is extensively used for studying the M/G/1 type queue. However, this may not be convenient when the background states are countable. We here rely on stochastic arguments, which not only make computations possible but also reveal new features. Those results are applied to study the tail decay rates of the stationary distributions. This includes refinements of the existence results with extensions. 1. Introduction. The M/GI/1 type queue, which was termed by Neuts [13], has been extensively studied in theory as well as in applications. It is a discrete-time single server queue with a finite number of background states, and a pair of the number of customers in system and the background state constitutes a discrete-time Markov chain. It is assumed that background state transitions do not depend on the number of customers in system as long as the system is not empty, but this is not the case when the system is empty.
  • Branching Processes and Population Dynamics
    1863-5 Advanced School and Conference on Statistics and Applied Probability in Life Sciences, 24 September - 12 October, 2007. Branching Processes and Population Dynamics. Peter Jagers, Chalmers University of Technology, Mathematical Sciences, Gothenburg, Sweden. 1. Introduction: Generalities about Branching and Populations. What is a Branching Process? • Mathematically, a random rooted tree (or forest), usually with nodes branching independently and often even i.i.d. • Historically, – born in a demographic and biological context, the extinction of family names, Galton, Fisher, Haldane (1850 – 1930); – maturing in nuclear physics: the cold war (Harris and Sevastyanov) (1945 – 1965); – turning into pure mathematics (Russian school, Dawson, Dynkin, Aldous....) • But also finding use in computer science – and population biology again! In a book store near you: More Mathematical Books • Harris, T. E., The Theory of Branching Processes (1963, recent reprint) • Sevastyanov, B. A., Vetvyashchiesya protsessy (1971 – also in German: Verzweigungsprozesse) • Mode, C. J., Multitype Branching Processes (1971) • Athreya, K. B. and Ney, P., Branching Processes (1972) • Jagers, P., Branching Processes with Biological Applications (1975) • Asmussen, S. and Hering, H., Branching Processes (1983) • Guttorp, P., Statistical Inference for Branching Processes (1991) • Athreya, K. B. and Jagers, P. (eds.), Classical and Modern Branching Processes (1997) • Lyons, R. and Peres, Y. Probability on Trees and Networks. Under preparation. Manuscript downloadable from http://mypage.iu.edu/~rdlyons/prbtree/prbtree.html • Biological books: – Taib, Z., Branching Processes and Neutral Evolution (1992) – Kimmel, M. and Axelrod, D. E., Branching Processes in Biology (2002) What is a population? • Originally, a group of humans = people.
  • Some Facts from Probability Theory
    APPENDIX A. Some Facts from Probability Theory. 1. Convergence of Moments. Uniform Integrability. In all probability courses we are told that convergence in probability and convergence in distribution do not imply that moments converge (even if they exist). EXAMPLE 1.1. The standard example is $\{X_n, n \ge 1\}$ defined by $P\{X_n = 0\} = 1 - \frac{1}{n}$ and $P\{X_n = n\} = \frac{1}{n}$. (1.1) Then $X_n \to 0$ in probability as $n \to \infty$, but, for example, $E X_n \to 1$ and $\operatorname{Var} X_n \to \infty$ as $n \to \infty$; that is, the expected value converges, but not to the expected value of the limiting random variable, and the variance diverges. The reason for this behavior is that the distribution mass escapes to infinity in a forbidden way. The adequate concept in this context is the notion of uniform integrability. A sequence of random variables $\{X_n, n \ge 1\}$ is said to be uniformly integrable if $\lim_{\alpha \to \infty} E|X_n| I\{|X_n| > \alpha\} = 0$ uniformly in $n$. (1.2) It is now easy to see that the sequence defined by (1.1) is not uniformly integrable. Another way to check uniform integrability is given by the following criterion (see e.g. Chung (1974), Theorem 4.5.3). Lemma 1.1. A sequence $\{Y_n, n \ge 1\}$ is uniformly integrable iff (i) $\sup_n E|Y_n| < \infty$; (ii) for every $\varepsilon > 0$ there exists $\delta > 0$ such that for all events $A$ with $P\{A\} < \delta$ we have $E|Y_n| I\{A\} < \varepsilon$ for all $n$. (1.3) The following is an important result connecting uniform integrability and moment convergence.
  • Aging Renewal Theory and Application to Random Walks
    PHYSICAL REVIEW X 4, 011028 (2014). Aging Renewal Theory and Application to Random Walks. Johannes H. P. Schulz (Physics Department T30g, Technical University of Munich, 85747 Garching, Germany), Eli Barkai (Department of Physics, Bar Ilan University, Ramat-Gan 52900, Israel), and Ralf Metzler (Institute for Physics and Astronomy, University of Potsdam, 14476 Potsdam-Golm, Germany; Physics Department, Tampere University of Technology, FI-33101 Tampere, Finland). (Received 3 October 2013; revised manuscript received 8 December 2013; published 27 February 2014) We discuss a renewal process in which successive events are separated by scale-free waiting time periods. Among other ubiquitous long-time properties, this process exhibits aging: events counted initially in a time interval $[0, t]$ statistically strongly differ from those observed at later times $[t_a, t_a + t]$. The versatility of renewal theory is owed to its abstract formulation. Renewals can be interpreted as steps of a random walk, switching events in two-state models, domain crossings of a random motion, etc. In complex, disordered media, processes with scale-free waiting times play a particularly prominent role. We set up a unified analytical foundation for such anomalous dynamics by discussing in detail the distribution of the aging renewal process. We analyze its half-discrete, half-continuous nature and study its aging time evolution. These results are readily used to discuss a scale-free anomalous diffusion process, the continuous-time random walk. By this, we not only shed light on the profound origins of its characteristic features, such as weak ergodicity breaking, but, along the way, also add an extended discussion on aging effects.
  • Dynamic Chinese Restaurant Game in Cognitive Radio Networks
    2013 Proceedings IEEE INFOCOM. Dynamic Chinese Restaurant Game in Cognitive Radio Networks. Chunxiao Jiang∗†, Yan Chen∗, Yu-Han Yang∗, Chih-Yu Wang∗‡, and K. J. Ray Liu∗. ∗Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA; †Department of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China; ‡Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan. E-mail: {jcx, yan, yhyang}@umd.edu. Abstract: In a cognitive radio network with mobility, secondary users can arrive at and leave the primary users' licensed networks at any time. After arrival, secondary users are confronted with channel access under the uncertain primary channel state. On one hand, they have to estimate the channel state, i.e., the primary users' activities, through performing spectrum sensing and learning from other secondary users' sensing results. On the other hand, they need to predict subsequent secondary users' access decisions to avoid competition when accessing the "spectrum hole". In this paper, we propose a Dynamic Chinese Restaurant Game to study such a learning and decision making problem in cognitive radio networks. ... is known as negative network externality [5], i.e., the negative influence of other users' behaviors on one user's reward, due to which users tend to avoid making the same decisions as others to maximize their own payoffs. However, traditional cooperative sensing schemes simply combine all SUs' sensing results while ignoring the structure of sequential decision making [4], especially in a dynamic scenario where the primary channel state is time-varying and SUs arrive and leave stochastically. Moreover, the negative network externality has