2013 Proceedings IEEE INFOCOM
Dynamic Chinese Restaurant Game in Cognitive Radio Networks
Chunxiao Jiang∗†, Yan Chen∗, Yu-Han Yang∗, Chih-Yu Wang∗‡, and K. J. Ray Liu∗
∗Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
†Department of Electronic Engineering, Tsinghua University, Beijing 100084, P. R. China
‡Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
E-mail: {jcx, yan, yhyang}@umd.edu, [email protected], [email protected]
Abstract—In a cognitive radio network with mobility, secondary users can arrive at and leave the primary users' licensed networks at any time. After arrival, secondary users are confronted with channel access under an uncertain primary channel state. On one hand, they have to estimate the channel state, i.e., the primary users' activities, by performing spectrum sensing and learning from other secondary users' sensing results. On the other hand, they need to predict subsequent secondary users' access decisions to avoid competition when accessing the "spectrum hole". In this paper, we propose a Dynamic Chinese Restaurant Game to study such a learning and decision-making problem in cognitive radio networks. We introduce a Bayesian learning based method for secondary users to learn the channel state and propose a Multi-dimensional Markov Decision Process based approach for secondary users to make optimal channel access decisions. Finally, we conduct simulations to verify the effectiveness and efficiency of the proposed scheme.

Index Terms—Chinese Restaurant Game, Bayesian Learning, Markov Decision Process, Cognitive Radio, Game Theory.

I. INTRODUCTION

With the emergence of various wireless applications, the available electromagnetic radio spectrum is becoming more and more crowded. The traditional static spectrum allocation policy results in a large portion of the assigned spectrum being underutilized [1]. Recently, dynamic spectrum access in cognitive radio networks has shown great potential to improve spectrum utilization efficiency. In a cognitive radio network, Secondary Users (SUs) can opportunistically utilize the Primary User's (PU's) licensed spectrum bands without harmful interference to the PU [2].

Two essential issues in dynamic spectrum access are spectrum sensing and channel access. Since SUs are uncertain about the primary channel state, i.e., whether the PU is absent, they need to estimate the channel state by performing spectrum sensing [3]. In order to counter the channel fading and shadowing problem, cooperative spectrum sensing technology was proposed recently, in which SUs share their spectrum sensing results to improve the sensing performance [4]. After channel sensing, SUs access one vacant primary channel for data transmission. When choosing a vacant channel, each SU should not only consider the immediate utility, but also take into account the utility in the future, since the more subsequent SUs access the same channel, the less throughput can be obtained by each SU. Such a phenomenon is known as negative network externality [5], i.e., the negative influence of other users' behaviors on one user's reward, due to which users tend to avoid making the same decisions as others so as to maximize their own payoffs. However, traditional cooperative sensing schemes simply combine all SUs' sensing results while ignoring the structure of sequential decision making [4], especially in a dynamic scenario where the primary channel state is time-varying and SUs arrive and leave stochastically. Moreover, negative network externality has not been considered in previous channel access methods [6]. Therefore, how SUs learn the uncertain primary channel state sequentially and make the best channel selections while taking into account the negative network externality are challenging issues in cognitive radio networks.

In our previous work [7], we proposed a new game, called the Chinese Restaurant Game, to study how users in a social network make decisions when confronted with an uncertain network state [8][9]. The Chinese Restaurant Game originated from the well-known Chinese Restaurant Process [10], which is widely adopted in non-parametric Bayesian statistics in machine learning to model distributions that are not parametric but stochastic. In the Chinese Restaurant Game, there are finitely many tables of different sizes and finitely many customers sequentially requesting tables for a meal. Since customers do not know the exact size of each table, they have to learn the table sizes from some external information. Moreover, when requesting a table, each customer should consider the subsequent customers' decisions due to the limited dining space at each table, i.e., the negative network externality. In [11] and [12], applications of the Chinese Restaurant Game in various research fields are also discussed.

The channel sensing and access problems in cognitive radio networks can be ideally modeled as a Chinese Restaurant Game, where the tables are the primary channels and the customers are SUs seeking vacant channels. Moreover, how a SU learns the PU's activities can be regarded as how a customer learns the restaurant state, and how a SU chooses a channel to access can be formulated as how a customer selects a table. One assumption in the Chinese Restaurant Game [7] is the fixed population setting, i.e., there is a finite number of customers choosing the tables sequentially. However, in cognitive radio networks, SUs may arrive and leave at any
978-1-4673-5946-7/13/$31.00 ©2013 IEEE
time, which results in a dynamic population setting. In such a case, a SU's utility will change from time to time due to the dynamic number of SUs in each channel.

Considering these challenges, in this paper we extend the Chinese Restaurant Game in [7] to a dynamic population setting and propose a Dynamic Chinese Restaurant Game for cognitive radio networks. In this Dynamic Chinese Restaurant Game, we consider the scenario where SUs arrive at and leave the primary network according to Poisson processes. Each newly arriving SU not only learns the channel state from the information received and revealed by former SUs, but also predicts subsequent SUs' decisions to maximize his/her utility. We introduce a channel state learning method based on the Bayesian learning rule, where each SU constructs his/her own belief on the channel state according to his/her own signal and the former SU's belief information. Moreover, we formulate the channel access problem as a Multi-dimensional Markov Decision Process (M-MDP) and design a modified value iteration algorithm to find the best strategies. We prove theoretically that there is a threshold structure in the optimal strategy profile for the two-primary-channel scenario. For the multiple-primary-channel scenario, we propose a fast algorithm with much lower computational complexity while achieving comparable performance. Finally, we conduct simulations to verify the effectiveness and efficiency of the proposed Dynamic Chinese Restaurant Game theoretic scheme.

The rest of this paper is organized as follows. Firstly, our system model is described in Section II. Then, we discuss how SUs learn the primary channel state using the Bayesian learning rule in Section III. We introduce an M-MDP model to solve the channel access problem in Section IV and a modified value iteration algorithm in Section V. Finally, we show simulation results in Section VI and draw conclusions in Section VII.

II. SYSTEM MODEL

A. Network Entity

We consider a primary network with $N$ independent primary channels, where each channel can host at most $L$ users, as shown in Fig. 1. The PU has priority to occupy the channels at any time, while SUs are allowed to access a channel under the condition that the PU's communication QoS is guaranteed [6]. We denote the primary channel state as $\theta = \{\theta_1, \theta_2, ..., \theta_N\}$ (all subscripts in this paper denote the channel index), where $\theta_i \in \{H_0, H_1\}$ denotes the state of channel $i$: $H_0$ means the PU is absent and $H_1$ means the PU is present. Note that the channel state $\theta_i(t)$ is time-varying since the PU may appear on the primary channel at any time.

Fig. 1. System model of the cognitive radio network.

For the secondary network, we assume SUs arrive and depart according to Poisson processes with rates $\lambda$ and $\mu$, respectively. All SUs can independently perform spectrum sensing using energy detection [1]. For each SU, the action set is $A = \{1, 2, ..., N\}$, i.e., choosing one channel from all $N$ channels. Let us define the grouping state when the $j$th SU arrives as $G^j = (g_1^j, g_2^j, ..., g_N^j)$ (all superscripts in this paper denote the SU index), where $g_i^j \in [0, L]$ stands for the number of SUs in channel $i$. Assuming that the $j$th SU finally chooses channel $i$ to access, his/her utility function can be given by $U(\theta_i(t), g_i^j(t))$, where $\theta_i(t)$ denotes the state of channel $i$ at time $t$ and $g_i^j(t)$ is the number of SUs in channel $i$ during the $j$th SU's access time. Note that the utility function is decreasing in $g_i^j(t)$, which can be regarded as the characteristic of negative network externality, since the more subsequent SUs join channel $i$, the less utility the $j$th SU can achieve. In the following analysis, the time index $(t)$ is omitted.

As discussed above, the channel state $\theta$ changes with time. Newly arriving SUs may not know the exact state $\theta_i$ of each channel. Nevertheless, SUs can estimate the state through channel sensing and the former SU's sensing result. Therefore, we assume that all SUs know the common prior distribution of the state $\theta_i$ for each channel, which is denoted as $b^0 = \{b_i^0 \mid b_i^0 = \Pr(\theta_i = H_0), \forall i \in 1, 2, ..., N\}$. Moreover, each SU can receive a signal $s^j = \{s_i^j, \forall i \in 1, 2, ..., N\}$ through spectrum sensing, where $s_i^j = 1$ if the $j$th SU detects some activity on channel $i$ and $s_i^j = 0$ if no activity is detected on channel $i$. In such a case, the detection and false-alarm probabilities of channel $i$ can be expressed as $P_i^d = \Pr(s_i = 1 \mid \theta_i = H_1)$ and $P_i^f = \Pr(s_i = 1 \mid \theta_i = H_0)$, which are considered common priors for all SUs. Furthermore, we assume that there is a log-file in the server of the secondary network, which records each SU's channel belief and channel selection result. By querying this log-file, a newly arriving SU can obtain the current grouping state information, i.e., the number of SUs in each channel, as well as the former SU's belief on the channel state.

B. ON-OFF Primary Channel Model

We model the PU's behavior on the primary channel as a general alternating ON-OFF renewal process, where the ON state means the PU is present and the OFF state means the PU is absent. This general ON-OFF switch model can be applied in scenarios where SUs have no knowledge about the PU's communication mechanism [13]. Let $T_{ON}$ and $T_{OFF}$ denote the lengths of the ON state and OFF state, respectively. Depending on the type of primary service (e.g., digital TV broadcasting or cellular communication), $T_{ON}$ and $T_{OFF}$ statistically satisfy different types of distributions. Here we assume that $T_{ON}$ and $T_{OFF}$ are independent and exponentially distributed with parameters $r_1$ and $r_0$, respectively,
denoted by $f_{ON}(t)$ and $f_{OFF}(t)$, as follows:

$$T_{ON} \sim f_{ON}(t) = \frac{1}{r_1} e^{-t/r_1}, \qquad T_{OFF} \sim f_{OFF}(t) = \frac{1}{r_0} e^{-t/r_0}. \quad (1)$$

In such a case, the expected lengths of the ON state and OFF state are $r_1$ and $r_0$, respectively. These two parameters $r_1$ and $r_0$ can be effectively estimated by a maximum likelihood estimator [14]. Such an ON-OFF behavior of the PU is a combination of two Poisson processes, which is a renewal process [15]. The renewal interval is $T_p = T_{ON} + T_{OFF}$ and the distribution of $T_p$, denoted by $f_p(t)$, is

$$f_p(t) = f_{ON}(t) * f_{OFF}(t). \quad (2)$$

III. BAYESIAN LEARNING FOR THE CHANNEL STATE

In this section, we discuss how a SU estimates the channel state according to his/her own sensing results and the former SU's beliefs. Here, we first introduce the concept of belief to describe the SU's uncertainty about the channel state. The belief $b_i^j$ denotes the $j$th SU's belief on the state of channel $i$, $\theta_i^j$. It is assumed that each SU reveals his/her beliefs after making the channel selection. In such a case, the $j$th SU's belief on channel $i$ is learned from the former SU's belief $b_i^{j-1}$ and his/her own signal $s_i^j$, and can be defined as

$$b^j = \{b_i^j \mid b_i^j = \Pr(\theta_i^j = H_0 \mid b_i^{j-1}, s_i^j), \ \forall i \in 1, 2, ..., N\}. \quad (3)$$

From the definition above, we can see that the belief $b_i^j \in [0, 1]$ is a continuous parameter. In a practical system, it is impossible for a SU to reveal his/her continuous belief using infinitely many data bits. Therefore, we quantize the continuous belief into $M$ belief levels $\{B_1, B_2, ..., B_M\}$, i.e., if $b_i^j \in [\frac{k-1}{M}, \frac{k}{M}]$, then $B_i^j = B_k$. Since each SU can only reveal and receive quantized beliefs, the former SU's quantized belief $B^{j-1}$ is first mapped back to a continuous belief according to the rule that if $B_i^{j-1} = B_k$ then $\hat{b}_i^{j-1} = \frac{1}{2}\left(\frac{k-1}{M} + \frac{k}{M}\right)$. Note that the mapped continuous belief $\hat{b}_i^{j-1}$ is not the former SU's real continuous belief $b_i^{j-1}$. Then, $\hat{b}^{j-1}$ is combined with the signal $s^j$ to calculate the continuous belief $b^j$. Finally, $b^j$ is quantized into the belief $B^j$. Since all primary channels are assumed to be independent, the learning processes of the channels are also independent. Thus, the learning process of the $j$th SU on channel $i$ can be summarized as $B_i^{j-1} \xrightarrow{C} \hat{b}_i^{j-1} \xrightarrow{s_i^j} b_i^j \xrightarrow{Q} B_i^j$.

In the learning process, the most important step is how to calculate the current belief $b_i^j$ from the current signal $s_i^j$ and the former SU's belief $\hat{b}_i^{j-1}$, which is a classical social learning problem. Based on the approach to belief formation, social learning can be classified into Bayesian learning [16] and non-Bayesian learning [17]. In Bayesian learning, rational individuals use Bayes' rule to form the best estimate of the unknown parameters, such as the channel state in our model, while non-Bayesian learning requires individuals to follow some predefined rules to update their beliefs, which inevitably limits rational users' optimal decision making. Since SUs are assumed to be fully rational, they adopt Bayesian learning to update their beliefs $b^j = \{b_i^j\}$ by

$$b_i^j = \frac{\Pr(\theta_i^j = H_0 \mid \hat{b}_i^{j-1}) \Pr(s_i^j \mid \theta_i^j = H_0)}{\sum_{l=0}^{1} \Pr(\theta_i^j = H_l \mid \hat{b}_i^{j-1}) \Pr(s_i^j \mid \theta_i^j = H_l)}. \quad (4)$$

As discussed in the system model, the channel state varies with time. Here, we define the state transition probability $\Pr(\theta_i^j = H_0 \mid \theta_i^{j-1} = H_0)$, which represents the probability that channel $i$ is currently available when the $j$th SU arrives, given that channel $i$ was available when the $(j-1)$th SU arrived. Similarly, we have $\Pr(\theta_i^j = H_1 \mid \theta_i^{j-1} = H_0)$, $\Pr(\theta_i^j = H_0 \mid \theta_i^{j-1} = H_1)$ and $\Pr(\theta_i^j = H_1 \mid \theta_i^{j-1} = H_1)$. In such a case, the SU can calculate the terms $\Pr(\theta_i^j = H_0 \mid \hat{b}_i^{j-1})$ and $\Pr(\theta_i^j = H_1 \mid \hat{b}_i^{j-1})$ in (4) as follows:

$$\Pr(\theta_i^j = H_0 \mid \hat{b}_i^{j-1}) = \Pr(\theta_i^j = H_0 \mid \theta_i^{j-1} = H_0)\,\hat{b}_i^{j-1} + \Pr(\theta_i^j = H_0 \mid \theta_i^{j-1} = H_1)(1 - \hat{b}_i^{j-1}), \quad (5)$$

$$\Pr(\theta_i^j = H_1 \mid \hat{b}_i^{j-1}) = \Pr(\theta_i^j = H_1 \mid \theta_i^{j-1} = H_0)\,\hat{b}_i^{j-1} + \Pr(\theta_i^j = H_1 \mid \theta_i^{j-1} = H_1)(1 - \hat{b}_i^{j-1}). \quad (6)$$

Since the primary channel is modeled as an ON-OFF process, the channel state transition probability depends on the time interval $t^j$ between the $(j-1)$th and $j$th SUs' arrivals. Note that $t^j$ can be obtained directly from the log-file in the server. For notational simplicity, in the rest of this paper, we use $P_{00}(t^j)$, $P_{01}(t^j)$, $P_{10}(t^j)$ and $P_{11}(t^j)$ to denote $\Pr(\theta_i^j = H_0 \mid \theta_i^{j-1} = H_0)$, $\Pr(\theta_i^j = H_1 \mid \theta_i^{j-1} = H_0)$, $\Pr(\theta_i^j = H_0 \mid \theta_i^{j-1} = H_1)$ and $\Pr(\theta_i^j = H_1 \mid \theta_i^{j-1} = H_1)$, respectively, where $P_{01}(t^j) = 1 - P_{00}(t^j)$ and $P_{11}(t^j) = 1 - P_{10}(t^j)$.

We can derive the closed-form expression for $P_{01}(t^j)$ using renewal theory as follows [18]:

$$P_{01}(t^j) = \frac{r_1}{r_0 + r_1}\left(1 - e^{-\frac{r_0 + r_1}{r_0 r_1} t^j}\right). \quad (7)$$

Thus, we have $P_{00}(t^j)$ as

$$P_{00}(t^j) = 1 - P_{01}(t^j) = \frac{r_0}{r_0 + r_1} + \frac{r_1}{r_0 + r_1} e^{-\frac{r_0 + r_1}{r_0 r_1} t^j}. \quad (8)$$

Similarly, the closed-form expression for $P_{11}(t^j)$ can be obtained by solving the following renewal equation:

$$P_{11}(t) = r_1 f_{ON}(t) + \int_0^t P_{11}(t - w) f_p(w)\, dw, \quad (9)$$

where $f_{ON}(t)$ is the probability density function (p.d.f.) of the ON state's length given in (1) and $f_p(t)$ is the p.d.f. of the PU's renewal interval given in (2).

By solving (9), we obtain the closed-form expression for $P_{11}(t^j)$:

$$P_{11}(t^j) = \frac{r_1}{r_0 + r_1} + \frac{r_0}{r_0 + r_1} e^{-\frac{r_0 + r_1}{r_0 r_1} t^j}. \quad (10)$$

Then, we have $P_{10}(t^j)$ as

$$P_{10}(t^j) = 1 - P_{11}(t^j) = \frac{r_0}{r_0 + r_1}\left(1 - e^{-\frac{r_0 + r_1}{r_0 r_1} t^j}\right). \quad (11)$$

By substituting (5)-(6), (7)-(8) and (10)-(11) into (4), we can calculate the $j$th SU's belief $b_i^j$ for both sensing outcomes.
The resulting expressions for the sensing results $s_i^j = 1$ and $s_i^j = 0$ are given in (12)-(13) below. In the following, we denote (12) as $b_i^j|_{s_i^j=1} = \phi(\hat{b}_i^{j-1}, t^j, s_i^j = 1)$, and (13) as $b_i^j|_{s_i^j=0} = \phi(\hat{b}_i^{j-1}, t^j, s_i^j = 0)$, for simplicity.

IV. MULTI-DIMENSIONAL MARKOV DECISION PROCESS BASED CHANNEL ACCESS

In this section, we investigate the channel access game by modeling it as a Markov Decision Process (MDP) problem [19]. In this game, each SU selects a channel to access with the objective of maximizing his/her own expected utility during the access time. To achieve this goal, rational SUs need to consider not only the immediate utility but also the following SUs' decisions. In our model, SUs arrive according to a Poisson process and make their channel selections sequentially. When making the decision, a SU is confronted only with the current grouping information $G^j$ and belief information $B^j$. In order to take into account the SU's expected future utility, we use the Bellman equation to formulate the SU's utility and an MDP model to formulate this channel selection problem. In a traditional MDP problem, a player can adjust his/her decision when the system state changes. In our system, however, after choosing a certain primary channel, a SU cannot adjust his/her channel selection even if the system state has already changed. Therefore, traditional MDP cannot be directly applied here. To solve this problem, we propose a Multi-dimensional MDP (M-MDP) model and a modified value iteration method to derive the best response (strategy) for each SU.

A. System State

To construct the MDP model, we first define the system state and verify the Markov property of the state transition. Let the quantized belief $B = (B_1, B_2, ..., B_N) \in [1, M]^N$ be the belief state. Thus, we can define the system state $S$ as the belief state $B$ together with the grouping state $G = (g_1, g_2, ..., g_N) \in [0, L]^N$, i.e., $S = (B, G)$, where the finite state space is $X = [1, M]^N \times [0, L]^N$. When the $j$th SU arrives, the system state he/she encounters is $S^j = (B^j, G^j)$. In such a case, with multiple SUs arriving sequentially, the system states at different arrival times $\{S^1, S^2, ..., S^j, ...\}$ form a stochastic process. In our learning rule, only the $(j-1)$th SU's belief is used to update the $j$th SU's belief. Therefore, $B^j$ depends only on $B^{j-1}$. Moreover, since SUs arrive according to a Poisson process, the grouping state $G^j$ is also memoryless. In such a case, we can verify that $\{S^1, S^2, ..., S^j, ...\}$ is a Markov process.

B. Belief State Transitions

Note that a SU's belief transition is independent of his/her action, and is related only to the channel state and the Bayesian learning rule. Here, we define the belief state transition probability as $P(B^j \mid B^{j-1})$. Since all channels are independent of each other, we have

$$P(B^j \mid B^{j-1}) = \prod_{i=1}^{N} P(B_i^j \mid B_i^{j-1}), \quad (14)$$

where $P(B_i^j \mid B_i^{j-1})$ is the belief state transition probability of channel $i$. In such a case, there is an $M \times M$ belief state transition matrix for each channel, which can be derived according to the Bayesian learning rule. To find $P(B_i^j = B_q \mid B_i^{j-1} = B_p)$, given the quantized belief $B_i^{j-1} = B_p$, we first calculate the corresponding continuous belief $\hat{b}_i^{j-1} = \frac{1}{2}\left(\frac{p-1}{M} + \frac{p}{M}\right)$. Then, with $B_i^j = B_q$, the value interval of $b_i^j$ is $[\frac{q-1}{M}, \frac{q}{M}]$. Thus, the belief state transition probability can be computed by

$$P(B_i^j = B_q \mid B_i^{j-1} = B_p) = \int_{\frac{q-1}{M}}^{\frac{q}{M}} P(b_i^j \mid \hat{b}_i^{j-1})\, db_i^j. \quad (15)$$

According to (12) and (13), we have $b_i^j = \phi\left(\hat{b}_i^{j-1} = \frac{1}{2}\left(\frac{p-1}{M} + \frac{p}{M}\right), t^j, s_i^j\right)$. Therefore, the belief state transition probability can be rewritten as (16) below, where the second equality follows from the assumption that the arrival interval $t^j$ of two SUs obeys an exponential distribution with parameter