arXiv:2103.05047v1 [cs.NI] 8 Mar 2021

Radio Resource and Beam Management in 5G mmWave Using Clustering and Deep Reinforcement Learning

Medhat Elsayed, Student Member, IEEE, and Melike Erol-Kantarci, Senior Member, IEEE
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
Email: {melsa034, melike.erolkantarci}@uottawa.ca

Abstract—To optimally cover users in millimeter-Wave (mmWave) networks, clustering is needed to identify the number and direction of beams. The mobility of users motivates the need for an online clustering scheme to maintain up-to-date clusters of users per beam, since mobility leads to varying coverage patterns of those clusters. Furthermore, users move from the coverage of one beam to another (i.e., between clusters), causing dynamic traffic load per beam. As such, efficient radio resource allocation and beam management is needed to address the dynamicity that arises from the mobility of users and their traffic. In this paper, we consider the coexistence of Ultra-Reliable Low-Latency Communication (URLLC) and enhanced Mobile BroadBand (eMBB) users in 5G mmWave networks and propose a Quality-of-Service (QoS) aware clustering and resource allocation scheme. Specifically, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used for online clustering of users and selection of the number of beams. In addition, a Long Short-Term Memory (LSTM)-based Deep Reinforcement Learning (DRL) scheme is used for resource block allocation. The performance of the proposed scheme is compared to a baseline that uses K-means and priority-based proportional fairness for clustering and resource allocation, respectively. Our simulation results show that the proposed scheme outperforms the baseline algorithm in terms of latency, reliability, and rate of URLLC users as well as rate of eMBB users.

I. INTRODUCTION

With the unprecedented growth of mobile data traffic stemming from a growing need of data-hungry applications, next-generation wireless networks need to adopt a paradigm shift in the way resources are managed. Millimeter Wave (mmWave) technology is promising large and underutilized spectrum between 30 and 300 GHz, which addresses the well-known spectrum scarcity problem of the sub-6 GHz band [1]. However, mmWave suffers from high propagation losses that hinder its coverage range. One approach to combat such losses is to use beamforming, where the pattern of propagation is reshaped in the direction of the user.

Despite the performance gains that beamforming and directional communication bring about, many challenges exist. The distribution of users and traffic can vary rapidly within a short period of time [2]. The fifth generation (5G) standard introduced three service categories: Ultra-Reliable Low-Latency Communication (URLLC), enhanced Mobile Broad Band (eMBB), and massive machine-type communication (mMTC) [3]. In addition, 5G and beyond wireless networks and 6G are expected to serve more heterogeneity and applications with tight Quality-of-Service (QoS) requirements [4]. Furthermore, an added layer of complexity arises due to the mobility of users. With such dynamicity, network and beam management becomes more challenging. This, first, calls for an intelligent radio resource allocation algorithm that captures the QoS of users. Second, an intelligent beam management is needed to actively consider mobility and load variations across the beams.

In this paper, we consider a heterogeneous mmWave network that employs beamforming for serving mobile URLLC and eMBB users. Since users are mobile, an online clustering is sought to cluster users that can be served by a single beam. In addition, due to the fact that users move among clusters, the load changes among beams; efficient resource allocation is needed to allocate resources among the users within each beam. For this purpose, we propose a QoS-aware clustering and resource block (RB) allocation technique for mmWave networks. In particular, we propose a DBSCAN-based clustering algorithm for user clustering and managing beams, in addition to Long Short-Term Memory (LSTM)-based deep Q-learning for RB allocation. We call our proposed algorithm Deep Q-learning with DBSCAN (DQLD). Furthermore, we compare the proposed algorithm to a baseline algorithm that employs K-means clustering instead of DBSCAN and Priority-based Proportional Fairness (KPPF) instead of DRL for resource allocation. Simulation results reveal that DQLD outperforms KPPF in terms of latency, reliability, and rate of URLLC users as well as rate of eMBB users.

This paper is structured as follows. Section II presents the related work. Section III introduces the system model and problem formulation. In Section IV, the proposed algorithm is discussed. Section V presents the simulation setup, baseline algorithm, and performance results. Finally, Section VI concludes the paper.

II. RELATED WORK

Machine learning, and specifically deep reinforcement learning, is gaining more popularity in the applications of wireless networks [5]. An unsupervised machine learning method was proposed in [6] for automatic identification of the optimal number of clusters in mmWave Non-Orthogonal Multiple Access (NOMA). The proposed clustering is an agglomerative one that infers the number of clusters for maximum sum rate in the network. In [7], the authors consider a mmWave network that comprises femto access points and femto users and propose a clustering algorithm for improving network rate.

[Fig. 1: System model of mmWave network.]
Furthermore, the authors propose a joint user-cell association and resource allocation to further improve the performance. The solutions are formulated as optimization problems and solved with some simplifications. The work in [8] addresses the joint design of beamforming, power, and interference coordination in a mmWave network. The authors formulate a non-convex optimization problem to maximize the Signal-to-Interference plus Noise Ratio (SINR) and solve it using deep reinforcement learning. In [9], the authors consider a mmWave NOMA system and aim to maximize the sum rate of the network using clustering and Power Domain NOMA (PD-NOMA). In particular, the clustering relies on the fact that adjacent users experience similar channels. Furthermore, the authors derive closed-form expressions for optimal NOMA power allocation.

Unlike previous works, we address the dynamicity in a mmWave network using joint online clustering and resource block allocation. The proposed algorithm aims to address network conditions as well as heterogeneous traffic captured through user-specific QoS requirements. As such, we propose an LSTM-based deep reinforcement learning algorithm for resource allocation.

In [10], we proposed a Q-learning algorithm for joint user-cell association and inter-beam power allocation. The proposed algorithm focused on improving the network sum rate when applying NOMA in mmWave networks. This paper differs from our previous work in two aspects. First, we employ orthogonal multiple access instead of NOMA and propose a deep reinforcement learning algorithm for radio resource allocation. In addition, here, we employ DBSCAN for user clustering and, to the best of our knowledge, this is the first time DBSCAN and DRL are used jointly for resource management.

III. SYSTEM MODEL

A. Network Model

Consider a mmWave network with g ∈ G 5G-NodeBs (gNBs), where each gNB covers u ∈ U single-antenna users. Users are partitioned into different clusters, where each cluster is served by a single beam denoted by b ∈ B, as shown in Fig. 1. We consider two types of users with different QoS: URLLC and eMBB users. In particular, URLLC users require low-latency and high-reliability communication, whereas eMBB users require high-rate communication. Let U_b be the set of users covered by the bth beam; the communication between beams and their associated users follows 5G-NR Release 15 [11]. Furthermore, beams use Orthogonal Frequency Division Multiple Access (OFDMA) to allocate orthogonal resources to their users, hence intra-beam interference can be omitted. The bandwidth ω_b of the bth beam is subdivided into a number of Resource Blocks (RBs), where an RB consists of 12 subcarriers. Furthermore, contiguous RBs are grouped to form a Resource Block Group (RBG). Let k ∈ K denote an RBG; the bandwidth of an RBG is denoted by ω_{k,b}. The duration of an RB (or RBG) can span multiple OFDM symbols in time, which is denoted as the Transmission Time Interval (TTI). Hence, the minimum resource allocation is considered to be one RBG, where we select 2 OFDM symbols as the length of a TTI to encourage low latency for URLLC users [12], [13].

The initial positions of users follow a Poisson Cluster Process (PCP), in which the heads of clusters are uniformly distributed and the users within each cluster are uniformly distributed within the radius of the cluster. In addition, the mobility of users follows the random waypoint mobility model. The traffic of users follows a Poisson distribution with inter-arrival time λ and a fixed packet size of 32 bytes. As such, users tend to leave their clusters and join new ones as time proceeds. The mmWave channel can be modeled using a single Line-of-Sight (LoS) path model, where the gain of the LoS path is larger than the gain of the non-Line-of-Sight (nLoS) paths [9]. As such, the channel vector h_{k,u,b} ∈ C^{M×1} between the bth beam and the uth user on the kth RBG is represented as follows:

    h_{k,u,b} = v(θ_{u,b}) · α_{k,u,b} / (√M · (1 + d_{u,b})^n),    (1)

where α_{k,u,b} ∼ CN(0, σ²) is the complex gain, M is the number of paths, d_{u,b} is the Euclidean distance, and n is the pathloss exponent. Note that we removed the gNB's index to keep the formulation readable. In addition, v(θ_{u,b}) is the steering vector, which can be represented as follows:

    v(θ_{u,b}) = [1, e^{−j2π(L/λ)sin(θ_{u,b})}, ..., e^{−j2π(N−1)(L/λ)sin(θ_{u,b})}]^T,    (2)

where L is the gNB's antenna spacing, N is the number of antenna elements, λ is the wavelength, and θ_{u,b} is the Angle of Departure (AoD).

B. QoS Requirements

The proposed scheme aims at addressing the QoS differences among the users in the network. In particular, URLLC users need to maintain high-reliability and low-latency communication links, whereas eMBB users need to achieve high rate.

1) Rate of eMBB users: The sum rate of eMBB users per gNB is formulated as follows:

    C = Σ_{b∈B} Σ_{u∈U_b^e} Σ_{k∈K_b} δ_{k,u,b} · ω_{k,b} · log2(1 + Γ_{k,u,b}),    (3)

where δ_{k,u,b} is an RBG allocation indicator, ω_{k,b} is the size of the RBG in Hz, U_b^e is the set of eMBB users that belong to the bth beam, and Γ_{k,u,b} is the SINR of the (k,u,b)th link, which can be expressed as

    Γ_{k,u,b} = p_{k,b} |h_{k,u,b}^H w_{k,b}|² / (σ² + Σ_{b'≠b} p_{k,b'} |h_{k,u',b'}^H w_{k,b'}|²),    (4)

where p_{k,b} and w_{k,b} denote the power and beamforming vector of the kth RBG of the bth beam, p_{k,b'} and w_{k,b'} denote the power and beamforming vector of the kth RBG of the b'th interfering beam, and σ² represents the receiver's noise variance.

2) Latency and reliability of URLLC users: The latency of URLLC users is formulated as follows:

    D_{u,b} = D_{u,b}^tx + D_{u,b}^q + D_{u,b}^harq,    (5)

where D_{u,b}^tx is the transmission latency, D_{u,b}^q is the queuing latency (i.e., the latency of a packet pending in the transmission buffer), and D_{u,b}^harq is the Hybrid Automatic Repeat Request (HARQ) re-transmission latency of the uth user on the bth beam. In line with [14], we assume D_{u,b}^harq = 4 TTI. In particular, queuing and re-transmission latencies constitute the dominant factors in eq. (5) [15]. Furthermore, the queuing latency is a direct outcome of the decision of the scheduler (i.e., queuing and scheduling latencies are identical). As such, in order to achieve low latency for URLLC users, the scheduler has to immediately allocate resources to URLLC traffic once it arrives. Furthermore, the number of re-transmissions has to be limited; in this work, we assume 1 HARQ re-transmission.

Limiting the number of re-transmissions, however, may impact the reliability of URLLC users. To maintain high reliability, link adaptation is performed, where users periodically report SINR measurements to the gNB in the form of Channel Quality Indicator (CQI) values. CQI indicates the quality (i.e., SINR) of the link with the associated beam. In turn, the gNB accounts for those measurements in the scheduling policy. In the following section, we show how the proposed DRL addresses both latency and reliability of URLLC users.

IV. DEEP Q-LEARNING WITH DBSCAN (DQLD)

In order to maintain high QoS of URLLC and eMBB users in the midst of changing network conditions, the proposed algorithm combines an online clustering (for the purpose of beam management) with a machine-learning based resource allocation. Online clustering is used to cluster users that are adjacent to each other and can be covered by a single beam. In addition, the online clustering algorithm aims to find the optimal number of beams for coverage. On the other hand, for resource block allocation we use deep Q-learning. The joint beam and resource management scheme is Deep Q-learning with DBSCAN (DQLD): DBSCAN is used for online clustering and deep Q-learning is used for resource block allocation. An online algorithm is needed to maintain efficient coverage of mobile users. In this work, we adopt DBSCAN for user clustering and selection of the number of beams due to its advantages over other clustering techniques [16]. DBSCAN does not require a predefined number of clusters. Instead, the algorithm identifies users that can belong to a cluster from sparse users and returns the number and structure of clusters. In addition, DBSCAN has low complexity and easy implementation.

Note that frequent online clustering can lead to a challenging resource allocation problem. In particular, performing the clustering very often leads to frequent changes in the structure and number of beams. As such, resource allocation has to deal with a very dynamic environment. Furthermore, clustering might be needed only when the beams are not efficient enough to cover the users (i.e., users have changed their positions and tend to belong to new clusters). Therefore, determining the frequency of clustering is important. We choose to perform clustering only when the average SINR of a beam drops under a predefined threshold.

Clustering returns a set of beams to cover the network users. Within each beam, we perform resource block allocation using an LSTM-based deep Q-learning technique, namely DQL. The tuples of DQL are defined as follows:

1) Agents: DQL is a multi-agent distributed algorithm that is performed independently by each gNB (i.e., each gNB is a standalone agent). Each gNB performs DQL to allocate RBGs within each of its beams.

2) Actions: The actions are defined as the RBGs allocated to users per beam as

    a_{k,b} = {u_{k,b}},    (6)

where a_{k,b} denotes the action of the kth RBG of the bth beam, and u_{k,b} is the user index.

3) States: We design the states in a way that captures the level of inter-beam interference. In particular, states are defined in terms of the CQI feedback measured at the user. Therefore, the state of the kth RBG of the bth beam is defined as

    s_{k,b} = {q_{k,b}},    (7)

where q_{k,b} is the CQI of the kth RBG of the bth beam.

4) Reward: The reward function is designed to account for the different user classes (i.e., URLLC and eMBB). In particular, URLLC users require tight latency and reliability, whereas eMBB users require high throughput. Therefore, the reward function is defined as

    r_{k,b} = sigm(r_{k,b}^{(mbb)} r_{k,b}^{(llc)}),  if C(u) = 1,
    r_{k,b} = sigm(r_{k,b}^{(mbb)}),                  if C(u) = 2,    (8)

where sigm(x) denotes a sigmoid function defined as

    sigm(x) = 1 / (1 + e^{−x}).    (9)

In eq. (8), C(u) presents the Quality Class Indicator (QCI) of the uth user, where C(u) = 1 denotes URLLC users, while C(u) = 2 denotes eMBB users. r_{k,b}^{(mbb)} and r_{k,b}^{(llc)} are the reward functions of eMBB and URLLC users, respectively, which are defined as follows:

    r_{k,b}^{(llc)} = D^{QoS} / D_{k,b}(u),    (10)

    r_{k,b}^{(mbb)} = Γ_{k,b} / Γ^{QoS},    (11)

where D_{k,b}(u) is the queuing latency due to allocating the kth RBG of the bth beam to the uth user, D^{QoS} is the QoS requirement of latency, and Γ^{QoS} is the QoS requirement of SINR. It is worth mentioning that the gNB has knowledge of the QCI of its radio bearers and the queuing latency of its users. As such, when the traffic on link (k,b) belongs to a URLLC user, the reward constitutes a combination of reliability and queuing latency. Indeed, queuing latency dominates the total latency of downlink transmission [17]. On the other hand, when the traffic on link (k,b) belongs to an eMBB user, the reward constitutes reliability, which translates to higher transmission throughput (i.e., improving SINR enables allocation of a higher modulation and coding scheme and a higher transport block size). Finally, the sigmoid function is used to keep the reward in the interval [0, 1].

[Fig. 2: Conceptual diagram of LSTM-based Deep Q-learning.]

Fig. 2 presents a conceptual diagram of the LSTM-based DQL approach. It is worth mentioning that the gNB has a separate DQL entity for each beam it forms. For each beam, the DQL works as follows. The gNB computes the next state and the reward as in (7) and (8), respectively, from the CQI and SINR feedback received from its users. The experience, {s_t, a_t, r_{t+1}, s_{t+1}}, is then stored in the experience replay memory to be used later for training the LSTM neural network, where s_t and a_t are the state and action at the tth time step, and r_{t+1} and s_{t+1} are the reward and next state at the (t+1)th time step. Afterwards, the LSTM is used to predict the Q-values of all actions of the next state (i.e., Q(s_{t+1}, a′)). Finally, the Q-values are fed to the ǫ-greedy algorithm for next-action selection. The ǫ-greedy algorithm selects either a random action with probability ǫ or an action that follows the greedy policy with probability (1 − ǫ).

To maintain low complexity, the training of the LSTM is performed every T TTIs. In particular, a batch of experience samples is drawn randomly from the experience replay memory. The batch is fed to the target LSTM in order to compute a sequence of reference responses. These responses constitute the labels used to train the main LSTM network. In addition, the target LSTM is initially loaded with the weights of the main network. However, the update of the target LSTM's weights is done every C TTIs to maintain stability [18].

V. PERFORMANCE EVALUATION

A. Simulation Setup

We perform simulations using a discrete event simulator based on the 5G Matlab Toolbox. Table I presents the network, simulator, and DQLD algorithm settings. The network is composed of two gNBs with 300 m inter-gNB distance. Initial positions of users follow a PCP with 2 clusters, where each cluster has a radius of 20 m. The performance of the algorithms is tested under different traffic loads (i.e., {0.5, 1, 1.5, 2} Mbps per gNB). The DQLD algorithm consists of 15 states (i.e., corresponding to the number of CQIs) and 24 actions (i.e., actions correspond to the total number of users per gNB).

B. Baseline Algorithm

For a fair comparison, we use a baseline algorithm that works in a similar fashion to the proposed algorithm and has been used in the literature before. In the baseline, K-means is used to perform online clustering [9], whereas priority-based proportional fairness is used for resource block allocation as proposed in [13]. In [9], clustering using K-means is performed based on channel properties at the user side, i.e., users that are close in proximity are more likely to experience similar channels. Furthermore, in [9], resource block allocation is performed using a hard QoS-aware criterion. In particular, RBGs are given to URLLC users with pending data transmission first, then the remaining RBGs are allocated to eMBB users. Within each user class, RBGs are distributed according to a proportional fairness criterion as

    u* = argmax_u ( C_{k,u,b} / C̄_{k,u,b} ),    (12)

where u* is the selected user to be allocated the kth RBG.

C. Performance Results

In this subsection, we present the simulation results of the proposed DQLD scheme and compare it to the baseline KPPF algorithm. The performance is assessed in terms of URLLC and eMBB QoS requirements. Fig. 3 and Fig. 4 present the Empirical Complementary Cumulative Distribution Function (ECCDF) of the latency of URLLC users. The figures show the latency under increasing URLLC traffic load. Both figures demonstrate the superiority of DQLD over KPPF despite that

TABLE I: Simulation Settings

5G PHY configuration:
  Bandwidth: 20 MHz
  Carrier frequency: 30 GHz [19]
  Subcarrier spacing: 15 KHz
  Subcarriers per resource block: 12
  TTI size: 2 OFDM symbols (0.1429 msec)
  Max transmission power: 28 dBm
  URLLC target BLER: 1%
  eMBB target BLER: 10%
HARQ:
  Type: Asynchronous HARQ
  Round trip delay: 4 TTIs
  Number of processes: 6
  Max. number of re-transmissions: 1
Network model:
  Initial positions: Poisson Cluster Process (PCP)
  Mobility: Random waypoint
  Number of URLLC users per cluster: 2
  Number of eMBB users per cluster: 2
  Number of clusters: 3
  Radius of cluster: 20 m
  Number of gNBs: 2
  Radius of cell: 150 m
  Inter-site distance: 300 m [19]
Traffic:
  Distribution: Poisson
  Packet size: 32 Bytes
Q-learning:
  Learning rate (α): 0.5
  Discount factor (γ): 0.9
  Exploration probability (ǫ): 0.1
  D^QoS: 1 msec
  Γ^QoS: 15 dB [20]
LSTM:
  Size of input layer: 1
  Number of hidden units: 20
  Size of output layer: 24
  Size of mini-batch: 20
  Size of replay memory: 60
  Training interval (T): 60
  Copy interval (C): 120
DBSCAN:
  minPts: 5
  eps: 30
Simulation parameters:
  Simulation time: 1.5 second
  Number of runs: 10
  Confidence interval: 95%
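The reward shaping of eqs. (8)-(11) can be illustrated with a short sketch that plugs in the QoS targets from Table I (D^QoS = 1 msec, Γ^QoS = 15 dB). This is an illustrative reading rather than the authors' code: the juxtaposition of the two reward terms in eq. (8) is taken as a product, the SINR ratio of eq. (11) is computed directly on the reported values, and all function and variable names are hypothetical.

```python
import math

D_QOS_MS = 1.0       # URLLC latency target from Table I (1 msec)
GAMMA_QOS_DB = 15.0  # SINR target from Table I (15 dB)

def sigm(x):
    """Sigmoid of eq. (9); keeps the reward in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def reward(qci, sinr_db, queuing_delay_ms=None):
    """Per-RBG reward of eq. (8).

    qci == 1 (URLLC): combines the reliability term of eq. (11)
    with the latency term of eq. (10).
    qci == 2 (eMBB): uses the reliability term alone.
    """
    r_mbb = sinr_db / GAMMA_QOS_DB            # eq. (11)
    if qci == 1:
        r_llc = D_QOS_MS / queuing_delay_ms   # eq. (10)
        return sigm(r_mbb * r_llc)            # eq. (8), C(u) = 1
    return sigm(r_mbb)                        # eq. (8), C(u) = 2
```

Under this reading, a URLLC packet served within its 1 ms budget at the target SINR earns sigm(1) ≈ 0.73, while a long queuing delay drives the product, and hence the reward, down toward sigm(0) = 0.5.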

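The ǫ-greedy action selection described in Section IV, with the exploration probability ǫ = 0.1 of Table I, can be sketched as follows. The Q-values would come from the LSTM's prediction for the next state; the function name and interface here are illustrative, not the authors' implementation.

```python
import random

EPSILON = 0.1  # exploration probability from Table I

def select_action(q_values, epsilon=EPSILON, rng=random):
    """Epsilon-greedy policy over the Q-values predicted for the
    next state: pick a random RBG-to-user assignment with
    probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 this reduces to the greedy policy (argmax over the 24 actions of the DQLD agent); with epsilon = 1 it explores uniformly at random.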
[Fig. 3: Latency of URLLC users versus total URLLC offered load ([0.5, 1] Mbps).]

[Fig. 4: Latency of URLLC users versus total URLLC offered load ([1.5, 2] Mbps).]
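The ECCDF curves of Figs. 3 and 4 plot, for each latency value x, the fraction of URLLC packets whose latency exceeds x. A minimal sketch of how such a curve is built from latency samples (the samples below are illustrative, not the paper's data):

```python
def eccdf(samples):
    """Empirical Complementary CDF: (x, p) pairs where p is the
    fraction of samples strictly greater than x. For tied samples,
    the last pair for each x carries the exceedance probability."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - 1 - i) / n) for i, x in enumerate(xs)]

# Illustrative latency samples in ms:
curve = eccdf([0.3, 0.6, 1.2, 2.4])
# -> [(0.3, 0.75), (0.6, 0.5), (1.2, 0.25), (2.4, 0.0)]
```

Plotting p against x on a logarithmic vertical axis yields curves like Figs. 3 and 4, where the 10^-4 percentile corresponds to p = 10^-4.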

KPPF applies a hard QoS rule for scheduling URLLC users first. In particular, Fig. 3 demonstrates about 8 ms improvement at the 10^-4 percentile at 1 Mbps offered load. Furthermore, as the offered load increases, KPPF fails to maintain a reasonable performance for URLLC users.

In Fig. 4, the latency performance of KPPF degrades significantly, whereas DQLD was able to achieve much lower latency, with about 350 ms difference with respect to KPPF at 2 Mbps. The significant performance degradation of KPPF is attributed to a high Packet Loss Rate (PLR), as shown in Fig. 5. The figure presents the PLR of URLLC users under different traffic loads. As seen in Fig. 5, DQLD demonstrates a 50% improvement in PLR compared to KPPF. Furthermore, Fig. 6 presents the achieved rate of URLLC users under different URLLC traffic load. Again, DQLD outperforms KPPF. In fact, increasing the traffic load impacts KPPF significantly, where 1 Mbps constitutes a break point for the algorithm. It is worth mentioning that the simulation is performed by increasing the traffic loads of both URLLC and eMBB users simultaneously. For example, 1 Mbps in Fig. 6 refers to both the URLLC and eMBB loads (i.e., the total load per gNB is 2 Mbps). As such, increasing the offered URLLC and eMBB loads stresses both algorithms. Fig. 7 presents the achieved rate of eMBB users under different traffic load. Again, the same trend appears for KPPF, where 1 Mbps constitutes a break point in KPPF's performance, whereas DQLD demonstrates an ability to balance resources among users and satisfy the conflicting QoS requirements.

[Fig. 5: Packet loss rate of URLLC users versus total URLLC offered load.]

[Fig. 7: Sum rate of eMBB users versus total eMBB offered load.]

VI. CONCLUSION

In this paper, we address the problem of QoS-aware clustering (for beamforming) and resource allocation in 5G millimeter wave networks. We proposed an online clustering algorithm for identifying the number and structure of beams to cover network users, in addition to an LSTM-based deep reinforcement learning to perform resource allocation within each

beam. The proposed algorithm is compared to a baseline that uses K-means for clustering and priority-based proportional fairness for resource allocation. Simulation results reveal that the proposed algorithm outperforms the baseline in terms of latency, reliability, and rate of URLLC users as well as rate of eMBB users.

[Fig. 6: Sum rate of URLLC users versus total URLLC offered load.]

ACKNOWLEDGMENT

This research is supported by U.S. National Science Foundation (NSF) under Grant Number CNS-1647135 and Natural Sciences and Engineering Research Council of Canada (NSERC) under the Canada Research Chairs Program.

REFERENCES

[1] J. G. Andrews et al., "What Will 5G Be?," IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1065-1082, June 2014.
[2] Y. Liu, X. Wang, G. Boudreau, A. B. Sediq, and H. Abou-zeid, "Deep Learning Based Hotspot Prediction and Beam Management for Adaptive Virtual Small Cell in 5G Networks," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 4, pp. 83-94, January 2020.
[3] ITU, "5G overview," in Setting the Scene for 5G: Opportunities and Challenges, Geneva: ITU, 2018.
[4] W. Saad, M. Bennis, and M. Chen, "A Vision of 6G Wireless Systems: Applications, Trends, Technologies, and Open Research Problems," IEEE Network, pp. 1-9, October 2019.
[5] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y. Liang, and D. I. Kim, "Applications of Deep Reinforcement Learning in Communications and Networking: A Survey," IEEE Communications Surveys & Tutorials, vol. 21, pp. 3133-3174, May 2019.
[6] D. Marasinghe, N. Jayaweera, N. Rajatheva, and M. Latva-Aho, "Hierarchical User Clustering for mmWave-NOMA Systems," in 6G Wireless Summit (6G SUMMIT), pp. 1-5, February 2020.
[7] B. Soleimani and M. Sabbaghian, "Cluster-Based Resource Allocation and User Association in mmWave Femtocell Networks," IEEE Transactions on Communications, vol. 68, pp. 1746-1759, November 2020.
[8] F. B. Mismar, B. L. Evans, and A. Alkhateeb, "Deep Reinforcement Learning for 5G Networks: Joint Beamforming, Power Control, and Interference Coordination," IEEE Transactions on Communications, vol. 68, pp. 1581-1592, June 2020.
[9] J. Cui, Z. Ding, P. Fan, and N. Al-Dhahir, "Unsupervised Machine Learning-Based User Clustering in Millimeter-Wave-NOMA Systems," IEEE Transactions on Wireless Communications, vol. 17, pp. 7425-7440, November 2018.
[10] M. Elsayed, K. Shimotakahara, and M. Erol-Kantarci, "Machine Learning-based Inter-Beam Inter-Cell Interference Mitigation in mmWave," in IEEE International Conference on Communications (ICC), June 2020.
[11] 3GPP, "NR; Physical channels and modulation," Technical Specification (TS) 38.211, v15.2.0, 3rd Generation Partnership Project (3GPP), July 2018.
[12] M. Elsayed and M. Erol-Kantarci, "Deep Reinforcement Learning for Reducing Latency in Mission Critical Services," in IEEE Global Communications Conference (GLOBECOM), pp. 1-6, December 2018.
[13] G. Pocovi, K. I. Pedersen, and P. Mogensen, "Joint Link Adaptation and Scheduling for 5G Ultra-Reliable Low-Latency Communications," IEEE Access, vol. 6, pp. 28912-28922, May 2018.
[14] G. Pocovi, B. Soret, K. I. Pedersen, and P. Mogensen, "MAC Layer Enhancements for Ultra-Reliable Low-Latency Communications in Cellular Networks," in IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1005-1010, July 2017.
[15] M. Elsayed and M. Erol-Kantarci, "Learning-Based Resource Allocation for Data-Intensive and Immersive Tactile Applications," in IEEE 5G World Forum (5GWF), pp. 278-283, November 2018.
[16] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, August 1996.
[17] M. Elsayed and M. Erol-Kantarci, "AI-Enabled Radio Resource Allocation in 5G for URLLC and eMBB Users," in IEEE 5G World Forum (5GWF), pp. 590-595, November 2019.
[18] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller, "Playing Atari with Deep Reinforcement Learning," CoRR, vol. abs/1312.5602, pp. 1-9, January 2013.
[19] W. Zhang, Y. Wei, S. Wu, W. Meng, and W. Xiang, "Joint Beam and Resource Allocation in 5G mmWave Small Cell Systems," IEEE Transactions on Vehicular Technology, pp. 1-1, July 2019.
[20] N. I. B. Hamid, N. Salele, M. T. Harouna, and R. Muhammad, "Analysis of LTE Radio Parameters in Different Environments and Transmission Modes," International Conference on Electrical Information and Communication Technology (EICT), pp. 1-6, March 2014.