Machine Learning for Wireless Link Adaptation
Supervised and Reinforcement Learning Theory and Algorithms
VIDIT SAXENA
Doctoral Thesis in Electrical Engineering
Stockholm, Sweden, 2021

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
Division of Information Science and Engineering
SE-10044 Stockholm, Sweden

TRITA-EECS-AVL-2021:35
ISBN 978-91-7873-886-1
Academic thesis which, with permission of the KTH Royal Institute of Technology, is submitted for public scrutiny for the completion of the Ph.D. in Electrical Engineering on Thursday, May 20, 2021, at 13.00 in lecture hall F3, Lindstedtsvägen 26, KTH Royal Institute of Technology, Stockholm.
© Vidit Saxena, April 28, 2021
Printed by: Universitetsservice US AB
Abstract
Wireless data communication is a complex phenomenon. Wireless links encounter random, time-varying channel effects that are challenging to predict and compensate for. Hence, to optimally utilize the channel, wireless links adapt the data transmission parameters in real time. This process, known as wireless link adaptation, can lead to large gains in link performance. Link adaptation is hence an integral part of state-of-the-art wireless deployments.

Existing link adaptation schemes use simple heuristics that match the data transmission rate to the estimated channel. These schemes have proven to be useful for the ubiquitous wireless services of voice telephony and mobile broadband. However, as wireless networks increase in complexity and also evolve to support new service types, these link adaptation schemes are rapidly becoming inadequate. The reason for this change is threefold: First, in several operating scenarios, simple heuristics-based link adaptation does not fully exploit the available channel. Second, the heuristics are typically tuned empirically for good performance, which incurs additional expense and can be error-prone. Finally, traditional link adaptation does not naturally extend to applications beyond the traditional wireless services, for example to industrial control or vehicular communications.

In this thesis, we address wireless link adaptation through machine learning. Our proposed solutions efficiently navigate the link parameter space by learning from the available information. These solutions thus improve the link performance compared to the state of the art, for example by doubling the link throughput. Further, we advance link adaptation support for new wireless services by optimizing the link for complex performance objectives. Finally, we also introduce mechanisms that autonomously tune the link adaptation parameters with respect to the operating environment. Our schemes hence mitigate the dependence on empirical configurations adopted in current wireless networks.

This thesis is composed of six technical papers. Based on these papers, there are three key contributions of this thesis: a neural link adaptation model (Paper I, Paper II, and Paper III), link adaptation under packet error rate constraints (Paper IV and Paper V), and efficient model-based link adaptation (Paper VI).

In this thesis, we emphasise the theoretical underpinnings of our proposed machine learning schemes for link adaptation. We approach this goal in three ways: First, we make theoretically reasoned choices for machine learning models and learning algorithms for link adaptation. Second, we extend these models for the specific problem formulations encountered in link adaptation. For this, we develop rigorous problem formulations that are analyzed using classical techniques. Third, we develop theoretical results for the real-time behaviour of the proposed schemes. These results extend the machine learning state of the art in terms of performance bounds for stochastic online optimization. The contributions of this thesis hence go beyond the realm of wireless optimization, and extend to new developments applicable to broader machine learning problems.
Keywords: Wireless Communications, Reinforcement Learning, Multi-Armed Bandits, Thompson Sampling, Convex Optimization, Deep Learning.
Summary (translated from Swedish)

Wireless data communication is a complex phenomenon. Wireless links encounter random and time-varying channel effects that are challenging to predict and compensate for. To utilize the wireless channel optimally, communication systems therefore adapt the data transmission parameters in real time. This process, also called wireless link adaptation, can lead to large gains in link performance. Link adaptation is therefore an integral part of all modern communication systems.

Existing methods for link adaptation use simple heuristics that match the data transmission rate to the estimated wireless channel. These systems have proven useful for the widely used wireless services of voice telephony and mobile broadband. However, as wireless networks grow in complexity and also evolve to support new service types, these link adaptation methods are quickly becoming inadequate. The reason for this is threefold: First, in several new services, heuristics-based link adaptation simply does not fully utilize the available channel. Second, the heuristics are usually tuned empirically for good performance, which can be error-prone in new scenarios and incurs additional cost. Finally, traditional link adaptation does not generalize naturally to applications beyond the traditional wireless services, for example to industrial control systems or vehicular communication.

In this thesis, we address link adaptation through machine learning. Our proposed systems efficiently explore the link parameter space by learning from the available information. The proposed methods thus improve link performance compared with the state of the art, for example by doubling the link throughput. Furthermore, we develop link adaptation support for new wireless services by optimizing the link for more complex performance objectives. Finally, we also introduce mechanisms that autonomously adjust the link adaptation parameters based on the operating environment. Our systems thereby reduce the dependence on the empirical configurations used in current wireless networks.

This thesis consists of six technical papers. Based on these papers, the thesis makes three key contributions: a neural link adaptation model (Paper I, Paper II, and Paper III), link adaptation under packet error rate constraints (Paper IV and Paper V), and efficient model-based link adaptation (Paper VI).

In this thesis, we emphasize the theoretical foundation of our proposed machine learning methods for link adaptation. We approach this goal in three ways: First, we make theoretically motivated choices of machine learning models and learning algorithms for link adaptation. Second, we extend these models to the specific problem formulations encountered in link adaptation. For this, we develop rigorous problem formulations that are analyzed with classical techniques. Third, we develop theoretical results for the real-time behavior of the proposed systems. These results extend the field of machine learning with regard to performance bounds for stochastic online optimization. The contributions of this thesis thus go beyond the area of wireless communication and extend to new application areas.
Summary (translated from Hindi)

Wireless data communication is a complex process. Wireless links encounter irregular and disordered channel effects that are challenging to compensate for. Hence, to make the best use of the channel, wireless links adapt the data transmission parameters in real time. This process is known as wireless link adaptation and is an integral part of state-of-the-art wireless deployments.

Existing link adaptation schemes use simple, experience-based estimates. Typically, these schemes match the data transmission rate to the estimated wireless channel. In the past, these schemes have proven useful for the ubiquitous wireless services of telephony and mobile broadband. However, as wireless networks grow more complex and new kinds of transmission arrangements develop, existing link adaptation schemes are rapidly becoming inadequate. There are three main reasons for this change: First, in many scenarios, simple link adaptation cannot fully utilize the available channel. Second, the transmission parameters are generally chosen empirically, which adds overhead and is more prone to errors. Finally, traditional link adaptation does not extend naturally to new service applications, for example industrial control or vehicle-based communication.

In this thesis, we investigate wireless link adaptation through machine learning. Our proposed solutions learn from commonly available transmission information to steer the link transmission parameters automatically and efficiently. Compared with state-of-the-art adaptation methods, our solutions improve link performance, for example by doubling the link throughput. In addition, our solutions benefit new wireless services by optimizing the link for complex performance objectives. Finally, we also present techniques that automatically tune the link adaptation parameters based on the wireless environment. In this way, our proposed schemes reduce the dependence of today's wireless networks on empirical configuration.

This thesis comprises six technical papers. Based on these papers, the thesis contributes in three main areas: a neural link adaptation model (Paper I, Paper II, and Paper III), link adaptation under packet error rate constraints (Paper IV and Paper V), and efficient model-based link adaptation (Paper VI).

In this thesis, we emphasize the theoretical soundness of our proposed machine learning schemes for link adaptation. To reach this goal, we adopt the following three principles: First, we use theoretical machine learning models and algorithms that are appropriate from the perspective of link adaptation. Second, we extend machine learning techniques to the specific problems that arise in link adaptation. Third, we develop theoretical results for the real-time behavior of the proposed schemes. These results also advance the machine learning state of the art with respect to performance bounds. The contributions of this thesis thus go beyond wireless optimization and encourage new developments on problems prevalent in machine learning.
Acknowledgement
The journey of academic research is full of unforeseen paths and uncertain outcomes. However, regardless of its conclusion, research is a rewarding endeavor in and of itself. My doctoral project has been structured as an industry-academia collaboration between Ericsson AB and the KTH Royal Institute of Technology, Sweden. During the course of my doctoral work, I have been extremely fortunate to have received the support and guidance of countless people at both these organizations and beyond. Their presence has made these past few years the most fulfilling and productive time of my life.

My first token of gratitude is for my principal supervisor, Prof. Joakim Jaldén, for his keen and insightful guidance throughout the period of my doctoral work. I am deeply inspired by Joakim’s calm and focused attitude toward research, which has shaped my own approach towards addressing new challenges. I am also grateful for the unwavering support of my co-supervisors, Prof. Mats Bengtsson and Dr. Hugo Tullberg. Mats drew on his immense bank of knowledge to help guide my theoretical ideas towards a broader application context. Hugo, who is with Ericsson Research, inspired me to think beyond incremental gains and instead explore the vast unknown, which has shaped some of the more impactful aspects of my work.

During my doctoral work, I have also had the privilege of working at the University of California, Berkeley (UCB), USA, as a visiting scholar. I will forever be grateful to Prof. Ion Stoica for inviting me to his lab, and to Dr. Joseph E. Gonzalez for his support and guidance during my visit. The scale and ambition of the projects at UCB are truly awe-inspiring, and have motivated me to identify and address challenging problems in my research domain.

At Ericsson, I am deeply grateful for the support that I have received from my manager, Markus Ringström.
Markus was responsible both for bringing the doctoral position to my attention, and for suggesting that I reach out to Joakim. Over these years, Markus has steadfastly ensured that I have access to the best of resources and opportunities at all times. I am also thankful to Dr. Anders Caspár for facilitating my collaboration with KTH in a smooth manner.

I am grateful to the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation, for their financial support. I am also thankful to WASP for organizing myriad courses,
study trips, and summer schools that have contributed greatly to my development as a researcher. I express my gratitude to the administrative staff who have, often behind the scenes, ironed out the operations at Ericsson, KTH, UCB and WASP.

I am thankful for the joyous company of my KTH colleagues, who filled this time with bright ideas and cheerful conversation. In particular, I have learnt a lot from the collaborative work with Pol del Aguila Pla, Lissy Pellaco, and Baptiste Cavarec. I am also thankful for the company of Håkan Carlsson, Xuechun Xu, and the rest of my peers at KTH. I have gained immensely from the discussions with the faculty at KTH, and express my deepest gratitude for their kindness and insightful comments. Apart from KTH, I am thankful for the useful collaborations with Dr. Henrik Klessig and Simon Lindståhl that have made valuable contributions to my work.

My time at Ericsson has been enriched by the presence of knowledgeable colleagues, which has significantly improved the quality of my work. I especially extend my thanks to Dr. David Astely, Dr. Euhanna Ghadimi, and Dr. Rohit Chandra at Ericsson in Stockholm, Sweden, for their involved discussions in the context of my work. Dr. Ali Khayrallah and Per Karlsson were kind enough to host me at their Ericsson Research group in Santa Clara, USA. I will forever be thankful to Nimish Radia for believing in me and introducing me to his academic network at UCB. I am also thankful to many other colleagues at Ericsson Research, both in Stockholm, Sweden and Santa Clara, USA, who have shaped my research with their valuable contributions.

My academic journey as a doctoral candidate is nestled firmly in the warmth and love of my friends and family. The support from Rohit and Soni, and Vivek, during crucial moments has made the timely conclusion of my work possible. The time spent with Emmanuel and Neha in Lund, and the deep meaningful conversations with Akhila, will always be cherished.
I have been extremely lucky to have my parents’ firm support and belief in every endeavour. As is perhaps typical of parents, they have believed in me far more than I can honestly claim to be justified. However, they have also sacrificed immensely to support me as I pursued my aspiration in far-away, foreign lands. In this, I am forever indebted to my Dada and Bhabhi, for being near to our parents and taking care of the bulk of their needs. Their lovely Joy brings smiles to all our faces every single day and keeps us rooted to what is most important - a cohesive, happy family. My deepest regards are reserved for my grandparents, who continue to inspire me every day with their strength, wisdom and infinite love.

My final words of love and gratitude are dedicated to my wife, Sameeksha, who has added such color and meaning to life as I had never imagined possible. We have made innumerable memories during this time, and look forward to incredible years ahead with our two little treasures, Agastya and Divit.

Acronyms
List of commonly used acronyms:
3GPP: Third Generation Partnership Project
4G: Fourth Generation
5G: Fifth Generation
ACK: Acknowledgement
AMC: Adaptive Modulation and Coding
ANN: Artificial Neural Network
BICM: Bit-Interleaved Coded Modulation
CMAB: Contextual Multi-Armed Bandit
CQI: Channel Quality Indicator
DRL: Deep Reinforcement Learning
FEP: Frame Error Probability
IEEE: Institute of Electrical and Electronics Engineers
ILLA: Inner Loop Link Adaptation
LTS: Latent Thompson Sampling
MAB: Multi-Armed Bandit
MCS: Modulation and Coding Scheme
NACK: Negative Acknowledgement
OFDM: Orthogonal Frequency Division Multiplexing
OLLA: Outer Loop Link Adaptation
OLM: Offline Link Model
RL: Reinforcement Learning
SINR: Signal to Interference and Noise Ratio
TS: Thompson Sampling
UCB: Upper Confidence Bound
WiFi: Wireless Fidelity (IEEE 802.11)
List of Papers
I. Deep learning for frame error probability prediction in BICM-OFDM systems.
Vidit Saxena, Joakim Jaldén, Mats Bengtsson, Hugo Tullberg
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018)

II. A learning approach for optimal codebook selection in spatial modulation systems.
Vidit Saxena, Baptiste Cavarec, Joakim Jaldén, Mats Bengtsson, Hugo Tullberg
52nd Asilomar Conference on Signals, Systems, and Computers (2018)

III. Contextual multi-armed bandits for link adaptation in cellular networks.
Vidit Saxena, Joakim Jaldén, Joseph E. Gonzalez, Mats Bengtsson, Hugo Tullberg, Ion Stoica
Proceedings of the 2019 Workshop on Network Meets AI & ML (NetAI’19) (2019)

IV. Bayesian link adaptation under a BLER target.
Vidit Saxena, Joakim Jaldén
21st IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (2020)

V. Thompson sampling for linearly constrained bandits.
Vidit Saxena, Joseph E. Gonzalez, Joakim Jaldén
23rd International Conference on Artificial Intelligence and Statistics (AISTATS) (2020)

VI. Reinforcement learning for efficient and tuning-free link adaptation.
Vidit Saxena, Hugo Tullberg, Joakim Jaldén
IEEE Transactions on Wireless Communications (Under Review)
Other Papers
During the timeframe of my doctoral work, I collaborated on a few additional projects, which are not reflected in this thesis. These projects led to peer-reviewed papers that are nevertheless listed below for completeness.
I. Optimal UAV base station trajectories using flow-level models for reinforcement learning.
Vidit Saxena, Joakim Jaldén, Henrik Klessig
IEEE Transactions on Cognitive Communications and Networking (2019)

II. Wireless link adaptation with outdated CSI - a hybrid data-driven and model-based approach.
Lissy Pellaco, Vidit Saxena, Mats Bengtsson, Joakim Jaldén
21st IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (2020)

III. Spotnet–Learned iterations for cell detection in image-based immunoassays.
Pol del Aguila Pla, Vidit Saxena, Joakim Jaldén
16th International Symposium on Biomedical Imaging (ISBI) (2019)
To my family.
Contents
Acknowledgement
Acronyms
List of Papers
Contents

1 Introduction
1.1 Contributions
1.2 Discussion
1.3 Organization

2 Thesis Overview
2.1 Wireless Link Adaptation
2.2 Neural Probability Estimation
2.3 Multi-armed Bandits
2.4 Thompson Sampling

3 Summary of the Included Papers
3.1 Paper I
3.2 Paper II
3.3 Paper III
3.4 Paper IV
3.5 Paper V
3.6 Paper VI

4 Conclusion
4.1 Key Takeaways
4.2 Research Directions

References
Fulltext of the Included Papers
Chapter 1
Introduction
The wireless physical layer is highly configurable. Prominent wireless access protocols, for example the cellular fifth-generation new radio (5G NR) and the IEEE 802.11 WiFi standards, provide hundreds of link parameter configurations that are tunable in real time. The wireless link can hence be adapted with a fine granularity to optimize performance in a variety of application scenarios. This extreme configurability allows wireless networks to provide robust and efficient access across diverse geographies, service demands, and usage patterns.

The online tuning of wireless link parameters is called link adaptation. Link adaptation algorithms adjust the transmission parameters, for example the data transfer rate, to maximally utilize the channel resources and to combat stochastic impairments such as channel noise and interference. Most state-of-the-art link adaptation algorithms use simple, empirically configured heuristics to dynamically adjust the data transmission parameters. While the existing heuristics can be beneficial in terms of fast, real-time execution, they often do not fully exploit the available channel. Hence, as wireless networks grow in size and complexity, current link adaptation implementations can become increasingly suboptimal. Further, these implementations need to be tuned for good performance. Current wireless networks employ manual tuning based on empirical evidence, which can be both expensive and error-prone. Yet another drawback of the existing link adaptation schemes is that they do not adequately handle applications beyond the traditional voice telephony and mobile broadband services. As such, there is a need for powerful link adaptation schemes that address new services, for example real-time industrial control and vehicular communications.

In this thesis, we address wireless link adaptation through machine learning.
Our proposed solutions learn from the available information to efficiently navigate the link parameter space in dynamic wireless environments. Further, we advance link adaptation for complex performance objectives. Our approach thus provides better support for new wireless services. Finally, we also introduce mechanisms to autonomously tune link adaptation performance in diverse operating conditions,
which reduces the dependence on manual adjustments.

A well-known disadvantage of machine learning is its dearth of interpretable models and learning algorithms. Several machine learning solutions are implemented in a black-box fashion with an incomplete understanding of their inner workings. While the need for interpretability is often neglected in the light of superior empirical performance, this nevertheless leads to solutions that are difficult to extend to new use cases. Further, black-box machine learning solutions do not provide the performance guarantees that may be critical for system and resource provisioning.

In this thesis, we therefore develop the theoretical foundations for our proposed machine learning solutions. We approach this objective in three ways: First, we make reasoned choices for our machine learning models and learning algorithms. These choices are guided by in-depth knowledge of the specific problem dynamics, as well as the interpretability of the candidate models. Second, we formulate certain link adaptation use cases in the form of rigorous, mathematically tractable problems. As an example, we characterize link reliability as an online convex optimization problem under linear constraints. Third, we develop theoretical bounds for the real-time performance of our proposed algorithms. In addition to link adaptation, these bounds are equally applicable to more general online optimization problems. The contributions of this thesis hence go beyond the realm of wireless optimization to broader machine learning applications.
Figure 1.1: Wireless systems operate in complex and time-varying environments. Today, these systems enable a broad range of applications beyond the traditional voice telephony and mobile broadband: for example, smart manufacturing, vehicular communications, and internet-of-sensors.
1.1 Contributions
This thesis addresses link adaptation in terms of finding the optimal data transmission rate over a wireless channel¹. This thesis is composed of six technical papers that are summarized and reproduced in the subsequent chapters. Based on these papers, there are three key contributions of this thesis: a neural link adaptation model (Paper I, Paper II, and Paper III), link adaptation under error rate constraints (Paper IV and Paper V), and sample-efficient link adaptation with subspace learning (Paper VI). Each of these three contributions is summarized next.
Neural Link Adaptation Model

We pose link adaptation as the problem of predicting the channel-conditioned success probability for each available data transmission rate. As a consequence, we desire parameterized models that can learn the mapping from an arbitrary wireless channel state to the respective packet success probabilities. For this problem, we employ an artificial neural network (ANN) model, which is known to be a powerful general-purpose function approximator. Our choice of an ANN model, however, is inspired by the following lesser-exploited property of ANNs: for a suitable choice of training loss, the ANN outputs estimate the true conditional probability for the target classes. As a consequence, the ANN outputs for a given channel state can be rigorously interpreted as the respective packet success probabilities for each available data transmission rate. We then use the trained ANN model for optimal data rate selection in wireless links.

Paper I and Paper II introduce supervised learning algorithms for our proposed neural link adaptation model. While Paper I addresses modulation and coding scheme (MCS) selection in cellular networks, Paper II deals with selecting the optimal codebook in spatial modulation systems. Paper III extends the approach of Paper I to a reinforcement learning (RL) setting, where the optimal MCS is learnt online by sequentially exploring and exploiting the space of all available MCSs.
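As a rough illustration of the idea above, the following Python sketch fits a one-feature logistic model (standing in for the ANN) to ACK/NACK outcomes with a cross-entropy loss, reads its output as a packet success probability, and selects the rate with the highest expected throughput. The sigmoid ground truth, the scalar SINR feature, the rate values, and all function names are assumptions made for illustration; they are not taken from Paper I, Paper II, or Paper III.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_success_model(samples, epochs=300, lr=0.5):
    """Fit P(ACK | sinr) = sigmoid(w * x + b), with x = sinr_db / 10, by
    gradient descent on the mean binary cross-entropy loss.

    samples: list of (sinr_db, ack) pairs with ack in {0, 1}.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for sinr_db, ack in samples:
            x = sinr_db / 10.0
            err = sigmoid(w * x + b) - ack  # d(cross-entropy)/d(logit)
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_success(model, sinr_db):
    """Model output, interpreted as the packet success probability."""
    w, b = model
    return sigmoid(w * sinr_db / 10.0 + b)


def select_rate(rates, success_probs):
    """Return the index of the rate with the highest expected throughput."""
    expected = [r * p for r, p in zip(rates, success_probs)]
    return max(range(len(rates)), key=lambda k: expected[k])


def make_synthetic_samples(threshold_db, n, rng):
    """Hypothetical ground truth: P(ACK) = sigmoid(sinr_db - threshold_db)."""
    samples = []
    for _ in range(n):
        sinr_db = rng.uniform(-5.0, 25.0)
        ack = 1 if rng.random() < sigmoid(sinr_db - threshold_db) else 0
        samples.append((sinr_db, ack))
    return samples
```

For example, with rates `[1.0, 2.0, 4.0]` and predicted success probabilities `[0.9, 0.8, 0.1]`, `select_rate` returns index 1, because 2.0 × 0.8 is the largest expected throughput. In practice one such probability output would be needed per rate, which is the role the ANN output layer plays in the text.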
Link Adaptation under Error Rate Constraints

Several real-time wireless applications require reliable channel access to maintain an acceptable quality of service (QoS). Examples of these applications include video streaming and real-time industrial control. Link reliability is commonly expressed in terms of an upper limit on the packet error rate experienced by the link. For this problem, we formulate link adaptation in terms of maximizing the link throughput under a linear constraint on the packet error rate. We model this problem as a multi-armed bandit (MAB), where the optimal rates
¹ This problem is referred to in the literature by several terms, viz. adaptive modulation and coding (AMC), modulation and coding scheme (MCS) selection, and rate sampling.
are learnt by choosing the arm to be played in every time interval. Further, we propose a Bayesian learning algorithm to optimize the link adaptation MAB for the link reliability objective. Our learning algorithm thus extends the state-of-the-art Thompson sampling heuristic to the constrained optimization setting.

Paper IV proposes a constrained Thompson sampling algorithm for link adaptation under packet error rate constraints. Paper V develops theoretical upper bounds on the performance of constrained Thompson sampling, in terms of both the reward maximization and the constraint satisfaction metrics.
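The constrained formulation can be sketched as follows, assuming Bernoulli ACK/NACK feedback with an independent Beta posterior per rate. In each round the sketch samples a success probability for every rate, solves a small linear program for the rate mixture that maximizes the sampled throughput subject to the packet error bound, and plays a rate drawn from that mixture. The function names and the pairwise solver are illustrative choices and should not be read as the exact algorithm of Paper IV or Paper V.

```python
import random


def solve_constrained_mix(rewards, errors, max_error):
    """Maximize sum(x * rewards) s.t. sum(x * errors) <= max_error, with x on
    the probability simplex.

    With a single linear constraint, an optimal solution mixes at most two
    arms, so it suffices to check single arms and pairwise mixtures.
    """
    k = len(rewards)
    feasible = [i for i in range(k) if errors[i] <= max_error]
    if not feasible:
        # No mixture satisfies the bound: fall back to the most reliable arm.
        best = min(range(k), key=lambda i: errors[i])
        return [1.0 if i == best else 0.0 for i in range(k)]
    best_i = max(feasible, key=lambda i: rewards[i])
    best_x = [1.0 if i == best_i else 0.0 for i in range(k)]
    best_val = rewards[best_i]
    for i in feasible:
        for j in range(k):
            if errors[j] <= max_error:
                continue
            # Weight on arm i that makes the mixture meet the bound exactly.
            lam = (errors[j] - max_error) / (errors[j] - errors[i])
            val = lam * rewards[i] + (1.0 - lam) * rewards[j]
            if val > best_val:
                best_val = val
                best_x = [0.0] * k
                best_x[i], best_x[j] = lam, 1.0 - lam
    return best_x


def constrained_thompson_step(rates, alpha, beta, max_error, rng):
    """Sample success probabilities, solve the LP, and draw the arm to play."""
    theta = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    rewards = [r * t for r, t in zip(rates, theta)]
    errors = [1.0 - t for t in theta]
    mix = solve_constrained_mix(rewards, errors, max_error)
    return rng.choices(range(len(rates)), weights=mix)[0]


def update_posterior(alpha, beta, arm, ack):
    """Beta-Bernoulli update from one ACK (success) or NACK (failure)."""
    if ack:
        alpha[arm] += 1
    else:
        beta[arm] += 1
```

Note that the optimal policy under the error bound is in general a randomized mixture of two rates rather than a single rate, which is why the step function draws the arm from `mix` instead of taking an argmax.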
Link Adaptation with Latent Models

Link parameter configurations exhibit a high degree of structure. For example, packet failure at a certain transmission rate indicates that higher rates are also likely to fail for the given wireless channel conditions, and vice versa for lower rates. This structural property can be exploited for sample-efficient RL, by reducing the number of exploratory transmissions required to learn the optimal rate. In particular, the dependence between rates can be modeled parsimoniously in a latent subspace. We first identify one such low-dimensional subspace suitable for RL-based link adaptation. Subsequently, we propose an extension to Thompson sampling that exploits this latent subspace for sample-efficient learning.

Paper VI proposes a reinforcement learning link adaptation algorithm that learns a latent signal-to-interference-and-noise ratio (SINR) model to predict the optimal data transmission rate.
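To make the structural argument concrete, the sketch below runs Thompson sampling over a single latent SINR value instead of over each rate separately. It assumes a hypothetical link model in which the ACK probability of each rate is a sigmoid of the SINR relative to a per-rate threshold, and it keeps a discretized posterior over the SINR. The class name, thresholds, and grid are illustrative assumptions and do not reproduce the latent model of Paper VI.

```python
import math
import random


def success_prob(sinr_db, threshold_db, slope=1.0):
    """Hypothetical link model: ACK probability rises smoothly with SINR."""
    z = slope * (sinr_db - threshold_db)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LatentSinrSampler:
    """Thompson sampling over a discretized posterior on a scalar SINR."""

    def __init__(self, rates, thresholds_db, grid_db):
        self.rates = rates
        self.thresholds_db = thresholds_db
        self.grid_db = list(grid_db)
        # Uniform prior over the SINR hypotheses.
        self.weights = [1.0 / len(self.grid_db)] * len(self.grid_db)

    def choose_rate(self, rng):
        # Sample one SINR hypothesis from the posterior, then act greedily.
        sinr = rng.choices(self.grid_db, weights=self.weights)[0]
        expected = [r * success_prob(sinr, t)
                    for r, t in zip(self.rates, self.thresholds_db)]
        return max(range(len(self.rates)), key=lambda k: expected[k])

    def update(self, rate_index, ack):
        # Bayes rule: one ACK/NACK re-weights every SINR hypothesis, and
        # therefore changes the predicted success of all rates at once.
        t = self.thresholds_db[rate_index]
        posterior = []
        for s, w in zip(self.grid_db, self.weights):
            p = success_prob(s, t)
            posterior.append(w * (p if ack else 1.0 - p))
        total = sum(posterior)
        self.weights = [p / total for p in posterior]

    def posterior_mean(self):
        return sum(s * w for s, w in zip(self.grid_db, self.weights))
```

A single NACK at one rate lowers the posterior mean SINR, which in turn lowers the predicted success probability of every higher rate without any transmission at those rates. This is the sample-efficiency effect described in the text.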
In terms of impact, this thesis advances the state of the art in link adaptation in three key ways: improved link spectral efficiency, support for complex QoS metrics, and autonomous calibration. First, improving the link spectral efficiency would allow better utilization of the scarce and expensive wireless resource. This would lead to higher traffic volumes being served with the bandwidth slice assigned to a wireless access provider. While all the constituent papers make better use of the available spectral resources, Paper I, Paper II, and Paper VI explicitly optimize for the average link throughput. Second, link optimization for complex QoS metrics would make it possible to extend wireless access to new application domains. An example would be to constrain the link error rate below a maximum allowed level, which is essential for robust real-time control in industrial and vehicular applications. These and many other application domains are expected to fuel the next generation of wireless growth. Paper III, Paper IV, and Paper V specifically address this use case. Finally, current link adaptation parameters are empirically selected and seldom updated in response to changes in the ambient wireless environment or traffic patterns. The dependence on manual, on-field network maintenance is hence expensive and prone to errors. Autonomous calibration would improve network operations by self-tuning link
adaptation algorithms across diverse deployments. Paper III and Paper IV serve this autonomy goal by exploiting contextual information commonly available in existing networks. Contextual information allows faster, link-agnostic learning from multiple parallel data flows. Paper VI additionally proposes a scheme that autonomously tunes the link adaptation parameters to optimally track the channel variations.
1.2 Discussion
Link adaptation has been deployed in live networks for the past several decades. However, despite their widespread adoption, link adaptation schemes have not been subject to significant updates. The reason for this is twofold: hardware constraints and legacy issues. First, wireless hardware has been severely resource-constrained to minimize the capital and operating costs. However, this is quickly changing with the expanding potential of wireless services and the emergence of edge computing. Access to upgraded computing resources will allow sophisticated physical layer algorithms, of which link adaptation is one example, to be implemented in wireless networks. Second, despite their known shortcomings, legacy link adaptation schemes are typically “carried over” to the next generation of wireless deployments owing to their familiarity. This is also likely to change in the near future as wireless access expands to new service areas where legacy schemes might be insufficient and where alternative schemes promise large gains. The topic of link adaptation is hence open for innovation, and promises valuable gains from its advancement in future wireless networks.

Cellular versus WiFi. Link adaptation has been implemented in the context of cellular as well as IEEE 802.11 (WiFi) networks. However, the terminology and the general approach adopted for implementation differ between the two protocols. While cellular link adaptation makes substantial use of channel measurements, such techniques have found limited application in WiFi. Further, both cellular and WiFi links adapt the data transmission rate based on the outcome of previous packetized transmissions. However, while cellular links adopt a model-based approach for iterating towards the optimal rate, WiFi searches over the rates by sampling the available rates sequentially. In terms of configurability, cellular links typically provide more parameter choices than WiFi, including a higher number of available data rates.
In this thesis, we evaluate our proposed schemes primarily in the context of downlink data transmission in cellular networks. However, these techniques are equally applicable to WiFi links. In most of the included papers, we hence also benchmark our results against state-of-the-art WiFi link adaptation algorithms.
1.3 Organization
The rest of this thesis is structured as follows. Chapter 2 presents an overview of this thesis. First, in this chapter, the wireless link adaptation problem is introduced. Subsequently, this chapter summarizes some of the key concepts that form the basis of the technical contributions of this thesis. Next, Chapter 3 sequentially summarizes each of the papers included in this thesis. Chapter 4 concludes the thesis with a summary of the key takeaways and the discussion of some future research directions. Links to camera-ready versions of the included papers are provided towards the end of this document.
Chapter 2
Thesis Overview
Radio-based wireless communications can be traced back to the late nineteenth century, when the first successful demonstrations of this technique were made [1]. However, for more than a century, wireless was limited to a fairly small set of applications: public broadcasts via radio, short-distance links and point-to-point telegraph, and specialized military equipment. The reason for this limited use was that radio hardware was bulky, expensive, and required a large amount of energy to operate. Hence, these devices could hardly be made mobile for general-purpose communication. Until the late twentieth century, the only truly mobile communication systems were car-based radios, and even those served only low-fidelity voice telephony.
The digital explosion towards the end of the twentieth century, when general-purpose computing devices started becoming commonly available, sparked a concurrent revolution in wireless communication technologies. During this period, handheld wireless devices became feasible owing to the small form factor and energy efficiency of high-performance chipsets. In addition, wirelessly delivered services expanded beyond voice telephony to include mobile broadband that allowed access to a rapidly-growing internet ecosystem. Today, over 90% of the global population enjoys a subscription to cellular mobile services [2]. The latest generation of wireless technologies seeks to extend connectivity to tens of billions of devices and serve a multitude of new applications in the transport, manufacturing, and allied sectors [2, 3].
The rest of this chapter is organized as follows: Section 2.1 discusses the link adaptation problem and provides an overview of the existing approaches as well as some related problems encountered in other domains. Next, Section 2.2 highlights an important property of ANNs that allows robust and interpretable models for link adaptation.
Section 2.3 serves as an introduction to the powerful MAB framework, which is used for RL-based link adaptation. Finally, Section 2.4 discusses a Bayesian heuristic for MAB optimization, Thompson sampling, which has been adopted and extended for link adaptation in this thesis.
2.1 Wireless Link Adaptation
Wireless data communication is a complex phenomenon. The bulk of this complexity is attributed to the stochastic and time-varying nature of the wireless channel, which stems from complex interactions between the data-carrying radio waves and physical objects in the signal path [1]. Since the instantaneous wireless channel state is not ordinarily available at the transmitter, wireless systems must devise mechanisms to efficiently navigate the channel. Practical wireless networks address this challenge by making the wireless links configurable, where the data transmission parameters can be selected from a set of pre-defined discrete values for optimal predicted performance. Discretizing the parameter space in this manner serves two goals: first, the link can search through the parameters in a sufficiently small time before the channel state changes appreciably and second, the selected parameters can, with a relatively small overhead, be communicated to the receiver for decoding. In modern cellular networks, the typical link selects from a few hundred possible parameter configurations once every few milliseconds. Hence, the key challenge in these networks is to quickly and efficiently navigate the link parameter configuration space for optimal performance.
Wireless link adaptation deals with the problem of tuning the data transmission parameters to maximize the utility of a wireless channel [4–7]. Link adaptation techniques can be classified into two categories: inner loop (or closed loop) and outer loop (or open loop). Inner loop link adaptation (ILLA) makes use of explicit channel estimates [8]. These channel estimates are measured at the receiver using known pilot signals, and are subsequently fed back to the transmitter for rate selection. ILLA hence incurs significant overhead for pilot signaling, receiver-side measurements, and channel state feedback.
In contrast to the inner loop, outer loop link adaptation (OLLA) does not involve any channel measurement. Instead, this loop adjusts the transmission parameters based solely on the observed outcome of previous transmissions. If one or more previous packets were decoded successfully at the receiver (indicated by an acknowledgement (ACK) feedback signal), OLLA moves up its estimate of the wireless channel utility. On the other hand, if too many packets fail to be decoded, the outer loop falls back to more conservative data transmission rates [7].
The inner and outer link adaptation loops are complementary to each other. While the inner loop is more responsive, this responsiveness comes at the cost of high channel reporting overheads. Conversely, the outer loop does not accrue any signaling overhead, but it is slow to respond to large channel variations. Wireless systems hence deploy both loops, albeit at different timescales: the inner loop, which compensates for substantial channel movements, is only triggered infrequently to minimize overhead. On the other hand, the outer loop adjustments are smaller in magnitude but also more frequent: an outer loop update is typically executed after every packet transmission. A suitably configured pair of inner and outer loops allows the wireless network to optimize the link in diverse operating environments.
Paper I and Paper II included in this thesis address ILLA for throughput maximization. Further, Paper III, Paper IV, and Paper V optimize OLLA for the more complex link objective of throughput maximization under an error rate constraint. Finally, Paper VI revisits OLLA for throughput maximization, where an updated learning model is exploited for fast and efficient link adaptation.
Link Adaptation Schemes
Interest in link adaptation schemes goes back several decades, coinciding with the inception of wide-area and cellular wireless networks [9]. Since then, several link adaptation schemes have been proposed in the literature. In the early years of link adaptation research, the focus was on ILLA in terms of accurately and compactly characterizing the wireless channel state. Owing to its explicit signaling requirements, ILLA is strongly regulated by the respective wireless standard. In contrast, the outer loop flexibly uses one or more of the existing control signals for adaptation. Subsequently, starting from the third-generation (3G) cellular networks, OLLA has also been studied extensively. Concurrently with cellular networks, link adaptation schemes have also been proposed in the context of wireless local area networks such as the IEEE 802.11 (WiFi) standard. In the rest of this section, we will highlight some prominent developments on this topic.
ILLA: A robust mechanism to compress the high-dimensional wireless channel state to a scalar metric was proposed in [10]. This metric, known as the effective signal-to-interference-and-noise-ratio (SINR), improves signaling efficiency and hence has been enthusiastically adopted and extended by later cellular standards [11, 12]. However, compressing the channel state is inevitably lossy. Hence, the effective SINR approach suffers from link performance loss owing to suboptimal parameter configuration. In [13, 14], a supervised learning approach based on K-nearest neighbors was proposed, which directly maps from the high-dimensional wireless channel state to the optimal transmission parameters. This scheme was shown to outperform an effective SINR scheme in terms of the average link throughput. Despite its empirical gains, the model in [13] was difficult to scale and not amenable to a theoretical interpretation.
Paper I included in this thesis provides an enhanced, ANN-based, supervised learning model that scales well with the channel dimensionality and whose outputs are rigorously interpreted as the respective packet success probabilities. Paper II extends this approach to the related problem of selecting an optimal codebook in spatial modulation systems.
Legacy OLLA: One of the earliest OLLA schemes, which relies on ACK feedback, was proposed in [15]. This scheme maintains an offset to the ILLA effective SINR, which is adjusted on a per-packet basis. If the previous transmission was successful (i.e., an ACK was received), OLLA increases its SINR estimate by a configurable amount. Otherwise, if a negative ACK (NACK) is received, OLLA decreases the SINR estimate proportionally. Several drawbacks of this simple scheme are known, which is why ad-hoc fixes have been proposed to address one or more of its shortcomings [15–18]. However, somewhat surprisingly, the basic OLLA heuristic in [15] has remained in operation in cellular networks for the past two decades with minimal changes. In contrast to cellular implementations, WiFi OLLA schemes generally do not involve the SINR metric. Instead, they heuristically switch between data transmission rates based on the statistical ACK/NACK behaviour over a moving time window [19, 20].
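The per-packet offset update described above can be sketched in a few lines. This is a minimal illustration rather than the scheme of [15] verbatim: the function names, the default step size, and the rule that ties the NACK step to a target error rate are assumptions made for the example.

```python
def olla_update(offset_db, ack, step_up_db=0.1, target_error_rate=0.1):
    """Return the updated SINR offset (in dB) after one ACK/NACK.

    Assumption: the NACK step is proportional to the ACK step, scaled so
    that the offset stops drifting when the long-run NACK fraction equals
    target_error_rate.
    """
    step_down_db = step_up_db * (1.0 - target_error_rate) / target_error_rate
    if ack:
        return offset_db + step_up_db   # ACK: be slightly more aggressive
    return offset_db - step_down_db     # NACK: back off by a larger step


def adjusted_sinr_db(illa_sinr_db, offset_db):
    """SINR used for rate selection: ILLA estimate plus the OLLA offset."""
    return illa_sinr_db + offset_db


# Nine ACKs followed by one NACK (a 10% error rate) leave the offset unchanged.
offset = 0.0
for ack in [True] * 9 + [False]:
    offset = olla_update(offset, ack)
```

With these hypothetical defaults, the offset rises slowly on successes and drops sharply on a failure, which is the asymmetry that makes the loop settle around the chosen error rate.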
Reinforcement Learning (RL) for OLLA: A key characteristic of OLLA is that the ACK feedback corresponds only to the selected data transmission rate, and does not provide direct information about other rates. Hence, to find the optimal rate, OLLA needs an efficient mechanism for exploring the available rates. Previous schemes handle exploration by sequentially probing the available rates in the order of their spectral capacities. However, this approach can be suboptimal when the number of rates is large, or when the channel variations are frequent. RL is an alternative, principled, approach that deals with online exploration [21]. RL has recently been proposed for OLLA in the context of cellular as well as WiFi deployments. Many of these RL OLLA schemes employ ANNs to model the link behavior in real time. A few ways in which ANN-based RL OLLA schemes improve link adaptation performance are: by optimizing legacy OLLA tuning parameters [22], learning dependencies between the available rates [23], and exploiting high-dimensional channel contexts [24]. Paper III included in this thesis proposes and evaluates an ANN-based RL OLLA scheme for cellular networks, where the ANN model of Paper I has been extended to online link optimization.
Multi-armed Bandits (MAB): MABs encode a powerful RL framework to balance between exploration and exploitation within stochastic environments. MABs were first proposed for link adaptation in [25]. Their algorithm quickly maximizes the link throughput by exploiting structural properties inherent to the link adaptation problem. However, this scheme does not naturally extend to the more complex link performance objectives encountered in several wireless applications. In Paper IV, we propose a MAB optimization algorithm that incorporates an average packet error rate constraint. In contrast to [25], which adopted a frequentist learning heuristic based on upper confidence bounds (UCB), we use a Bayesian heuristic based on Thompson sampling that typically provides better learning performance. We theoretically analyze our proposed constrained optimization algorithm in Paper V, where we obtain new results on its finite-time performance. In Paper VI included in this thesis, we return to the problem of unconstrained throughput maximization. Our proposed MAB optimization algorithm in Paper VI learns in a lower-dimensional channel subspace to substantially improve the link throughput compared to the previous schemes.
Connection to Other Domains
The basic OLLA problem is formulated as learning the optimal transmission rates based on observed ACK/NACK feedback. This simple formulation is echoed by several RL problems in otherwise unrelated domains. Interestingly, although these related problems may have been studied for several decades, their connection to link adaptation has largely been overlooked. We highlight some of these related problems below, which are commonly modeled as MAB instances:
• Weblink selection: A webpage publisher seeks to place one or more affiliate weblinks to attract ad revenues [26, 27]. In each round, a user either clicks one of the displayed weblinks or does not click any weblink. The goal is to select weblinks that maximize the cumulative revenue generated from user clicks. In the context of link adaptation, a displayed weblink is analogous to a selected rate, a user’s click corresponds to an ACK/NACK, and the set of available rates corresponds to the set of weblinks available for display.
• Dynamic pricing: A seller aims to maximize the cumulative revenue by optimally pricing the available goods. In the absence of any contextual information, the only feedback available to the seller is the successful sale of an individual item. This problem can be formulated as provisioning a set of discrete selling prices that are efficiently probed to determine the optimal, revenue-maximizing, price. Recently, latent models of the demand behavior have been employed to substantially speed up the learning of optimal pricing strategies.
• Inventory management: Constrained MAB problems have recently been studied in the context of revenue maximization under a finite inventory setting, termed bandits with knapsacks (BwK). In [28], an upper confidence bound (UCB)-based approach was introduced that was shown to be optimal for the stochastic BwK problem. Further in [26], a Thompson sampling algorithm for budgeted MABs was proposed that outperforms the UCB BwK algorithm. Subsequently, in [29], Thompson sampling was studied for revenue optimization for a finite inventory that contains multiple non-identical products.
2.2 Neural Probability Estimation
In this section, we take a slight detour to highlight an important property of ANNs, which makes them particularly interesting for link adaptation modeling. Recall that link adaptation can be formulated as the problem of learning the ACK probabilities for an arbitrary channel state. Denoting the packet success feedback with $e_k \in E_k = \{0, 1\}$, where $e_k = 1$ denotes an ACK event, $e_k = 0$ denotes a NACK event, and $k \in \{1, \dots, K\}$ denotes the rate index, and denoting the channel state vector by $\psi$, we wish to model
\[ P_{E_k}(e_k \mid \psi; \theta), \quad \forall k \in \{1, \dots, K\}, \tag{2.1} \]
where $\theta$ is the set of a priori unknown parameters of the true ACK probability model. In other words, $\theta$ denotes the parameters of an oracle model that accurately encodes the conditional ACK probabilities as a function of the wireless channel state. Our goal is to learn, based on observed channel states and their corresponding ACK/NACK events, an approximate model for the conditional ACK probability, $\hat{P}_{E_k}(e_k \mid \psi; \hat{\theta})$, where $\hat{\theta}$ denotes the parameters of our learnt model.
An ANN model for link adaptation, $f(\psi): \psi \mapsto \rho^{\mathrm{NN}}$, maps the channel state to the ANN output $\rho^{\mathrm{NN}} = [\rho_1^{\mathrm{NN}}, \dots, \rho_K^{\mathrm{NN}}]$ through a nonlinear transformation. For the sake of exposition, we assume that the $k$th ANN output corresponds to the $k$th data transmission rate. The ANN is trained with a training dataset of (channel state, ACK) tuples, $(\psi^{(1)}, e_k^{(1)}), \dots, (\psi^{(N)}, e_k^{(N)})$, collected using an arbitrary rate selection scheme. We repeat the following key result for ANNs, first identified in [30]: The ANN outputs, when trained to minimize the cross entropy loss or the mean squared loss with respect to the observed ACK events, provide maximum-likelihood estimates of the true channel-conditional ACK probabilities in the limit of infinitely many training samples. The significance of this result is that for a sufficiently large training dataset, the ANN outputs can be rigorously interpreted as ACK probability estimates. As a consequence, ANN models can be used to optimize not only for the link throughput, but also for more complex link performance objectives, for example ones that take packet error rates into account. Paper I and Paper II use this ANN property for link adaptation, where supervised training of the ANN is performed offline. Paper III extends this ANN model to a reinforcement learning setting, where the training data is collected online through an epsilon-greedy policy.
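The probability-estimation property can be illustrated with a deliberately small experiment. The sketch below is not the model of Paper I: a single sigmoid unit stands in for the ANN, the channel state is reduced to one scalar feature `x`, and the ground-truth ACK probability (`true_w`, `true_b`) is a hypothetical choice made only to generate synthetic data. Under these assumptions, training on binary ACK/NACK labels with the cross entropy loss should yield outputs close to the true ACK probabilities.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_ack_model(samples, epochs=300, lr=1.0):
    """Fit p(ACK | x) = sigmoid(w * x + b) by full-batch gradient descent
    on the mean cross entropy loss. `samples` is a list of (x, ack) pairs."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, ack in samples:
            err = sigmoid(w * x + b) - ack  # gradient of the loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Synthetic (channel feature, ACK) pairs from a hypothetical ground truth.
rng = random.Random(0)
true_w, true_b = 2.0, -0.5
data = []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    ack = 1 if rng.random() < sigmoid(true_w * x + true_b) else 0
    data.append((x, ack))

w_hat, b_hat = train_ack_model(data)

# Compare the learnt output with the true ACK probability at a few points.
for x in (-1.0, 0.0, 1.0):
    print(x, sigmoid(w_hat * x + b_hat), sigmoid(true_w * x + true_b))
```

The model only ever sees 0/1 outcomes, yet its output can be read as a probability. This is what allows downstream objectives such as expected throughput or error rate constraints to be computed directly from the network outputs.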
2.3 Multi-armed Bandits
In the typical MAB setting, a decision-maker (that is, the agent) has access to a set of discrete or continuous-valued actions (arms) within an environment. The experiment is divided into sequential rounds, where the agent is allowed to pull one or more arms in every round. Pulling an arm corresponds to executing the respective action in the environment. The environment generates a reward for each pull of an arm, where the reward distribution is not explicitly made available to the agent at any stage. However, the agent may estimate reward characteristics by exploring the available arms over successive rounds. Subsequently, by exploiting these estimates, the agent can predict the arm (or set of arms) that optimizes a target performance objective. The central challenge in MAB optimization relates to balancing between exploration and exploitation, that is, identifying techniques that optimally explore the available arms to quickly find the best exploitative arms for reward maximization.
In the context of RL-based link adaptation, MABs are an attractive modeling framework. Here, the transmitter acts as an agent to optimize the link performance within the wireless environment. The wireless environment induces an a priori unknown distribution over the ACK probability for each available rate. The transmitter has access to a finite set of discrete-valued data transmission rates, $r_{1, \dots, K}$, exactly one of which is selected for packet transmission in every transmission interval. The environment responds with an ACK or a NACK, which encodes the reward collected by the agent in that round. The transmitter hence needs to efficiently explore the available rates and predict the optimal rate, or set of rates, that can be exploited to optimize the link performance. A discussion on MABs in general, and their application to various domains, is available in [31]. In the rest of this section, we will describe MABs as applied to link adaptation problems in terms of, respectively, the specific reward structure, wireless channel dynamics, and MAB optimization algorithms, and conclude with a note on general performance bounds for MAB optimization.
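The agent and environment roles described above can be made concrete with a toy simulation. The class and function names below are hypothetical, the ACK probabilities are fixed by hand, and the round-robin policy is only a pure-exploration baseline included to show the interaction loop. It is not one of the algorithms studied in this thesis.

```python
import random


class BernoulliLink:
    """Toy stationary link: rate index k is ACKed with a fixed, hidden probability."""

    def __init__(self, rates, ack_probs, seed=0):
        assert len(rates) == len(ack_probs)
        self.rates = list(rates)            # available data rates, one per arm
        self._ack_probs = list(ack_probs)   # hidden from the agent
        self._rng = random.Random(seed)

    def transmit(self, k):
        """Send one packet at rate index k; return 1 for ACK, 0 for NACK."""
        return 1 if self._rng.random() < self._ack_probs[k] else 0


def run_round_robin(link, rounds):
    """Cycle through all rates in turn and tally ACKs and pulls per rate."""
    num_rates = len(link.rates)
    acks = [0] * num_rates
    pulls = [0] * num_rates
    for t in range(rounds):
        k = t % num_rates            # exactly one rate is selected per interval
        acks[k] += link.transmit(k)  # only the selected rate yields feedback
        pulls[k] += 1
    return acks, pulls
```

Note that the agent never reads `_ack_probs`; everything it learns comes from the ACK/NACK returned for the rate it actually selected, which is what creates the exploration versus exploitation trade-off.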
Reward Structure
The wireless transmitter sends packetized data, which, if decoded successfully, delivers the entire packet contents to the intended receiver. On the other hand, if the decoding was unsuccessful, zero data bits are delivered. The decoding outcome gets encoded as the binary ACK/NACK signal $c_{k[t]}[t]$, where $k[t]$ is the rate index selected at the discrete transmission time intervals $t = 1, 2, \dots$. We first assume a stationary wireless channel such that the channel-conditional ACK probability is only a function of the data transmission rate, $r_{k[t]}$. Later in this section, we will extend the analysis to address non-stationary channels. For the stationary channel, the binary random variable $c_{k[t]}[t]$ is independent and identically distributed according to a Bernoulli distribution with mean $\rho_{k[t]}[t] = \mathbb{E}[c_{k[t]}[t]]$, where the expectation is taken over multiple packet transmissions for an arbitrary channel state.
Figure 2.1: The multi-armed bandit formulation is classically attributed to a slot machine with multiple arms that generate rewards with an unknown reward distribution. The agent plays the arms with the goal of efficiently maximizing his cumulative returns over several, sequential, rounds.
The wireless transmitter uses the historical ACK/NACK rewards to estimate ACK probabilities for the next time interval, denoted by $[\hat{\rho}_1[t+1], \dots, \hat{\rho}_K[t+1]]$. Subsequently, for the common goal of link throughput maximization, the transmitter selects the rate index by post-processing the predicted ACK probabilities,
\[ k^*[t+1] = \operatorname*{argmax}_{k \in \{1, \dots, K\}} r_k \hat{\rho}_k[t+1], \tag{2.2} \]
that is, the predicted rate index that maximizes the expected link throughput. This thesis also considers more complex link performance objectives, for example where the average packet error rate is constrained below a certain threshold. For such objectives, an appropriate post-processing step is executed for rate selection in every time interval.
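The rate selection rule in (2.2) amounts to one multiplication per rate followed by an argmax. A minimal sketch follows, assuming the ACK probabilities are estimated by simple empirical frequencies; the function names and the optimistic default for untried rates are illustrative choices, not the estimators used in the included papers.

```python
def estimate_ack_probs(acks, pulls, default=1.0):
    """Empirical ACK probability per rate; untried rates get `default`."""
    return [a / n if n > 0 else default for a, n in zip(acks, pulls)]


def select_rate(rates, ack_prob_estimates):
    """Return the (zero-based) index k maximizing rates[k] * ack_prob_estimates[k]."""
    expected_throughput = [r * p for r, p in zip(rates, ack_prob_estimates)]
    return max(range(len(rates)), key=expected_throughput.__getitem__)
```

For example, with rates `[1.0, 2.0, 4.0]` and estimated ACK probabilities `[0.95, 0.7, 0.2]`, the expected throughputs are 0.95, 1.4 and 0.8, so `select_rate` returns index 1: the fastest rate is rejected because it fails too often, and the most reliable rate is rejected because it is too slow.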
Environment Dynamics
Wireless environments typically evolve over time, owing to the physical motion of the objects within the environment. However, in certain scenarios, for example indoor WiFi deployments, the wireless channel may evolve slowly such that it can be considered approximately stationary over the duration of a typical data session. For these scenarios, the MAB formulation above adequately optimizes for an arbitrary channel state. In other scenarios, where the channel varies appreciably within the timeframe of a single data session, the MAB model needs to be adapted for channel dynamicity. This thesis proposes two ways of addressing channel dynamicity: channel state conditioning and forgetting factors. With channel state conditioning, the MAB model learns the ACK probabilities conditioned on periodically reported channel quality indicators (CQI). The channel is assumed to be quasi-stationary between successive CQI reports. Further, for a given CQI, the channel state is assumed to be approximately constant. The channel-conditional ACK probability can hence be learnt over successive transmissions in a dynamic channel. Alternatively, a forgetting factor can also be employed for addressing channel variations, where the historical ACK signal is weighted inversely with the time gap from the current transmission time. A specific instance of forgetting factor, which uses a sliding time window heuristic, has been proposed in [25] for optimization under nonstationary wireless channels.
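Two simple ways of down-weighting old feedback are sketched below. Both are illustrative assumptions rather than the exact estimators of this thesis or of [25]: the first applies an exponential forgetting factor to the ACK statistics, and the second keeps only the most recent transmissions in a fixed-length window. Class names and default values are hypothetical.

```python
from collections import deque


class DiscountedAckEstimator:
    """ACK probability estimate with exponentially decaying memory."""

    def __init__(self, num_rates, forgetting_factor=0.95):
        self.gamma = forgetting_factor
        self.acks = [0.0] * num_rates
        self.pulls = [0.0] * num_rates

    def update(self, k, ack):
        # Age all statistics by one step, then add the new observation for rate k.
        self.acks = [self.gamma * a for a in self.acks]
        self.pulls = [self.gamma * n for n in self.pulls]
        self.acks[k] += ack
        self.pulls[k] += 1.0

    def estimate(self, k, default=0.5):
        return self.acks[k] / self.pulls[k] if self.pulls[k] > 0 else default


class SlidingWindowAckEstimator:
    """ACK probability estimate from the last `window` transmissions only."""

    def __init__(self, window=50):
        self.history = deque(maxlen=window)  # (rate index, ack) pairs

    def update(self, k, ack):
        self.history.append((k, ack))  # the oldest entry drops out automatically

    def estimate(self, k, default=0.5):
        outcomes = [a for j, a in self.history if j == k]
        return sum(outcomes) / len(outcomes) if outcomes else default
```

A smaller forgetting factor or a shorter window tracks channel changes faster but leaves fewer effective samples per rate, so the estimates become noisier; the right setting depends on how quickly the channel varies.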
Optimization Algorithms
MAB optimization algorithms fall under two broad categories: frequentist and Bayesian. Frequentist MAB algorithms are rooted in empirical statistics that are employed to select an optimal arm in every round. One of the first efficient and provably optimal frequentist algorithms, UCB1, selects the arm with the highest upper confidence bound (UCB) for reward maximization. In contrast to frequentist optimization schemes, Bayesian MAB optimization schemes assign a degree of belief to the true reward parameters. This belief is updated at the beginning of every round based on the previously observed rewards. The most common Bayesian heuristic, Thompson sampling, assigns an initial prior distribution over the reward parameters. In every round, Thompson sampling updates its belief by computing a new prior based on the rewards collected in all previous rounds. This updated prior is then used to predict the optimal arm that maximizes the expected reward.
There are several reasons that motivate Thompson sampling for MAB-based link adaptation. First, several models of the wireless system are available, which are grounded in expert knowledge and verified through decades of experimental studies. This motivates the use of a Thompson sampling approach, which can incorporate model-guided knowledge into the prior beliefs of the reward distribution. Additionally, Thompson sampling has recently been shown to be asymptotically optimal for reward maximization, and additional performance results are also being made available at a fast pace [29, 32]. Finally, Thompson sampling is known to outperform UCB-based approaches for a large number of empirical benchmarks [33]. Hence, this thesis adopts Thompson sampling as the optimization algorithm of choice for the link adaptation problem. In the next section, we will introduce Thompson sampling, and describe its extensions proposed in this thesis for various link adaptation formulations.
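The two families can be contrasted in a short sketch for binary (ACK/NACK) rewards. The UCB1 index below uses the standard confidence radius $\sqrt{2 \ln t / n_k}$; the Thompson sampling step assumes Beta(1, 1) initial beliefs over each arm's success probability. Function names and argument layout are illustrative, and neither function includes the constraints or channel structure handled in the included papers.

```python
import math
import random


def ucb1_select(reward_sums, pulls, t):
    """UCB1: try every arm once, then pick the arm with the largest
    empirical mean plus confidence radius sqrt(2 ln t / n_k).
    `t` is the number of rounds played so far (t >= 1)."""
    for k, n in enumerate(pulls):
        if n == 0:
            return k
    scores = [s / n + math.sqrt(2.0 * math.log(t) / n)
              for s, n in zip(reward_sums, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_select(successes, failures, rng=random):
    """Thompson sampling for Bernoulli rewards, assuming Beta(1, 1) initial
    beliefs: draw one sample per arm from its Beta belief, play the largest."""
    samples = [rng.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Both rules favor arms that look good and arms that are poorly explored, but they do so differently: UCB1 adds a deterministic bonus that shrinks with the number of pulls, whereas Thompson sampling randomizes its choice in proportion to how plausible it is that each arm is the best one.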
2.4 Thompson Sampling
Thompson sampling, also known as posterior sampling or probability matching, is the state-of-the-art Bayesian heuristic for MAB optimization. First proposed in 1933 [34], Thompson sampling has recently seen a resurgence in interest owing to the empirical evidence of its superior performance [33] and rigorous theoretical bounds on its finite-time performance [32, 35]. Thompson sampling maintains a prior belief over the reward parameters, which is updated based on the rewards collected in the previous rounds. Subsequently, in every round, Thompson sampling estimates the per-arm reward by sampling from the associated prior. This sampling step naturally balances between exploration, by sometimes sampling actions that have uncertain rewards, and exploitation, by choosing the actions with the highest predicted reward at other times.
In the context of link adaptation, the ACK events are Bernoulli-distributed with a mean given by the true ACK probability, $\rho_k$, for each rate index $k \in \{1, \dots, K\}$. Since the ACK probabilities are a priori unknown, Thompson sampling models them by assigning a Beta prior distribution over the ACK probability, $B(\alpha_k, \beta_k)$, where $\alpha_k, \beta_k$ are the distribution parameters. The choice of a Beta distribution is motivated by its conjugacy to the Bernoulli distribution. This conjugacy property greatly simplifies the calculation of the prior in every round. Hence, at every time step $t = 1, 2, \dots$, Thompson sampling computes the prior distribution parameters based on the ACK/NACKs obtained in previous transmission intervals,
↵k[t]= 1+ek[i], i