Federated Learning Over Wireless Networks: Optimization Model Design and Analysis
Nguyen H. Tran, Wei Bao, Albert Zomaya
School of Computer Science
The University of Sydney
Sydney, Australia
{nguyen.tran, wei.bao, [email protected]}

Minh N.H. Nguyen, Choong Seon Hong
Department of Computer Science and Engineering
Kyung Hee University
Yongin, South Korea
{minhnhn, [email protected]}

Abstract—There is an increasing interest in a new machine learning technique called Federated Learning, in which the model training is distributed over mobile user equipments (UEs), and each UE contributes to the learning model by independently computing the gradient based on its local training data. Federated Learning has several benefits, including data privacy and a potentially large number of UE participants with modern powerful processors and low-delay mobile-edge networks. While most of the existing work has focused on designing learning algorithms with provable convergence time, other issues, such as the uncertainty of wireless channels and UEs with heterogeneous power constraints and local data sizes, are under-explored. These issues especially affect various trade-offs: (i) between computation and communication latencies, determined by the learning accuracy level, and thus (ii) between the Federated Learning time and the UE energy consumption. We fill this gap by formulating Federated Learning over wireless networks as an optimization problem, FEDL, that captures both trade-offs. Even though FEDL is non-convex, we exploit the problem structure to decompose and transform it into three convex sub-problems. We also obtain the globally optimal solution by characterizing the closed-form solutions to all sub-problems, which give qualitative insights into the problem design via the obtained optimal FEDL learning time, accuracy level, and UE energy cost. Our theoretical analysis is also illustrated by extensive numerical results.

Index Terms—Distributed Machine Learning over Wireless Networks, Federated Learning, Optimization Decomposition.

I. INTRODUCTION

The rise in interest in maintaining the privacy of consumer data [1] has led to the emergence of a new class of machine learning techniques that exploit the participation of a number of mobile phone users. One such popular technique is called Federated Learning [2]–[4]. This learning technique allows the users to collaboratively build a shared learning model while preserving all training data on their own user equipment (UE). In particular, a UE computes the updates to the current global model on its local training data, which are then aggregated and fed back by a central server, so that all UEs have access to the same global model in order to compute their new updates. This process is repeated until a target accuracy level of the learning model is reached. In this way, user data privacy is well protected because local training data are never shared, which decouples machine learning from the conventional approach of acquiring, storing, and training on data in datacenters.

In the scope of machine learning, there is a dilemma: UEs' local data are valuable for training models, which can greatly improve user experience; however, these data are also privacy-sensitive in nature, so it is risky to log them to datacenters for model training. For example, location-based services such as the app Waze [5] can help users avoid heavy-traffic roads and thus reduce congestion. However, in this application, users have to share their own locations with the server. Federated Learning is a great solution to this dilemma: instead of sending the raw data, only intermediate gradient values are sent.

Furthermore, the modern smart UE can be considered a personal computer with powerful integrated processors (e.g., the Hexagon DSP with Qualcomm Hexagon Vector eXtensions on the Snapdragon 835 [4]) for heavy computing tasks, and a plethora of sensors (e.g., cameras, microphones, and GPS) for collecting a wealth of data, which ensures the feasibility of Federated Learning to foster more intelligent applications.

There are many reasons to support Federated Learning. First, with the recent advances in edge computing, Federated Learning can be easily implemented in reality. Equipped with a large amount of computing resources at the edge, a centralized powerful datacenter is no longer a must. Instead, the model training can be completed in a distributed fashion, and the delay of uploading a huge amount of raw data can be reduced. Second, Federated Learning greatly facilitates unprecedented large-scale, flexible data collection and model training. For example, a crowd of smart devices can proactively sense and collect data during the day hours, and then jointly feed back and update the global model during the night hours, improving the efficiency and accuracy for next-day usage. We envision that such an approach will boost a new generation of smart services, such as smart transportation, smart shopping, and smart hospitals.

With all these promising benefits, Federated Learning also comes with new challenges to tackle. On one hand, similar to other machine learning schemes, one of the most critical performance metrics of Federated Learning is the learning time it takes to converge to a predefined accuracy level. However, different from conventional machine learning approaches, the Federated Learning time includes not only the UE computation time (which depends on the UEs' CPU types and local data sizes), but also the communication time of all UEs (which depends on the UE channel gains and the update data size). Thus, the first question is: should UEs spend more time on computation to achieve a high learning accuracy with fewer communication updates, or vice versa? On the other hand, due to the limited battery of the participants, how UE resources such as computing and transmission powers should be allocated to minimize the energy consumption is the main concern. Thus, the second question is: how can we strike a balance between the two conflicting goals of minimizing the Federated Learning time and the UE energy consumption? Furthermore, these two questions cannot be answered separately due to their coupling.

To address these questions, we provide a first-of-its-kind "Federated Learning over Wireless Networks" problem design and analysis, which can be summarized as follows:
• We pose the Federated Learning over wireless networks problem (FEDL), which captures two trade-offs: (i) learning time versus UE energy consumption, by using the Pareto efficiency model, and (ii) computation versus communication learning time, by finding the optimal learning accuracy parameter (Section III).
• Despite the non-convex nature of FEDL, we exploit its special structure and use the variable decomposition approach to split and transform FEDL into three convex sub-problems. We show that the first two sub-problems can be solved separately; their solutions are then used to obtain the solution to the third sub-problem. By analyzing the closed-form solution to each sub-problem, we obtain qualitative insights into the impact of the Pareto-efficient controlling knob on the optimal (i) computation and communication learning time, (ii) UE resource allocation, and (iii) learning accuracy. Finally, the combined solution to all sub-problems provides the globally optimal solution to FEDL (Section IV).
• We further provide extensive numerical results to examine (i) the impact of UE heterogeneity, (ii) the Pareto curve between UE energy cost and system learning time, and (iii) the impact of the proportion of computation over communication time on the optimal accuracy level (Section V). Finally, we present the conclusions and some possible future directions of Federated Learning over wireless networks (Section VI).

II. RELATED WORK

... unstable in Internet connections, and have non-i.i.d. and unbalanced data locally. However, most of these works have focused on designing algorithms to improve the convergence of the learning time, unconcerned with other limiting factors, such as wireless communication and the energy-limited nature of mobile UEs, that can affect the performance of Federated Learning. To address this gap, we study how the computation and communication characteristics of UEs affect their energy consumption, learning time convergence, and accuracy level in Federated Learning, considering UEs that are heterogeneous in terms of data size, channel gain, and computing and transmission power capabilities.

III. SYSTEM MODEL

We consider a wireless multi-user system consisting of one base station (BS) and a set $\mathcal{N}$ of $N$ UEs. Each participating UE $n$ stores a local data set $\mathcal{D}_n$, whose size is denoted by $D_n$. Then, we can define the total data size by $D = \sum_{n=1}^N D_n$.

In an example of the supervised learning setting, at UE $n$, $\mathcal{D}_n$ defines the collection of data samples given as a set of input-output pairs $\{x_i, y_i\}_{i=1}^{D_n}$, where $x_i \in \mathbb{R}^d$ is an input sample vector with $d$ features, and $y_i \in \mathbb{R}$ is the labeled output value for the sample $x_i$. The data can be generated through the usage of the UE, for example, via interactions with mobile apps. With these UE data, several machine learning applications can be employed for wireless networks, such as predicting the BS's load in the next hours for dynamic BS load balancing, or predicting the next hovering position of drones so that their coverage is optimized.

In a typical learning problem, for a data sample $\{x_i, y_i\}$ with input $x_i$ (e.g., the response time of various apps inside the UE), the task is to find the model parameter $w$ that characterizes the output $y_i$ (e.g., the label of the BS load, such as high or low, in the next hours) with the loss function $f_i(w)$. Some examples of the loss function are $f_i(w) = \frac{1}{2}(x_i^T w - y_i)^2$, $y_i \in \mathbb{R}$, for linear regression, and $f_i(w) = \max\{0,\, 1 - y_i x_i^T w\}$, $y_i \in \{-1, 1\}$, for support vector machines. The loss function on the data set of UE $n$ is defined as

$$J_n(w) := \frac{1}{D_n} \sum_{i \in \mathcal{D}_n} f_i(w). \qquad (1)$$

Then, the learning model is the minimizer of the following global loss function minimization problem:

$$\min_w \; J(w) := \sum_{n=1}^N \frac{D_n}{D} J_n(w).$$
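To make the objective concrete, here is a minimal sketch (Python/NumPy; all function and variable names are illustrative, not from the paper) of the linear-regression sample loss $f_i$, the local loss $J_n$ of eq. (1), and the global loss as the $D_n/D$-weighted sum of local losses. Since each $J_n$ averages over $D_n$ samples, the $D_n/D$ weighting makes the global loss equal to a plain average over all pooled samples.

```python
import numpy as np

def f_i(w, x_i, y_i):
    # Linear-regression sample loss: f_i(w) = (1/2) * (x_i^T w - y_i)^2
    return 0.5 * (x_i @ w - y_i) ** 2

def local_loss(w, X_n, y_n):
    # Local loss of UE n, eq. (1): J_n(w) = (1/D_n) * sum_{i in D_n} f_i(w)
    D_n = len(y_n)
    return sum(f_i(w, X_n[i], y_n[i]) for i in range(D_n)) / D_n

def global_loss(w, datasets):
    # Global loss: J(w) = sum_n (D_n / D) * J_n(w),
    # i.e., local losses weighted by each UE's share of the total data.
    D = sum(len(y_n) for _, y_n in datasets)
    return sum(len(y_n) / D * local_loss(w, X_n, y_n)
               for X_n, y_n in datasets)
```

Here `datasets` is a list of per-UE `(X_n, y_n)` pairs standing in for the local data sets $\mathcal{D}_n$.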
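The compute-aggregate-feed-back loop described in the introduction can likewise be sketched as follows (Python/NumPy). This is only a generic illustration of that loop with a plain gradient step as the local computation; the learning rate, round count, and all names are assumptions for the sketch, not the paper's FEDL formulation, which additionally optimizes UE resource allocation and the accuracy level.

```python
import numpy as np

def local_gradient(w, X_n, y_n):
    # Gradient of J_n(w) for the linear-regression loss:
    # dJ_n/dw = (1/D_n) * sum_i (x_i^T w - y_i) * x_i
    residuals = X_n @ w - y_n
    return X_n.T @ residuals / len(y_n)

def federated_round(w, datasets, lr=0.1):
    # One communication round: each UE computes its local gradient,
    # the server aggregates them weighted by data size D_n / D,
    # and the updated global model is fed back to all UEs.
    D = sum(len(y_n) for _, y_n in datasets)
    agg = sum(len(y_n) / D * local_gradient(w, X_n, y_n)
              for X_n, y_n in datasets)
    return w - lr * agg

def train(datasets, d, rounds=500, lr=0.1):
    # Repeat rounds until convergence (a fixed round count stands in
    # for reaching a predefined accuracy level here).
    w = np.zeros(d)
    for _ in range(rounds):
        w = federated_round(w, datasets, lr)
    return w
```

Note that the raw samples never leave the per-UE `(X_n, y_n)` pairs; only gradient vectors are aggregated, which is the privacy property the introduction emphasizes.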