Tips and Tricks for Power Line Communications

Andrea M. Tonello, Senior Member, IEEE, Nunzio A. Letizia, Davide Righini and Francesco Marcuzzi

Abstract—A great deal of attention has been recently given to afterwards with an empirical and top-down approach. The Machine Learning (ML) techniques in many different application immense contributions of I. Kant with his philosophy of fields. This paper provides a vision of what ML can do in Power transcendental aesthetics and logic [2], made a step further into Line Communications (PLC). We firstly and briefly describe classical formulations of ML, and distinguish deterministic from the understanding and the definition of knowledge: knowledge statistical learning models with relevance to communications. We of the structure of time and space and their relationships, is a then discuss ML applications in PLC for each layer, namely, for priori knowledge; knowledge acquired from observations is a characterization and modeling, for the development of physical posteriori knowledge; most of our knowledge comes from the layer algorithms, for media access control and networking. process of learning and observing phenomena, and without a Finally, other applications of PLC that can benefit from the usage of ML, as grid diagnostics, are analyzed. Illustrative numerical priori knowledge it is impossible to reach the true knowledge. examples are reported to serve the purpose of validating the In this respect, Machine Learning (ML) [3], [4], which is ideas and motivate future research endeavors in this stimulating the topic of this paper, can be considered an implementation signal/data processing field. by humans of techniques in machines to acquire knowledge Index Terms—Machine learning, Statistical learning, Com- from a posteriori observations of natural phenomena. The munications, Power line communications, Channel modeling, Physical layer, MAC layer, Network layer, Grid diagnostics. origin of success of ML relies on its ability to derive relations among phenomena and potentially discover the hidden (latent) true I.INTRODUCTION state of a system, i.e, potentially provide an intrinsic knowledge of the system. System identification and model Modern communication systems have reached a high degree based design through the aid of ML [5] constitute a first step of performance, meeting demanding requirements in numerous to find undiscovered system properties via a mixed a priori - application fields. Significant progress in the analysis and de- a posteriori learning approach, which, retrospectively, follows sign of communication systems has been rendered possible by Kant’s philosophical structure. the milestone work of C. Shannon [1] that provided a method- ML is indeed bringing new lymph in the domain of commu- ological approach to attack the challenge of reliably trans- nication systems modeling, design, optimization, and manage- mitting information through a given communication mean. ment. It provides a paradigm shift: rather than concentrating Shannon’s mathematical approach suggests to represent the on a physical bottom-up description of the communication system as a chain of blocks mathematically modeled, namely, scheme, ML aims to learn and capture information from a the transmitter, the channel, and the receiver. The transmitter collection of data, to derive the input-output relations of the can be further divided into a source coder, a channel coder and system, or of a given task in the system. We would argue a signal modulator. The channel is represented by a transfer that a miraculous solution of communications challenges with function (in most cases considered linear time invariant, or ML does not yet exist. In addition, what is learned via ML time variant) and an additive noise term. Three generations tools is not necessarily representative of the physical reality, of scientists and engineers grew up with this mathematical i.e., wrong believes about relations and dependencies among mindset which provided tools to acquire domain knowledge data may be generated. Consequently, the results have to be and use it to build a model for each block, so that the overall validated through the support of a probabilistic approach and arXiv:1904.11949v2 [eess.SP] 6 Jun 2019 behavior becomes known. Such a framework intrinsically has an understanding of the system physics. But the path has been the advantage that each block can be individually studied and mapped out: ML offers a great deal of opportunities to research optimized. We would refer to this approach as physical and and design communication systems. bottom-up. From an epistemological point of view, the mathematical A. But what are the domains of application of ML in commu- theory of communications is based on knowledge coming nications? from a priori justifications and relying on intuitions and The applications of ML in communications are a multitude the nature of these intuitions, which is intrinsically what and cover all three fundamental protocol stacks: the physical mathematics does. On the contrary, a posteriori knowledge is layer, the MAC layer and the network layer. More specifically, created by what is known from experience, therefore generated the same applies to Power Line Communications (PLC) which is the technology that exploits the existing power deliver The authors are with the University of Klagenfurt - Chair of Embedded Communication Systems, 9020 Klagenfurt, Austria. (e-mail: {andrea.tonello, infrastructure to convey information signals [6]. When things nunzio.letizia, davide.righini, francesco.marcuzzi}@aau.at) become complex and a bottom-up model is difficult to derive

1 or has too many uncertainties, ML can help, no doubt. This is more in general system identification, data detection, resource particularly true in PLC since, despite the advances in channel allocation, network management etc. and noise modeling [7], the communication media is still In the following, we summarize learning methods and tools not fully understood and modeled especially when it comes and we report specific examples of ML based solutions already to noise and interference. Consequently, the transceiver tech- proposed in the literature using however a unified description niques designed so far might not be optimal [8]. Media access approach. control and resource allocation in massive PLC networks (as smart metering ones) is an extremely complex task that can A. benefit from ML approaches [9]. The analysis of PLC signals 1) Preliminaries and Definitions: Let (xi, yi) ∼ p(x, y), exchanged among nodes can reveal properties of the grid status i = 1,...,N, be samples collected into a training set D and detect anomalies in the cables, loads and generators, which belonging to the joint distribution (pdf) p(x, y). Supervised is relevant for grid predictive maintenance [10], [11]. learning, under a deterministic model, aims to find a mapping (x , y ) B. Paper Contribution between all pairs of input-output vectors i i , thus, an inferred function F that element-wise satisfies y = F (x). The In this paper, we will discuss in detail the application of ideal scenario would map unseen samples ˜x into the right, ML in PLC providing concrete insights (tips) of what can initially unknown, label/target ˜y. As we want to address the be done and with what ML tools (tricks). Several numerical problem under a stochastic/probabilistic approach, we state examples are reported to validate the ideas and to stimulate that probabilistic supervised learning tries to predict y from x further work in this research domain. To better understand by estimating p(y|x) under a discriminative model or by esti- ML, we start our short journey into ML for PLC by proving mating the joint distribution p(x, y) under a generative model. a compact introduction to ML with focus on applications in Fig. 1 schematically distinguishes between deterministic and communications. This serves also the purpose of surveying, at probabilistic approaches in ML. the best of the authors knowledge, the existing literature on If the outputs are continuous variables, we consider it as the topic and the initial studies conducted. a regression problem, while if the targets are discrete, then In detail, the paper is organized as follows. In Sec. II, we have a classification problem. A standard way to proceed ML fundamentals for both supervised and unsupervised learn- during the learning process is to define a cost function C, ing are reported. Specific tools are described. They include namely a performance measure that evaluates the quality of artificial neural networks, and support vector machines (for our prediction ˆy. In most applications, we can rely only on the supervised ML), and clustering, , and generative observed dataset D and derive an empirical sample distribution networks (for unsupervised ML). Convolutional and Recurrent since we do not have knowledge of the true joint distribution Neural Networks as well as are also p(x, y). In particular, the goal of the training process is to briefly discussed. In Sec. III, we focus on PLC and ML for the minimize characterization of the medium and its modeling through the C(ˆy) = [δ(y, ˆy)] (1) use of a data driven, synthetic approach. In Sec. IV, ML for E(x,y)∼D physical layer PLC is discussed while the MAC and network where δ is a measure of distance between the wanted target y layers are considered in Sec. V. Other applications of ML for and the prediction ˆy, and E denotes expectation. PLC are the topic of Sec. VI. The conclusions then follow. 2) Tools: Neural Networks: Neural Networks (NNs) are among the most popular tools in this field since they are II.MACHINE LEARNING BASICSFOR COMMUNICATIONS known being universal function approximators [14], they can Following the definition provided by Mitchell [12], ML al- be implemented in parallel on concurrent architectures and gorithms can be categorized according to the learning process most importantly, they can be trained by [15]. (the kind of experience E the machine has), the specific task A feedforward neural network with L layers maps a given T and the performance measure P . If we focus on the expe- D0 DL input x0 ∈ R to an output xL ∈ R by implementing rience, learning is divided into supervised, unsupervised and a function F (x0; θ) where θ represents the parameters of the reinforcement learning. According to different interpretations NN. To do so, the input is processed through L iterative steps of the experience, one could distinguish between a determin- istic or probabilistic interpretation/description of the system. xl = fl(xl−1; θl), l = 1,...,L (2) Since in communication theory, signals are always studied as stochastic processes, we prefer to encapsulate ML tools into where fl(xl−1; θl) maps the input of the l-th layer to its statistical learning tools, thus, under a stochastic/probabilistic output. The most used layer is the fully connected one, whose framework. mapping is expressed as Now, as discussed in the introduction, the paradigm shift in fl(xl−1; θl) = σ(Wl · xl−1 + bl) (3) ML for communications is to frame the problem into an initial black-box description of the system and then learn and capture where σ(·) is the activation function while Wl and bl are the information from a collection of data so that a target task can parameters, weights and the biases, respectively. According to be realized [13]. Examples of tasks, are system modeling, or the specific application, several different types of layers and

2 Learning Models

(x,y) (x) Supervised Unsupervised

Deterministic Probabilistic Deterministic Probabilistic

Discriminative Generative Discriminative Generative Autoencoders

P(x|z;ϑ) P(z|x;ϑ) y = F(x;ϑ) P(y|x;ϑ) P(x,y|ϑ) z = F(x;ϑ) P(z|x;ϑ) or and P(x,z|ϑ) P(x|z;ϑ)

Fig. 1: Taxonomy of learning models.

the observed data D, formally Input Layer Hidden Layer Hidden Layer Output Layer θ = arg max p(y|x; θ). (4) θ θ 1 θ L θ σ(·) 2 σ(·) From the dataset samples and using the log-likelihood func- σ(·) σ(·) tion, (4) can be empirically estimated as N σ(·) X σ(·) θ = arg max log p(yi|xi; θ). (5) ...... θ i=1 ... σ(·) σ(·) ... Note that the deterministic and probabilistic approaches col- W2x1+b2 W x +b lapse together when we consider Gaussian or categorical W1x0+b1 x1 x2 L 2 L models since we will end up solving a least squares or a cross- x0 xL entropy minimization problem, respectively. We leave the description of generative models for Sec. II-B since they are typically and formally introduced under , Fig. 2: Structure of a fully connected neural network with 2 and then they have been extended to supervised versions. This hidden layers. is the reason why they can be cast in hybrid models. NNs are only one of the several different tools introduced so far by the ML’s community. Since they are excellent in activation functions can be defined. Fig. 2 shows a general finding patterns, in the context of PLC, as we will discuss, they fully-connected architecture. can be used to find deterministic and probabilistic relation- Defined a metric δ and a cost function C, the easiest and ships between physical measured quantities such as the line most classical algorithm to find the feasible set of parameters impedence, the channel transfer function, the radiated power. θ is the gradient descent which iteratively updates θ as Channel emulation can be realized with neural networks, an θt = θt−1 − η∇C(θt−1) where η is the . Its early example of which is [17]. NNs can also be used for popular variants are Stochastic Gradient Descent (SGD) and prediction and network management purposes. adaptive learning rates (Adam) [16]. Common choices for the 3) Tools: Support Vector Machines: Another relevant ap- cost function, in a deterministic model, are the mean squared proach to supervised learning is the usage of a Support Vector error and categorical cross entropy, for which δ in (1) takes Machine (SVM) [18], [19]. We briefly describe the SVM 2 the form δ(y, ˆy) = ||y − ˆy|| and δ(y, ˆy) = −y · log ˆy, classifier since regression follows from the same idea with respectively. minor modifications. The SVM classifier predicts that a certain T In the context of probabilistic discriminative models, the sample xi belongs to a class yi if w xi − b is positive, where T fundamental learning criterion is the maximum likelihood w xi − b = 0 is the equation for the decision boundary. estimator which finds a value of θ in a parameterized family The aim is to maximize the distance d = 2/||w|| between T of models p(y|x; θ) that is the most likely to have generated the supporting rescaled hyperplanes w xi − b = 1 and

3 x 2 ||w||2 5) Application of Supervised Learning to Communications: Supervised learning algorithms have been applied in commu-

w·x - b = 1 nications scenarios with promising results with focus from physical layer to networking at the edge or at the cloud w·x - b = 0 segment.

w·x - b = -1 • Physical Layer: At the transmitter, the problem of max- imizing energy efficiency has been addressed via re- source allocation with deep neural networks [22]. At the receiver, several problems have been considered such as data detection and decoding [23], channel estimation and symbol detection in an OFDM system [24], linear x1 codes decoding [25], decoding for non-linear channels like those in optical communications [26], modulation Fig. 3: Decision boundaries and maximum separation for SVM recognition [27]. In addition, also radio-localization using framework. discriminant adaptive neural networks has been proposed in [28]; T • Link and Medium Access Control Layer: feedback on the w xi − b = −1 (see Fig 3). This leads to the following dual Lagrangian formulation for the linear SVM decodability of the received signal has been proposed using ML techniques in [29]; spectrum sensing and N N N X 1 X X allocation in cognitive networks in presence of interfering L = α − α α y y x x (6) d i 2 i j i j i j devices has been done in [30]; interference management i=1 i=1 j=1 was considered in [31] and estimation of the number of where αi are the Lagrange multipliers (non-negatives) and N active nodes in a wireless network testing various ML the number of samples. When data are non linearly separable techniques was the objective of [32]. in the domain space, transforming them into an higher di- • Network and Application Layers: At the edge, caching mensional one using the mapping x → φ(x) increases the popular contents in echo state networks exploiting a complexity of the classifier. A concept called kernel trick ML framework was investigated in [33]; at the cloud, a overcomes this problem; the key idea behind the kernel trick is network controller was designed for routing using a pre- that the inner product between xi and xj is similar to the inner dictive approach [34]; lastly, a survey of ML techniques product between φ(xi) and φ(xj). Moreover, since we are not for traffic classification was offered in [35]. interested in knowing φ but only the scalar product xi · xj, such product is represented by the kernel function K(xi, xj). B. Unsupervised Learning Consequently, we can make predictions using the function 1) Preliminaries and Definitions: Let xi ∼ p(x), i = X 1,...,N, be samples collected into a training set D belonging f(x) = b + αiK(x, xi), (7) to the probability distribution function (PDF) p(x). Unsuper- i vised learning aims to find useful properties of the structure which is a non-linear function w.r.t. x but linear w.r.t. φ(x) of a dataset D, ideally inferring the true unknown distribution and α. The most commonly used kernel is the radial basis p(x). Several different tasks are solved using unsupervised function (RBF) kernel with expression learning, roles such as: clustering, which divides the data  2  ||xi − xj|| into cluster of similar samples, feature extraction, which K(xi, xj) = exp − . (8) σ2 transforms data in a different latent space easier to handle As an example, in the context of PLC, SVM was recently and interpret, density estimation and generation/synthesis of explored to monitor and detect cable degradations [20]. new samples, whose objective is to learn, from data in D, the 4) Other Tools: Another very simple non-probabilistic su- distribution p(x) and to produce new unseen samples from it. pervised learning algorithm is k-nearest neighbors. It is a Contrarily to the supervised one, unsupervised learning does non-parametric algorithm in the sense that, since there is no not have an unified accepted formulation. This is even clearer learning process, at the test time we predict yi for a given if we think that many unsupervised learning tasks require the test input xi. The main idea is to find the k-nearest neighbors introduction or the presence of an hidden variable zi for each (for example using an Euclidean metric) to xi in the training sample, leading to the selection of different models under a set x. The algorithm returns the average, if regression is probabilistic approach (as shown in Fig. 1): considered, and the mode, if classification is considered, of • Discriminative models, where the latent code zi has to the corresponding y, also called training class. This algorithm be extracted from data xi by defining a family p(z|x; θ) is very easy to implement but it becomes significantly slower parameterized by a vector θ. as the number of samples increases. • Autoencoders, where xi is encoded into a latent variable Naive Bayes, decision trees, and [21] are zi so that recovering xi from zi is possible through other popular examples of supervised learning algorithms. a decoder. This model, as discussed in Sec. II-B3, is

4 very popular nowadays and can be interpreted also under Input Encoder Decoder Output a supervised learning approach, where xi is the input data, and xi is itself the corresponding target. In the classical deterministic approach, the problem consists σ(·) σ(·) of the parameterization of two functions, namely the σ(·) σ(·) encoder z = F (x; θ), and the decoder x = G(z; θ). In the probabilistic approach, the encoder has to model σ(·) σ(·) p(z|x; θ), while the decoder models p(x|z; θ). • Generative models, where there exists an hidden zi that σ(·) σ(·) generates the observation xi. After a specification of a parameterized family p(z|θ), the distribution of the obser- P x z x vation can be rewritten as p(x|θ) = z p(z|θ)p(x|z; θ). For a fully and exhaustive description of the different models Fig. 4: block with neural networks. A latent and the corresponding tools like Expectation Maximization representation z is learned by the encoder and used as input (EM) and Evidence Lower BOund (ELBO), the interested of the decoder for recovering x. reader is referred to [36]. We will focus only on clustering algorithms for discriminative models and autoencoders since they are now getting a lot of attention by communication which tries to learn a latent representation z, typically in researchers. a lower-dimensional space, of the input variable x, and a 2) Tools: Clustering: Clustering analysis includes different decoding block which reconstructs x at the output exploiting algorithms whose common task is to discover the hidden un- the code z (Fig. 4). known structure and relationships between input data. Several A typical learning formulation for deterministic autoen- algorithms are available in the literature for this purpose. The coders requires to solve most common and reliable ones are K-means, Hierarchical θopt = arg min δ(x,G(F (x; θ1); θ2)), (9) Clustering (HC), clustering using representatives (CURE), and θ Self Organizing Maps (SOM). where δ is a distance measure, θ = (θ , θ ), while F and G K-means is a clustering algorithm that allows to partition 1 2 stand for the encoder and decoder function, respectively. When the input data in K sets, based on a certain metric exploit- F and G are linear functions, we get the Principal Component ing expectation maximization algorithms. K-means clustering, Analysis (PCA). Given a set/matrix x of ND-dimensional considers a set of N data points in a D-dimensional space samples x ∈ D, PCA sets the encoder as F (x; θ) = WT x D and an integer K. The problem is to determine a set of i R R and the decoder as G(z; θ) = Wz where W is the unknown K points in D, called centroids, so as to minimize the mean R parameter, a D × M matrix where M is the dimension of the squared distance from each data point to its nearest centroid latent space. If δ is the quadratic loss function, then (9) can [37]. Usually, this clustering algorithm is solved efficiently be rewritten as with heuristic methods such as Lloyd’s. N HC algorithms are mainly classified into agglomerative X T 2 methods (bottom-up methods) and divisive methods (top-down Wopt = arg min ||xi − WW xi|| . (10) W methods), depending on how the hierarchical dataset is formed i=1 [38]. Multiple implementations and variations of the HC Let Σ be the sample covariance matrix of x, then W is given algorithms are described in the literature [39]. Essentially, they by the M principal eigenvectors of Σ. The geometric idea follow a procedure where a tree of clusters called dendrogram in PCA consists of finding a rotation of the domain space is generated. that aligns the principal axes of variance with the basis of The CURE algorithm attempts to uncover the cluster shape the new representation latent space associated with z. Under a using a collection of representative points from the main probabilistic model, the resulting transform is a vector z whose dataset [40]. This algorithm is efficient for large , elements are mutually uncorrelated. PCA is broadly used also and compared to K-means clustering is more robust to as a feature extractor, something which is very useful for the and able to identify clusters with complex shapes. analysis of both the channel response and the noise in PLC, SOM is a grid of map units [41]. Each unit is represented by for instance. a prototype vector that defines its space position. The units are Correlation is an indicator of linear dependence, but in most connected to the adjacent ones to form a network. The number cases we are interested in representations where features have of map units corresponds to the final number of clusters, thus, a different form of dependence. For example, in generative a higher number of units corresponds to a higher accuracy of networks, Dinh [42], [43] proposed a non-linear deterministic data separation. A training procedure is used to stretch the transform of the data which maps them into a latent space network and map the space of input data. of independent variables where the probability density results 3) Tools: Autoencoders: An autoencoder is a particular type tractable. Another interesting and suitable type of autoencoder of artificial neural network consisting of an encoding block for communications is the Denoising AutoEncoder (DAE)

5 [44]. The idea is to train a machine in order to minimize the Generator following denoising criterion: Noise Source Fake Data Discriminator z G x’ LDAE = Ex∼D[δ(x,G(F (˜x)))], (11) T/F where ˜x is a stochastic corruption of x. When we train a DAE D using the expected quadratic loss and a corruption noise ˜x = x x+ with  ∼ N (0, σI), Alain and Bengio [45] proved that the Real Data autoencoder recovers properties of the training density p(x). They also showed that the DAE with a small noise corruption Backpropagation of variance σ2 is similar to a Contractive AutoEncoder (CAE) Fig. 5: GAN framework in which the generator and discrimi- with penalty coefficient λ = σ2. The contractive autoencoder nator are learned during the training process. [46] is a particular form of regularized autoencoder which is trained in order to minimize the following reconstruction criterion: can be thought as a minimax two-player game which will end  ∂F (x) 2  when a Nash equilibrium point is reached. Given an input L = δ(x,G(F (x))) + λ , (12) CAE Ex∼D noise vector y with distribution p (y), the map into data ∂x F noise space is achieved through G(y; θ ). Defining the following where |A| is the Frobenius norm. The idea behind CAE gen F value function is that the regularization term attempts to make F (·) or

G(F (·)) as simple as possible, but at the same time the V (G, D) =Ex∼pdata(x)[log D(x)] reconstruction error must be small. Typically, autoencoders + Ey∼pnoise(y)[1 − log D(G(y))], (13) find a low-dimensional representation of the input vector x at some intermediate level. it was shown that the generator implicitly learns the true Now, in the domain of communications it has been proposed distribution since the equilibrium is reached when pgen = pdata. in [47] to learn a robust representation z of the input x in order This generation technique has several applications in com- to overcome channel perturbations (noise, fading, distortion, munication theory like stochastic channel modeling and chan- etc.). In this way the transmitted signal can be recovered nel coding [50]. As we will discuss in the following (Sec. with small probability of error exploiting the redundancy III-B), it has also application in PLC due to the ability learned by the autoencoder. The findings in [45] are somehow of generating synthetic data from a given distribution, thus, remarkable when applied in a communications framework modeling for example the PLC channel response or noise. because they assert that corrupting the transmitted signal with 5) Application to Communications: Herein, we review some form of noise can be beneficial for the autoencoder in some recent applications of unsupervised learning to commu- the reconstruction’s phase. nication systems for different layers of the protocol stack. 4) Tools: Generative Networks: Autoencoders can be stud- • Physical Layer: Autoencoders for were ied also as generative models. Based on variational inference, firstly interpreted as a communication system in [47]. Kingma and Welling introduced in [48] the concept of vari- The application of unsupervised learning to end-to-end ational autoencoders. Denoted z as the latent variable of the communications can be divided according to the presence observed value x for a parameter θ1, then p(z|x; θ1) represents or not of an a priori knowledge of the channel model the intractable true posterior which can be approximated by for the training part. If the channel model is available, a tractable one, q(z|x; φ), for a parameter φ. A probabilis- the usage of autoencoders with deep learning can help tic encoder produces q(z|x; φ) while a probabilistic decoder to design an optimal encoding/decoding procedure. For produces p(x|z; θ2). Rather than outputting the code z, the instance applications are found in optical fiber communi- encoder outputs parameters describing a distribution for each cations [51], in AWGN channels with unreliable feedback dimension of the latent space. When the prior is assumed to [52], sparse code multiple access schemes where deep be Gaussian, z will consist of mean and variance. Tuning the neural networks are used to learn a decoding strategy parameters in the latent space, and passing the new latent to minimize bit-error rate [53]. When Channel State samples through the decoder is a way to generate new data. Information (CSI) is available, channel charting [54] Finally, in the generative models framework, it is worth for localization exploiting CSI has been realized using mentioning Generative Adversarial Networks (GAN) proposed autoencoders. If the channel model is unknown, several by Goodfellow et. al. in [49]. The main idea is to train a new approaches have been recently proposed. In order to pair of networks in competition with each other (Fig. 5): a design a radio communication system with autoencoders generator model G that captures the data distribution and a and to overcome the lack of channel knowledge, a two- discriminator model D that distinguishes if a sample is a true phase training strategy was followed in [55]. For the same sample coming from real data rather than a fake one coming objective, [56] and [57] utilized stochastic perturbation from data generated by G. The training procedure for G is and policy gradients techniques, respectively. Generative to maximize the probability of D making a mistake. GANs models can be used to generate samples of a given

6 communication channel, in particular GANs have been where xt is a real-valued input vector available at time t, σ(·) exploited in [50], [58]. is the activation function, while WI,t, WH,t and bt are the • Link and Medium Access Control Layer: Spectrum sens- weight matrix connecting the input to the hidden layer, and ing exploiting GANs and resource management for LTE the weight matrix connecting the previous hidden layer to the were studied in [59] and [60], respectively. current one and the biases, respectively. At time t, the hidden • Network and Application Layers: Clustering algorithms layer ht is influenced by the current input xt but also by the are the fundamental tools to be used in the network and previous hidden state ht−1, such that the output yt depends application layers. Routing can be improved via cluster- on the evolution over time of the hidden layer. ing [61], while in the application layer, identification of Several different architectures for RNNs have been proposed clustering communities, which is a social networks goal, such as Long Short-Term Memory (LSTM) [70] and Bidi- has been considered in [62]. rectional RNNs [71]. For further details and a wide survey on RNNs, the interested reader is referred to [72] and [73], C. Other Learning Schemes respectively. In the following sections, we briefly discuss Convolutional 3) Reinforcement Learning: Reinforcement Learning (RL) Neural Networks (CNNs), Recurrent Neural Networks (RNNs) addresses the problem of an agent learning to act in a dynamic and the Reinforcement Learning (RL) approach to offer an ex- environment by finding the best sequence of actions that haustive overview of other ML tools that may have application maximizes a reward function. The basic idea is that the in PLC. agent explores the interactive environment. According to the 1) Convolutional Neural Networks: Convolutional Neural observation experience it gets, it changes his actions in order Networks (CNNs) (Fig. 6.1) are able to capture the spatial and to receive higher rewards. RL finds several applications, from temporal dependencies of data, and for this reason they find robots control [74], to games such as Atari [75], and Go [76]. application in image and document recognition [63], medical Basic RL can be modeled as a Markov Decision Process image analysis [64], natural language processing [65], and (MDP). Let St be the observation (or state) provided to the more in general pattern recognition. CNNs are Multi Layer agent at time t. The agent reacts by selecting an action At (MLP) (Sec. II-A2) with a regularization approach to obtain from the environment the updated reward Rt+1, the since they consist of multiple convolutional layers to ensure discount γt+1, and the next state St+1. In particular, the agent- the translation invariance characteristics. In particular, given environment interaction is formalized by a tuple hS, A, T, r, γi an input data matrix Ii, the feature map Fj is obtained as where S is a finite set of states, A is a finite set of actions, 0 0  C  T (s, a, s ) = P [St+1 = s |St = s, At = a] is the transition X 0 Fj = σ Ii ∗ Ki,j + Bj (14) probability from state s to state s under the action a, r(s, a) = i=1 E[Rt+1|St = s, At = a] is the reward function, and γ ∈ [0, 1] namely, through the superposition of C layers, e.g., C = 3 is a discount factor. To find out which actions are good, the for RGB images, each comprising a convolution between the agent builds a policy, i.e, a map π : S ×A → [0, 1] that defines the probability of taking an action a when the state is s. If we input matrix Ii and a kernel matrix Ki,j, plus an additive   bias term B , and a final application of non-linear activation P∞ Qk j denote with Gt = k=0 i=1 γt+i Rt+k+1 the discount function σ(·), typically, a sigmoid, or tanh, or ReLU. Each set return, then the goal of the agent is to maximize the expected of kernel matrices represents a filter that extracts local features. discount return (value) qπ(s, a) = [G |S = s, A = a], by To control the problem of over fitting, the dimension of data Eπ t t t finding a good policy π(s, a). When the state or action sets are and features to be extracted is reduced by pooling layers. defined in high-dimensional spaces, the policy π and the value Finally, fully-connected layers are used to extract semantic q can be represented by deep neural networks (Sec. II-A2). information from features. With some more detail, there are essentially three different 2) Recurrent Neural Networks: Recurrent Neural Networks RL algorithms [77], [78]: a) policy based methods, when the (RNNs) are a class of supervised and unsupervised ML tools agent, given the observation as input, optimizes the policy π that are capable of representing time dependencies. Since without using a value function q; b) value based methods, they introduce the notion of time into the model, they have when the agent, given the observation and the action as inputs, been successfully applied to tasks such as speech [66] and learns a value function q; c) actor critic methods, where a critic handwriting recognition [67], health care [68], and machine measures how good the action taken is (value-based), and an translation [69]. Bearing in mind the architecture of the NNs actor controls the behaviour of the agent (policy-based) (Fig. presented in Sec. II-A2, RNNs can be seen as feedforward 6.3). NNs made of artificial neurons with one or more feedback 4) Application to Communications: In [79], CNNs naively loops (Fig. 6.2). extract features and use them for radio modulation recognition If we consider a simple RNN made of the input layer, an on time series of radio signals. A similar approach is adopted hidden layer, and the output layer, then the hidden layer or in [80] where CNNs automatically detect and identify fre- memory of the system h , can be obtained as t quency domain signatures for the IEEE 802.x standard compli- ht = σ(WI,t · xt + WH,t · ht−1 + bt) (15) ant technologies. RNNs have been firstly introduced as channel

7 Convolutional Neural Networks as in-home [85]–[88], outdoor-access [89], [90] and in-vehicle (cars, ships, planes) [91]–[95]. A great deal of knowledge has Feature maps Pooled feature maps Fully connected been acquired (mostly for the broad-band channel in the in- home scenario) and some consensus has been reached on the

...... 1) ...... main characteristics. For instance, the frequency response and the average channel gain have a log-normal distribution [96], [97]. Periodic channel time variations are exhibited especially Input Layer Convolutional Layer Pooling Output in the narrow band spectrum, or for frequencies up to 10 MHz [98]. Correlation is exhibited among the multi-conductor Recurrent Neural Networks channels in MIMO setups [99]. Input Layer Hidden Layer Output Layer Noise has a very complex nature in PLC since it comprises W H,t not only thermal noise components but also (and mostly) com- σ(·) ponents generated by active loads and electronic circuits via 2) direct coupling and conduction, e.g., harmonics and emissions σ(·) from power converters, or inductive/EM coupling especially

σ(·) at high frequencies [100]. Noise is extremely heterogeneous WI,t and temporal dependent. As such, its modeling is challeng- ht yt xt ing. The existing description relies on a physical phenomena Reinforcement Learning classification into stationary, cyclostationary, impulsive, and

Agent Environment interference from radio transmissions that couple in the lines Error [101]. Only initial results for the description of the multi-

Comp. conductor noise have been reported [102]–[105]. 3) Value Reward As in all communication systems, the accurate knowledge of the medium is a prerequisite for the design of a reliable Critic Actions and energy efficient communication technology. ML can bring Actor State observations new insights on the medium characterization. Firstly, it can be used to analyze and extract features. Then, to classify Fig. 6: Graphical learning models of: 1) Convolutional Neural data from measurements. The definition of the features that Network with one convolutional layer; 2) Recurrent Neural contain the information necessary to separate the data into Network with simple feedback loop; 3) Reinforcement learn- classes is not a trivial step in the noise classification process. ing setting for actor-critic method. Aiming at obtaining a statistical understanding of the medium, the features include energy, moments of multiple order, ap- proximated entropy and correlation. A comprehensive list of equalizers in [81], while recently in [82], LSTM RNNs have features that can be considered, is reported in Tab. I. The been applied for network traffic . Dynamic knowledge about the PLC physical properties aids a lot to spectrum assignment in PLC has been investigated in [83] identify the most relevant features. For instance, the entropy under a RL approach. Due to the dynamic environment, RL is useful to recognize narrow band noise signals that are finds several applications in communications and networking, characterized by a sinusoidal behavior. The a priori knowledge mostly covered in [84]. that electromagnetic coupling effects among conductors may exist, suggests to study the correlation between noise traces in III.ML FOR PLCMEDIUM CHARACTERIZATION AND multiple conductor networks. MODELING Classification tools can then be applied to determine the In the following sections. We will mostly highlight the most prominent features and the significant number of classes. relevance and applications of the supervised and unsupervised An example is PCA (Sec. II-B3) analysis which is useful ML tools described in Sec. II-A and Sec. II-B in this context. to reduce the dimensions of large datasets of variables and Reinforcement learning will also be briefly discussed in the determine the most important features in the data. context of resource allocation in PLC. Now, let us consider multi-conductor PLC noise. The ex- We start from medium characterization since this is a key ploitation of unsupervised learning allows the identification topic in PLC. It includes the analysis of the channel response of noise classes. For this task, in [109], a library of features (in time and frequency domain), the line impedance, the noise was created for a dataset of multi-conductor narrow band and interference [7]. (NB) noise traces obtained via measurements in the band 3- 500 KHz in an office environment. Different methods were A. Statistical Learning tested to extract the features (listed in Tab. I), and then Up to date studies, have focused on the statistical charac- provide a classification of the noise. Features such as the terization of the PLC channel and noise. For what matters the distance correlation (dCor), the Person correlation (pears), or channel, datasets have been collected in several scenarios, such the differential entropy (diffEnt) are useful to highlight the

8 TABLE I: Features description.

ID Feature Equation Description name

1 maxAbs maxj {| sj |} maximum of the absolute values PN 2 sum j=1 sj sum of the samples PN 2 3 sum2 j=1 sj sum of the squared samples q 1 PN 2 4 std N−1 j=1 | sj − µi | standard deviation of the samples 1 PN (s −µ )3 N j=1 j i 5 skew r 2 skewness of the samples 1 PN (s −µ )2 N j=1 i i 1 PN (s −µ )4 N j=1 j i 6 kurt r 2 kurtosis of the samples 1 PN (s −µ )2 N j=1 i i PN j=1(sCH1,i−µCH1)(sCH2,i−µCH2) 7 pears rCH1,2 = r Pearson correlation of the samples PN (s −µ )2 PN (s −µ )2 j=1 CH1,i CH1 j=1 CH2,i CH2 q PN 2 8 dist ksCH1 − sCH2k = j=1 |sj | distance between data of channel 1 and 2 9 dCor [106] distance correlation

10 ent [107] entropy approximation 11 diffEnt [107] entropy approximation of CH1-CH2

12 sumEnt [107] entropy approximation of CH1+CH2

13 fPeak — find the number of samples over a certain voltage level 14 fEnFr Exploiting the power spectral density (PSD) estimated via Burg’s find the energy of samples in a certain frequency range method. [108]

15 fdist — find the distance in samples between the two highest peaks

16 corrStd — standard deviation of the correlation of two channels 17 corrSkew — skewness of the correlation of two channels

18 corrKurt — kurtosis of the correlation of two channels

th In this table i and j are respectively the index of the slot and the sample, µi is the samples mean of the i slot, s stands for the sample and N is the length of the slot (in number of samples).

relations between the noise traces. SOM [41] provided the best 0.8 results among the tested algorithms. Finally, for each class a Generalized extreme value statistical analysis of the associated traces was carried out. An Extreme value 0.6 Normal example of results obtained with this classification approach Alpha stable is shown in Fig. 7. Herein, the energy of the noise voltage trace on the first pair of conductors (neutral-ground, namely 0.4 VCH1(t)) is related to the energy of the noise trace on the

second pair of conductors (neutral-phase, namely VCH2(t)). Energy of trace 2 0.2 The identified cluster PDFs are also reported in the legend of Fig. 7. 0 This data driven approach can classify and discover features 0 0.2 0.4 0.6 0.8 in a more general way than the conventional approach that Energy of trace 1 starts from a physical description of the phenomena generating Fig. 7: Example of noise data analyzed with SOM. The clusters noise, and essentially it partitions the noise into stationary, are labeled with the associated PDF. The energy of the trace cylostationary synchronous/asynchronous with the mains fre- pairs 1 and 2 are normalized w.r.t. to their maximum. quency, and impulsive [110]. Another example is the analysis of the line impedance and the verification of whether it has a relation with the channel response [88], [111], [112]. This is a relevant question since analysis of the lines and loads using Transmission Line (TL) should this be the case, then the forward link (transmitter- and circuit theory. Some results following this approach have receiver) channel can be inferred from the measurement of been recently obtained and proved that such a dependence the line impedance at the transmitter node [112]. In principle, does not exist if not at very low frequencies and for particular the answer to this question could be obtained with a physical circumstances where the cables and wiring structures have

9 conductors that do not have per-unit-length cross impedance 0 terms [112]. However, one can object that the proof has been obtained under the assumption of transverse electromagnetic -20 propagation for which TL holds true. Therefore, in general a non-linear, complex and time dependent relation may exist. Here is where ML comes into place: to discover dependencies. -40 Another example is the analysis of the dependence of the line impedance at the transmitter and at the receiver. [dB] |H| -60 Should such a dependence exist, then physical layer security techniques could be applied to exploit such common informa- -80 tion for the generation of secure keys without the (insecure) Measured Generated exchange of information between sender and recipient [113]. -100 0 20 40 60 80 100 B. Statistical-Synthetic Modeling f [MHz] Once the medium is characterized, next, modeling kicks Fig. 8: Broad-Band PLC channels measured and generated in. A number of relevant results have been obtained for with GAN. modeling the linear periodically time variant channel response [87]. They include so called bottom-up approaches and top- down approaches. In the former case, TL theory is applied look very consistent with the measured ones and span the large to a deterministic or statistical description of the network dynamic range of attenuation from 10 to 90 dB. topology which includes the network graph, cable types, cable A similar approach can be adopted on other domains, for lengths, and loads [114]. In the latter case, a mixed physical example, GANs along with RNNs (Sec. II-C) are promising and phenomenological description of the channel response is tools to generate PLC synthetic noise while taking into account given with the use of a parametric model derived from an time dependence. In addition, another area where ML can abstract description of the link which accounts for multipath provide help is modeling electro-magnetic-interference and propagation, transmission-reflection effects, cable attenuation mode conversions occurring along the medium on the signal with length and frequency [115], [116]. [118]. Radiation in PLC occurs at the network discontinuities Deterministic or stochastic fitting of the model with data and its radiation pattern depends on the length of each branch. from measurements was done to derive either a deterministic The electromagnetic radiation of the cables can be accounted model or a random channel model. More recently, it was for in a per-unit-length equivalent circuit model using a series observed that if we had an accurate statistical description of of resistances [119]. An unsupervised optimization approach the channel response, then a purely phenomenological model (using the techniques in Sec. II-B) can be followed to obtain could be derived [117]. Essentially, such a model would the optimal parameters. Alternatively, starting from an initial be able to generate synthetic data that follow the observed set of known parameters obtained for a given measurement statistics, totally abstracting from the physical interpretation of dataset, supervised learning (using the techniques in Sec. II-A) propagation of EM signals along the wires. This is essentially provides a methodology for the generalization and realization what ML approaches are nowadays trying to do when used of a statistical channel generator that accounts for radiation as generators of fake data in image and sound applications. effects. Indeed the question is how to statistically analyze data and, in a more challenging way, how to generate synthetically IV. ML FOR PHYSICAL LAYER PLC such distributions. The work in [117] proposed an approach which, with now an eye on ML techniques, can be further and PLC physical layer design [8] encompasses the tasks of de- significantly improved. For instance, GANs are an innovative signing the channel coder, the modulator and the associated al- tool to generate synthetic data following the same statistics of gorithms at the receiver side, which includes synchronization, a given dataset. In particular, for communications, stochastic channel estimation, detection/equalization, interference/noise channel synthesis is a relevant topic since, for instance, new mitigation, channel decoding. In addition, precoding at the coding and modulation schemes require realistic channels to transmitter side in the form of power allocation, bit allocation, be effectively tested. Sec. II-B4 describes the full methodology precoding, and interference alignment are also tasks classically to design and train generative networks for this purpose. As relegated to the physical layer [120]–[122]. an example, we report in Fig. 8 the result of using a GAN Physical layer PLC design was a vivid activity which for the generation of the broad band PLC channel frequency brought innovation (partly derived from the results firstly response. The neural network architecture is trained with a obtained by the wireless communications community) in the dataset of 1000 measured single-input-single-output channel form of: transfer functions with bandwidth of 100 MHz obtained from • Analysis and optimization of concatenated codes, bit- measurements in the in-home environment [97]. Fig. 8 shows interleaved codes, LDPC, Turbo codes, Fountain codes that the synthetically generated channel frequency responses and associated decoding strategies taking into account

10 the assumed peculiar amplitude statistics of noise, e.g., Middletone distributed [8, Sec. 5.8]. TX CHANNEL RX • Development of single carrier modulation (coded FSK, CDMA in first generation PLC devices), UWB modu- lation, multicarrier modulation (OFDM and filter bank s x y s modulation in second generation PLC devices) [8, arg max Softmax Dense NN Sec. 5.2-5.5]. Dense NN Embedding Noise layer PLC medium • Precoding design taking into account the periodically Normalization time variant behavior of the medium so that the signal-to- noise ratio exhibits a periodic behavior over mains cycles [9, Sec. 6], [120]. • MIMO techniques (also adopted by standards) and se- Fig. 9: Representation of a PLC system as an autoencoder. lection combining schemes including hybrid approaches that merge PLC links with radio links to improve diversity [121], [123]. robust representation of the input signal which can lead to Although most of the above mentioned aspects are common the identification of optimal coding and decoding schemes for to any communication system, the PLC channel specificities stochastic channels. The autoencoder may find a solution that introduce several challenges, e.g., transmission is more band outperforms existing modulation and coding methods. In fact, limited than in other systems, up-to-now a PSD constraint the autoencoder approach works without any assumption on rather than a total power constraint has been imposed at the the channel and is able to recognize, in principle, its dynamics transmitter, spectrum management is relevant for coexistence, without the need to rely on the current view of the PLC prediction of channel and noise is also relevant and may be channel as a linear periodically time-variant system [87]. enabled with new understanding of the medium. The question An autoencoder for application in PLC, schematically de- is whether ML can bring new lymph in PLC PHY layer design. scribed in Fig. 9, has been implemented using TensorFlow We envision a number of ways forward as detailed below. libraries [124]. The transmitter and receiver blocks were Focusing at the receiver and at its tasks, ML can be modeled as a cascade of multiple fully connected layers. The used for estimation problems such as channel estimation and channel block was designed with a first layer that embeds synchronization. For instance, assuming a certain parametric the channel response (medium) and a second additive layer of model of the channel, the parameters can be estimated via Gaussian noise. An arbitrary channel response for the simula- learning and inference. Similarly training of the equalizer can tion was taken from the dataset described in [97] consisting of also be done via learning from field data. More in general, the several channel realizations with bandwidth of 100 MHz. The receiver task, i.e. data detection, can be seen as a classification autoencoder takes as input symbols with cardinality 4 and 16 problem. It can be solved with supervised learning, namely (2 or 4 bits), respectively referred to as 4-autoencoder, and 16- using NNs (Sec. II-A2) or SVMs (Sec. II-A3), where the autoencoder. The performance of the autoenconder (in terms received signal is the input and the known transmitted bits of symbol error rate) has been compared to that attained by are the output during the training process. an impulsive 4-PAM or 16-PAM modulation scheme with an If we now look at the full communication system, a fasci- ideal matched filter receiver [125]. The results are shown in nating idea is to treat it as an autoencoder [47] (Sec. II-B3). Fig. 10 as a function of the Signal-to-Noise Ratio (SNR) per Let s ∈ M = {1, 2,...,M} be a message out of M possible bit. ones and let F : M 7→ Rn be the transform realized by It is interesting to notice that the 4-autoencoder performs 1 the encoding stage that maps the input message s into the dB worse than the 4-PAM scheme in the considered SNR range transmitted signal x = F (s). This first block can be consid- which is also due to the fact that the 4-PAM scheme assumes ered as the transmitter block which generates the waveform perfect knowledge of the channel. However, by increasing the x = F (s). After it, the middle block of the autoencoder, learns input signal cardinality the autoencoder manifests very good the transform made by the channel and stochastically described performance exceeding that of 16-PAM for all the considered by the conditional probability density function p(y|x) where SNRs. Furthermore, at a given SNR level we note an autoen- y corresponds to the received signal. Lastly, the decoding coder cliff effect, i.e., a significant increase of the SER curve block (receiver) takes y as input and estimates the transmitted slope. This can be justified if we think that the autoencoder message s by computing the transform G : Rn 7→ M. Each of finds a more suitable non-linear coding and decoding scheme these blocks can be implemented using NNs so that their chain when it has a larger hyperspace to search in. constitutes a deep neural network which, trained end-to-end, Another bunch of applications of ML for PLC is in the reconstructs the input message at the output. Since the input is domain of optimization problems that arise at the transmitter the message s and the desired output is the same reconstructed side, namely, PHY layer resource allocation. Let us think for message s, this leads to a classification task where the cross- instance to the water filling problem that aims at distributing entropy is a natural choice for the cost function. The end- optimally power across the available spectrum [9]. Instead of to-end learning process enables the autoencoder to find a pursuing a physical model based approach, we may want to

11 100 to learn how to exploit the wide flexibility of parameters at both the PHY and MAC layers. Things get even more complex when cooperation is required, i.e., when relaying and routing has to be implemented to offer full coverage in large massive networks [126], [127]. In brief, cross layer optimization, resource allocation, spectrum management and 10-1 scheduling etc. are challenging tasks in PLC networks, as the results coming from field trials and large deployments are telling us. The dynamics of the channel and traffic, the abrupt

Symbol error rate 4-PAM network shut-downs and the need for fast reconfiguration, 4-Autoencoder render these tasks extremely demanding if a layer by layer 16-PAM approach is considered, as traditionally done. This is because 16-Autoencoder the traditional approach requires modeling each layer and 10-2 -4 -2 0 2 4 6 8 function, which is prohibitively complex. Here is where ML E /N [dB] comes into place. b 0 ML and more in general artificial intelligence is envisioned Fig. 10: SER versus Eb/N0 for the autoencoder of Fig. 9 and as a relevant tool to develop a more abstract and synthetic the M-PAM modulation scheme. model of the network behavior and to realize the network orchestration tasks [128]. The network behavior has to be learned via the acquisition of real data sets where events, ac- train a NN to provide the power allocation as the result of tions and results are labeled to create time series that can train observing data from the field and defining a cost function optimization algorithms. In essence the orchestration of the which can be the bit-error rate or throughput. It should be network can be seen as a control problem where data observers noted that with a bottom-up approach this task can be solved serve as sensors, a global cost function is then formulated by identifying the system state (channel response and noise), and minimized via learning from field data. To the best of perform classification and estimation, and then dynamically our knowledge, efforts in applying ML techniques in routing determine the optimal allocation for the specific identified problems have been carried out mostly for wireless networks system state. However, a physical agnostic neural network can and specifically Mobile Ad-hoc Networks (MANETs), where embed in one shot all these tasks and efficiently provide the topology information is not available to nodes and the data- solution. Of course, how to train the network and how complex driven nature of the application poses significant challenges in the network has to be is a research question especially in PLC terms of adaptation to run-time changes and scalability [129]– where the system state can have abrupt changes due to network [132]. Most studies also show that classical solutions, such as topology changes, load changes, and noise generators changes. Bellman-Ford, Ad-hoc On-demand Distance Vector (AODV), In addition, another approach based on reinforcement (Sec. Destination-Sequenced Distance-Vector (DSDV), Low-Energy II-C) learning can be exploited as done in [83]. Adaptive Clustering Hierarchy (LEACH), shortest path and the such, perform poorer than ML approaches when it comes V. ML FOR MAC AND NETWORK LAYER PLC to achievable throughput, end-to-end delay and time required Media access control and networking can also enjoy the to converge to a better solution when run-time changes occur. use of ML. State-of-the-art PLC systems use contentious based The most widespread approach to tackle these problems is media access techniques, i.e., CSMA, Aloha, flooding, as well reinforcement learning [133], where cost functions are de- as adaptive TDMA and FDMA [9]. A universally optimal fined in order to switch the behavior of a router based on solution has not yet been found especially because of the environmental and link conditions (Sec. II-C). This is done heterogeneity of applications, services, traffic requirements, by implementing a rewarding (or penalizing) system, which variability of medium conditions and network configurations. influences the probability of an element to make a specific Let us think for instance of a typical automatic metering choice, that keeps track of past states of the network and uses application in the access network managed by the concentrator them to predict the possible future changes in the traffic load. at the MV-LV sub-station, in contrast to a PLC in-building The main disadvantage of RL is its computational demand, network managed by cell coordinators, or a street light control which weighs on every node of the network when implemented application, in contrast to a PLC backhauling solution for in a distributed kind of architecture. In the following, we 5G wireless micro-cells. Although chip sets and transceivers present a different, computationally lighter approach, where (both NB and BB) have been developed to serve as enabling supervised learning is used to synthetically train neural net- communication devices for the plethora of above applications, works to implement a decentralized smart routing solution that and although they can provide point-to-point connectivity, the is able to react to changes in PLC networks by using readily optimization of their performance in a massive networked available information. environment is still a challenge. This is because multiple layers In more detail, the envisioned application aims at using the need to be optimized and so far limited data was available power line distribution network as a fronthaul infrastructure

12 Synthetically Number of nodes trained arti�icial in the network neural network Average topology descriptors σ(·) σ(·) Routing path Transmitter ID σ(·) σ(·) Hops number Destination ID σ(·) σ(·) ...... Maximum distance σ(·) σ(·)

Minimum capacity

Fig. 12: NN used for determining routing.

by a self-discovery mechanism. With every node being able Fig. 11: Inferred average capacity of an end-to-end PLC link to infer quickly and easily the capacity of the links in the based on easily-observable topology parameters. network, a simple ML-based routing algorithm can easily increase capacity, as shown in Fig. 13, where the capacity gained with ML routing is compared to the optimal solution, for a small cell radio network. In [134] the small cell radio the latter achieved with full knowledge of topology and an network and the underlying power line infrastructure are exhaustive solution of the optimization problem. brought together in a joint paradigm and the capacity of the On the other hand, we can also implement routing by PLC fronthaul is analyzed through a bottom-up emulation tool considering the number of hops as the metric to be minimized, [114], [135]. A regression approach is developed to tackle the while imposing a constraint on the minimum capacity to be problem of determining the capacity of an end-to-end power achieved. In addition, PLC can be used as an access network line link based solely on geometrical/topological properties of infrastructure for narrowband applications [136]. Sensor net- the network, for instance the density of nodes (radio cells) works, which usually communicate via wireless technologies, in the overall service area, and the distance between the can be supported by a PLC infrastructure where this solution communicating nodes. has better penetration than a wireless one. A popular protocol With the approach followed, it is possible to infer depen- for routing in these networks is the Lightweight On-demand dencies of the channel capacity and routing performance on a Ad-hoc Distance-vector (LOADng) protocol developed by number of parameters (Fig. 11), e.g., topology type, density G3-PLC [136]. This protocol relies on information tables of nodes (branches), cable type, loads, electrical distance, etc. in order to connect all nodes in the network; topology is The channel model used to generate synthetic data is a TL discovered by broadcasting route request packets, and routes model where the topology, loads, cable electrical characteris- are determined by minimizing a cost function. Asymmetric tics etc. are included. It generates frequency selective channel links, burst traffic and noise, as well as nodes with limited responses for any pair of nodes in the network. Distance resources on power, computation and memory offer constraints here refers to the length of the power line backbone that that can be solved with lightweight neural networks. As shown connects two considered nodes. In detail, Fig. 11 shows the in Fig. 12, we can train our learning machine to devise the capacity inferred through regression of two nodes pertaining optimal number of routers and the best path for a message to to a PLC network when the communication is implemented reach any point in the network, both in downlink and uplink. in the broad-band spectrum (2-86 MHz). The curve shows The training set is herein generated synthetically by means an aberration for low densities and high distances: capacity of the aforementioned emulation tool, thus a big number of does not actually increase with distance for low densities. This random network samples can be used to let the NN learn the is due to the fact that regression is implemented through a underlying relations between topology and the optimal routing simulator that first deploys loads (small cells) on the territory strategy. Specifically, the NN scheme accepts as inputs the and then tries to connect them in a way that power line is source-destination IDs, the number of nodes in the network, minimized. This means that for low densities, high distances the maximum distance between these nodes, the minimum are never achieved, thus the regression algorithm does not have capacity required to consider a link as functioning, and some data for that specific density-distance subset. other geometrical descriptors. The scheme outputs the number The capacity inferred through regression enables a ML of routers required to achieve connectivity and the best path approach that can be used by nodes of the network to infer for connection. The basic idea of this implementation is that the best position of a global router to extend coverage and the nodes do not need to know the topology of the network, maximize capacity, given that the topology is known at every but only the few parameters referenced in Fig. 12. Fig. 14 point. This information can be set by the manager of the reports the percentage of correct guessed routing paths, i.e., power line infrastructure, otherwise, whenever a new node the ratio of routing paths obtained by ML that are equal to the is added to the network, the topology can be reconstructed ones obtained with the exhaustive solution of the optimization

13 200 Exhaustive solution 1 ML inferred 0.9 150 0.8 Small Training Set 100 0.7 Large Training Set

Capacity gain 0.6 50

Correct ML routing probability 0.5 75 100 125 150 175 200 0 Number of nodes 0 0.2 0.4 0.6 0.8 1 Normalized density Fig. 14: Probability of determining the optimal routing scheme Fig. 13: Capacity gain using ML routing in a PLC backhauling via NN as a function of the number of nodes and training set cellular network as a function of the radio cells density in the dimension in a metering PLC network. area.

grid status, malfunctions etc. which in turn enables predictive problem and the perfect knowledge of the link capacity. The maintenance of the grid and therefore grants a better energy results show a high probability of success. Furthermore, it delivery service. Some recent results have provided a better is worth noticing that the NN was trained with network understanding of the propagation of high frequency signals topologies with a number of nodes ranging from 100 to 175. (as PLC ones) in multi-conductor transmission lines [141]. To prevent overfitting, an initial dropout layer [137] (drop Essentially, it has been shown that we can relate the state of rate p = 0.5) was also added. The set used for testing the the grid, cables and loads to several electrical quantities that performance of the NN is nevertheless larger, i.e., networks can be measured and that provide information to detect the with less than 100 or more than 175 nodes are simulated in presence of anomalies. The results offer an analytic framework order to assess the capabilities of our approach to find routing described by a set of complex but extremely informative solutions for unobserved situations during the training stage. functions. Nonetheless, diagnosis of malfunctions ends up into This explains the somewhat lower performance of ML routing an estimation problem or classification problem [140]. In this w.r.t. the optimal routing for a number of nodes smaller than context, [142] has already provided results on the usage of 100 and larger than 175. ML for cable degradation detection, for instance. In conclusion, based on the results presented, ML for net- To provide a numerical result, we consider the problem of working in PLC is promising. Further investigations should be detecting anomalies in a distribution grid by the analysis of done: representative datasets have to be generated, supervised high frequency PLC signals. In [141] it was shown that for this and unsupervised ML techniques have to be compared, and objective, the line admittance at the nodes (Yin), the reflection appropriate training algorithms have to be developed. coefficient (ρin), and the transfer function (H) between pairs of nodes can be exploited. Supervised learning (Sec. II-A) VI.OTHER APPLICATIONS can be very powerful here; once several sampled signals x PLC is not a mere communication technology. The power are stored with the respective labels y, NNs or SVMs can grid is a fully interconnected system where all electrical phe- learn the deterministic relationship y = F (x) (which can nomena affect the entire network. Moreover, it is influenced be arbitrarily complex) and predict new unknown labels for by external physical events. Hence, PLC nodes can be used to different sensed signals. Here, we consider a simple neural sense the grid itself as well as the surrounding environment. network approach. A dataset was realized based on a single In other words, PLC transceivers can act as probes for grid topology realization with 20 nodes and average node length diagnostics by analyzing the electromagnetic field and the data of 700 m. Anomalies occurring on the grid have been added traffic in the PLC frequency bands [11]. Possible applications to obtain 10000 realizations of a perturbed grid, specifically, of PLC for grid sensing are topology reconstruction [138], load impedances changes, concentrated faults and distributed [139] and anomalies detection, i.e., fault detection, cable faults according to the models described in [141]. aging, load identification [140]–[142]. The former application The training process was conducted using 50% of the data, enables grid operators to better the knowledge about the grid while testing was done using the other unbiased half part. We configuration, the status of switches and feeders which is used only one hidden layer with 100 neurons and a softmax not always complete especially in the LV part of the access layer to work with probabilities as output. Input signals are the network. The knowledge of the network topology is also ratios between the measurement of the electrical parameters important for networking aspects since it allows to locate Yin, ρin and H at a given time, and the reference value of nodes, and implement for instance geo-routing algorithms. The the respective signal in the band 4.3 − 500 kHz with 4.3 kHz latter application enables grid operators to better monitor the frequency. The reference values were obtained when

14 Classes Accuracy 1217 64 14 30 91.8% 1 Fault Detection All 4 Classes 3 Classes 24.3% 1.3% 0.3% 0.6% 8.2%

Constant 87.8% 87.8% 100 %

Load 0 1123 10 275 79.8%

Imped. Variable 98.7% 89.2% 87.1% 2 0.0% 22.5% 0.2% 5.5% 20.2% TABLE II: Anomaly detection accuracy with constant and variable load impedances. 0 53 1229 58 91.7% 3 0.0% 1.1% 24.6% 1.2% 8.3% there was no anomaly. The output signals from the classifier class Predicted 0 38 0 889 95.9% are 4 different class indicators according to the particular 4 type of fault: 1. unperturbed, 2. load impedance change, 3. 0.0% 0.8% 0.0% 17.8% 4.1% concentrated fault, 4. distributed fault. The obtained results are similar for all three types of electrical signals considered, 100% 87.9% 98.1% 71.0% 89.2% so we present here only the accuracy values of the classifier 0.0% 12.1% 1.9% 29.0% 10.8% when using the reflection coefficient. Furthermore, both the 2 Ω 1 2 3 4 case of load impedances that stay constant (at k ) or that Real class are randomly variable are considered. 4 The first column of Tab. II reports the ability to detect the Fig. 15: Confusion matrix (accuracy of the prediction) for - presence of a fault and this is achieved through an accuracy classes anomaly detection using the reflection coefficient as of 87.8% when the load impedances are constant, and 98.7% the main input signal. when they change, respectively. The second column shows the ability to detect and classify the type of anomaly: the average accuracy is 87.8% when the load impedances are constant, and in PLC. Despite its infancy, in this application domain, ML 89.2% when they change, respectively. It is rather interesting stimulates an all new era of research activities in PLC. Several to notice that, when the load impedances are constant, the future research directions have been envisioned, among which neural network is not able to fully distinguish between the the exploitation of domain knowledge of PLC to better extract unperturbed network (class 1) and the distributed fault event features via ML tools. For instance, the a priori physical (class 4). This suggests to remove from the data the last type knowledge of the channel (through parametric models) offers of anomaly, and study the ability of the neural network to a guidance to better learn and characterize it. Furthermore, classify the remaining first 3 classes. This is presented in the how to train, validate and test the machine (which is a core third column, where it is shown that the accuracy reaches topic in ML), and how to compare the ML tools opens the 100% when the load impedances are constant, and 87.1% door to several research endeavors. The starting point in PLC when they change, which means that most of the uncertainty will be the development of representative data sets. This has is introduced by the last class. More details are offered by Fig. to be followed by a statistical learning approach where the 15, which reports the confusion matrix of 4 classes anomaly validity of results is measured in a probability framework. detection with variable loads. In this case, the confusion matrix shows a conflict between class 2 and class 4, in particular the REFERENCES NN erroneously tends to predict (around 1/3 of the times) class 2 when the correct class is instead the 4th. The results [1] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 7 1948. are promising and encourage further work for tuning the NN [Online]. Available: https://ieeexplore.ieee.org/document/6773024/ hyper-parameters. [2] I. Kant, Critique of Pure Reason, ser. The Cambridge Edition of the Works of Immanuel Kant. New York, NY: Cambridge University VII.CONCLUSIONS Press, 1998, translated by Paul Guyer and Allen W. Wood. [3] C. M. Bishop, Pattern Recognition and Machine Learning (Information This paper has provided an overview of ML as a signal/ and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006. processing tool to improve the knowledge and the performance [4] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 5 2015. of PLC systems. A clear distinction among deterministic and [5] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The probabilistic ML formulations has been made. Several new MIT Press, 2012. ideas of application of ML in PLC have been presented in [6] L. Lampe, A. M. Tonello, and T. G. Swart, Eds., Power Line Commu- the domain of medium characterization, statistical modeling, nications: Principles, Standards and Applications from Multimedia to Smart Grid. Wiley, 2016. PHY layer, MAC layer and grid diagnostics. The paper has [7] F. J. Canete et al., Channel Characterization. John Wiley discussed what ML supervised and unsupervised tools can & Sons, Ltd, 2016, ch. 2, pp. 8–177. [Online]. Available: be used for the problem at hand. The concepts have been https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118676684.ch2 [8] K. Dostert et al., Digital Transmission Techniques. John Wiley supported by numerical results using synthetic or real datasets. & Sons, Ltd, 2016, ch. 5, pp. 261–385. [Online]. Available: They have provided encouraging evidence that ML has a role https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118676684.ch5

15 [9] J. A. Cortes et al., Medium Access Control and Layers Above in PLC. [29] N. Strodthoff, B. Goktepe,¨ T. Schierl, C. Hellge, and W. Samek, John Wiley & Sons, Ltd, 2016, ch. 6, pp. 386–448. [Online]. Available: “Enhanced machine learning techniques for early HARQ feedback https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118676684.ch6 prediction in 5G,” CoRR, vol. abs/1807.10495, 2018. [Online]. [10] A. N. Milioudis, G. T. Andreou, and D. P. Labridis, “Enhanced Available: http://arxiv.org/abs/1807.10495 protection scheme for smart grids using power line communications [30] V. K. Tumuluru, P. Wang, and D. Niyato, “A novel spectrum-scheduling techniques-part I: Detection of high impedance fault occurrence,” IEEE scheme for multichannel cognitive radio network and performance Transactions on Smart Grid, vol. 3, no. 4, pp. 1621–1630, Dec 2012. analysis,” Vehicular Technology, IEEE Transactions on, vol. 60, pp. [11] A. M. Tonello, “Why you should consider the uptake of PLC tech,” 1849 – 1858, 06 2011. ESI Africa - Utility Maintenance Focus, vol. 3, pp. 34–36, July 2017. [31] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, [12] T. M. Mitchell, Machine Learning, 1st ed. New York, NY, USA: “Learning to optimize: Training deep neural networks for interference McGraw-Hill, Inc., 1997. management,” IEEE Transactions on Signal Processing, vol. 66, no. 20, [13] O. Simeone, “A very brief introduction to machine learning with pp. 5438–5453, Oct 2018. applications to communication systems,” CoRR, vol. abs/1808.02342, [32] D. Del Testa, M. Danieletto, G. M. Di Nunzio, and M. Zorzi, 2018. [Online]. Available: http://arxiv.org/abs/1808.02342 “Estimating the number of receiving nodes in 802.11 networks via [14] K. Hornik, M. B. Stinchcombe, and H. White, “Multilayer feedforward machine learning techniques,” in 2016 IEEE Global Communications networks are universal approximators,” Neural Networks, vol. 2, pp. Conference (GLOBECOM), Dec 2016, pp. 1–7. 359–366, 1989. [33] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for [15] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Parallel distributed proactive caching in cloud-based radio access networks with mobile processing: Explorations in the microstructure of cognition, vol. 1,” users,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, D. E. Rumelhart, J. L. McClelland, and C. PDP Research Group, pp. 3520–3535, June 2017. Eds. Cambridge, MA, USA: MIT Press, 1986, ch. Learning [34] Y. Wang, M. Martonosi, and L.-S. Peh, “A supervised learning Internal Representations by Error Propagation, pp. 318–362. [Online]. approach for routing optimizations in wireless sensor networks,” Available: http://dl.acm.org/citation.cfm?id=104279.104293 in Proceedings of the 2Nd International Workshop on Multi-hop [16] D. P. Kingma and J. Ba, “Adam: A method for stochastic Ad Hoc Networks: From Theory to Reality, ser. REALMAN ’06. optimization,” CoRR, vol. abs/1412.6980, 2014. [Online]. Available: New York, NY, USA: ACM, 2006, pp. 79–86. [Online]. Available: http://arxiv.org/abs/1412.6980 http://doi.acm.org/10.1145/1132983.1132997 [35] T. T. T. Nguyen and G. Armitage, “A survey of techniques for internet [17] Y. Tao Ma, K. Hua Liu, and Y. Na Guo, “Artificial neural network traffic classification using machine learning,” IEEE Communications modeling approach to power-line communication multi-path channel,” Surveys Tutorials, vol. 10, no. 4, pp. 56–76, Fourth 2008. in 2008 International Conference on Neural Networks and Signal Processing, June 2008, pp. 229–232. [36] O. Simeone, “A Brief Introduction to Machine Learning for Engineers,” Foundations and Trends in Signal Processing, vol. 12, no. 3-4, pp. [18] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm 200–431, 2018. for optimal margin classifiers,” in Proceedings of the Fifth Annual [37] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Sil- Workshop on Computational Learning Theory, ser. COLT ’92. New verman, and A. Y. Wu, “An efficient k-means clustering algorithm: York, NY, USA: ACM, 1992, pp. 144–152. [Online]. Available: analysis and implementation,” IEEE Transactions on Pattern Analysis http://doi.acm.org/10.1145/130385.130401 and Machine Intelligence, vol. 24, no. 7, pp. 881–892, July 2002. [19] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., [38] M. Roux, “A comparative study of divisive and agglomerative vol. 20, no. 3, pp. 273–297, Sep. 1995. [Online]. Available: algorithms,” Journal of Classification, vol. 35, https://doi.org/10.1023/A:1022627411411 no. 2, pp. 345–366, Jul 2018. [Online]. Available: https://doi.org/10. [20] Y. Huo, G. Prasad, L. Atanackovic, L. Lampe, and V. C. M. Leung, 1007/s00357-018-9259-9 “Grid surveillance and diagnostics using power line communications,” [39] K. Bade and A. Nurnberger, “Personalized hierarchical clustering,” in in 2018 IEEE International Symposium on Power Line Communications 2006 IEEE/WIC/ACM International Conference on Web Intelligence and its Applications (ISPLC), April 2018, pp. 1–6. (WI 2006 Main Conference Proceedings)(WI’06), Dec 2006, pp. 181– [21] T. K. Ho, “Random decision forests,” in Proceedings of the Third 187. International Conference on Document Analysis and Recognition [40] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive (Volume 1) - Volume 1, ser. ICDAR ’95. Washington, DC, Datasets, 2nd ed. Cambridge University Press, 2014. USA: IEEE Computer Society, 1995, pp. 278–. [Online]. Available: [41] J. Vesanto and E. Alhoniemi, “Clustering of the self-organizing map,” http://dl.acm.org/citation.cfm?id=844379.844681 IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 586–600, [22] A. Zappone, M. D. Renzo, M. Debbah, T. L. Thanh, and May 2000. X. Qian, “Model-aided wireless artificial intelligence: Embedding [42] L. Dinh, D. Krueger, and Y. Bengio, “NICE: non-linear independent expert knowledge in deep neural networks towards wireless systems components estimation,” CoRR, vol. abs/1410.8516, 2014. [Online]. optimization,” CoRR, vol. abs/1808.01672, 2018. [Online]. Available: Available: http://arxiv.org/abs/1410.8516 http://arxiv.org/abs/1808.01672 [43] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using [23] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep real NVP,” CoRR, vol. abs/1605.08803, 2016. [Online]. Available: learning-based channel decoding,” in 51st Annual Conference on http://arxiv.org/abs/1605.08803 Information Sciences and Systems, CISS 2017, Baltimore, MD, [44] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting USA, March 22-24, 2017, 2017, pp. 1–6. [Online]. Available: and composing robust features with denoising autoencoders,” in https://doi.org/10.1109/CISS.2017.7926071 Proceedings of the 25th International Conference on Machine [24] H. Ye, G. Y. Li, and B. Juang, “Power of deep learning for channel Learning, ser. ICML ’08. New York, NY, USA: ACM, 2008, pp. estimation and signal detection in ofdm systems,” IEEE Wireless 1096–1103. [Online]. Available: http://doi.acm.org/10.1145/1390156. Communications Letters, vol. 7, no. 1, pp. 114–117, Feb 2018. 1390294 [25] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, [45] G. Alain, Y. Bengio, and S. Rifai, “Regularized auto-encoders estimate and Y. Beery, “Deep learning methods for improved decoding of linear local statistics,” CoRR, vol. abs/1211.4246, 2012. [Online]. Available: codes,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, http://arxiv.org/abs/1211.4246 no. 1, pp. 119–131, Feb 2018. [46] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contract- [26] A. Argyris, J. Bueno, and I. Fischer, “Photonic machine learning ing auto-encoders: Explicit invariance during feature extraction,” in implementation for signal recovery in optical communications,” Nature Proceedings of the Twenty-eight International Conference on Machine - Scientific Reports, vol. 8, no. 1, pp. 2045–2322, Fifth 2018. Learning (ICML’11, 2011. [27] N. E. West and T. O’Shea, “Deep architectures for modulation recog- [47] T. O’Shea and J. Hoydis, “An introduction to deep learning for the nition,” in 2017 IEEE International Symposium on Dynamic Spectrum physical layer,” IEEE Transactions on Cognitive Communications and Access Networks (DySPAN), March 2017, pp. 1–6. Networking, vol. 3, no. 4, pp. 563–575, Dec 2017. [28] S. Fang and T. Lin, “Indoor location system based on discriminant- [48] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” adaptive neural network in ieee 802.11 environments,” IEEE Transac- CoRR, vol. abs/1312.6114, 2013. [Online]. Available: http://arxiv.org/ tions on Neural Networks, vol. 19, no. 11, pp. 1973–1978, Nov 2008. abs/1312.6114

16 [49] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde- ceedings of the 1st Machine Learning for Healthcare Conference, ser. Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial Proceedings of Machine Learning Research, F. Doshi-Velez, J. Fackler, nets,” in Proceedings of the 27th International Conference on D. Kale, B. Wallace, and J. Wiens, Eds., vol. 56. Children’s Hospital Neural Information Processing Systems - Volume 2, ser. NIPS’14. LA, Los Angeles, CA, USA: PMLR, 18–19 Aug 2016, pp. 301–318. Cambridge, MA, USA: MIT Press, 2014, pp. 2672–2680. [Online]. [69] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by Available: http://dl.acm.org/citation.cfm?id=2969033.2969125 jointly learning to align and translate,” in 3rd International Conference [50] T. J. O’Shea, T. Roy, N. West, and B. C. Hilburn, “Physical layer com- on Learning Representations, ICLR 2015, San Diego, CA, USA, May munications system design over-the-air using adversarial networks,” 7-9, 2015, Conference Track Proceedings, 2015. 2018 26th European Signal Processing Conference (EUSIPCO), pp. [70] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural 529–532, 2018. Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: [51] B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bulow, http://dx.doi.org/10.1162/neco.1997.9.8.1735 D. Lavery, P. Bayvel, and L. Schmalen, “End-to-end deep learning [71] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural net- of optical fiber communications,” Journal of Lightwave Technology, works,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. vol. 36, no. 20, pp. 4843–4855, Oct 2018. 2673–2681, Nov 1997. [52] H. Kim, Y. Jiang, S. Kannan, S. Oh, and P. Viswanath, “Deepcode: [72] Z. C. Lipton, “A critical review of recurrent neural networks for Feedback codes via deep learning,” in Advances in Neural Information sequence learning,” CoRR, vol. abs/1506.00019, 2015. [Online]. Processing Systems 31: Annual Conference on Neural Information Pro- Available: http://arxiv.org/abs/1506.00019 cessing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montreal,´ [73] H. Salehinejad, J. Baarbe, S. Sankar, J. Barfett, E. Colak, and Canada., 2018, pp. 9458–9468. S. Valaee, “Recent advances in recurrent neural networks,” CoRR, vol. [53] M. Kim, N. Kim, W. Lee, and D. Cho, “Deep learning-aided scma,” abs/1801.01078, 2018. [Online]. Available: http://arxiv.org/abs/1801. IEEE Communications Letters, vol. 22, no. 4, pp. 720–723, April 2018. 01078 [54] C. Studer, S. Medjkouh, E. Gonultas, T. Goldstein, and O. Tirkkonen, [74] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust “Channel charting: Locating users within the radio environment using region policy optimization,” in Proceedings of the 32nd International channel state information,” IEEE Access, vol. 6, pp. 47 682–47 698, Conference on Machine Learning, ser. Proceedings of Machine 2018. Learning Research, F. Bach and D. Blei, Eds., vol. 37. Lille, [55] S. Dorner, S. Cammerer, J. Hoydis, and S. t. Brink, “Deep learning France: PMLR, 07–09 Jul 2015, pp. 1889–1897. [Online]. Available: based communication over the air,” IEEE Journal of Selected Topics http://proceedings.mlr.press/v37/schulman15.html in Signal Processing, vol. 12, no. 1, pp. 132–143, Feb 2018. [75] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, [56] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning D. Wierstra, and M. A. Riedmiller, “Playing atari with deep at physical layer without channel models,” IEEE Communications reinforcement learning,” CoRR, vol. abs/1312.5602, 2013. [Online]. Letters, vol. 22, no. 11, pp. 2278–2281, Nov 2018. Available: http://arxiv.org/abs/1312.5602 [57] F. A. Aoudia and J. Hoydis, “End-to-end learning of communications [76] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, systems without a channel model,” in 2018 52nd Asilomar Conference A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, on Signals, Systems, and Computers, Oct 2018, pp. 298–303. F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, [58] H. Ye, G. Y. Li, B. F. Juang, and K. Sivanesan, “Channel agnostic “Mastering the game of go without human knowledge,” Nature, vol. end-to-end learning based communication systems with conditional 550, pp. 354–359, 10 2017. GAN,” CoRR, vol. abs/1807.00447, 2018. [Online]. Available: [77] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, http://arxiv.org/abs/1807.00447 1st ed. Cambridge, MA, USA: MIT Press, 1998. [59] K. Davaslioglu and Y. E. Sagduyu, “Generative adversarial learning [78] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, for spectrum sensing,” in 2018 IEEE International Conference on “Deep reinforcement learning: A brief survey,” IEEE Signal Processing Communications (ICC), May 2018, pp. 1–6. Magazine, vol. 34, no. 6, pp. 26–38, Nov 2017. [60] U. Challita, L. Dong, and W. Saad, “Proactive resource management [79] T. J. O’Shea, J. Corgan, and T. C. Clancy, “Convolutional radio mod- for LTE in unlicensed spectrum: A deep learning perspective,” IEEE ulation recognition networks,” in Engineering Applications of Neural Transactions on Wireless Communications, vol. 17, no. 7, pp. 4674– Networks, C. Jayne and L. Iliadis, Eds. Cham: Springer International 4689, July 2018. Publishing, 2016, pp. 213–226. [61] A. A. Abbasi and M. F. Younis, “A survey on clustering algorithms [80] N. Bitar, S. Muhammad, and H. H. Refai, “Wireless technology for wireless sensor networks,” Computer Communications, vol. 30, pp. identification using deep convolutional neural networks,” in 2017 IEEE 2826–2841, 2007. 28th Annual International Symposium on Personal, Indoor, and Mobile [62] E. Abbe, A. S. Bandeira, and G. Hall, “Exact recovery in the Radio Communications (PIMRC), Oct 2017, pp. 1–6. stochastic block model,” CoRR, vol. abs/1405.3267, 2014. [Online]. [81] G. Kechriotis, E. Zervas, and E. S. Manolakos, “Using recurrent neural Available: http://arxiv.org/abs/1405.3267 networks for adaptive communication channel equalization,” IEEE [63] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based Transactions on Neural Networks, vol. 5, no. 2, pp. 267–278, March learning applied to document recognition,” in Proceedings of the IEEE, 1994. 1998, pp. 2278–2324. [82] B. J. Radford, L. M. Apolonio, A. J. Trias, and J. A. Simpson, [64] Q. Li, W. Cai, X. Wang, Y. Zhou, D. D. Feng, and M. Chen, “Network traffic anomaly detection using recurrent neural networks,” “Medical image classification with convolutional neural network,” in CoRR, vol. abs/1803.10769, 2018. [Online]. Available: http://arxiv. 2014 13th International Conference on Control Automation Robotics org/abs/1803.10769 Vision (ICARCV), Dec 2014, pp. 844–848. [83] B. Nikfar, G. Bumiller, and A. J. H. Vinck, “An adaptive pursuit [65] R. Collobert and J. Weston, “A unified architecture for natural strategy for dynamic spectrum assignment in narrowband plc,” in 2017 language processing: Deep neural networks with multitask learning,” IEEE International Symposium on Power Line Communications and in Proceedings of the 25th International Conference on Machine its Applications (ISPLC), April 2017, pp. 1–6. Learning, ser. ICML ’08. New York, NY, USA: ACM, 2008, pp. 160– [84] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, 167. [Online]. Available: http://doi.acm.org/10.1145/1390156.1390177 Y. Liang, and D. I. Kim, “Applications of deep reinforcement [66] A. Graves, A. Mohamed, and G. E. Hinton, “Speech recognition with learning in communications and networking: A survey,” CoRR, vol. deep recurrent neural networks,” in IEEE International Conference on abs/1810.07862, 2018. [Online]. Available: http://arxiv.org/abs/1810. Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, 07862 BC, Canada, May 26-31, 2013, 2013, pp. 6645–6649. [Online]. [85] M. Tlich, A. Zeddam, F. Moulin, and F. Gauthier, “Indoor Power Line Available: https://doi.org/10.1109/ICASSP.2013.6638947 Communications channel characterisation up to 100 MHz - Part I: [67] A. Graves and J. Schmidhuber, “Offline handwriting recognition with One Parameter Deterministic Model,” IEEE Transactions on Power multidimensional recurrent neural networks,” in Advances in Neural In- Delivery, vol. 23, no. 3, pp. 1392–1401, 2008. formation Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, [86] S. Galli, “A simple two-tap statistical model for the power line and L. Bottou, Eds. Curran Associates, Inc., 2009, pp. 545–552. channel,” in ISPLC2010, March 2010, pp. 242–248. [68] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor [87] J. A. Cortes, F. J. Canete, L. Diez, and J. L. G. Moreno, “On the ai: Predicting clinical events via recurrent neural networks,” in Pro- statistical properties of indoor power line channels: Measurements

17 and models,” in 2011 IEEE International Symposium on Power Line of Physiology-Heart and Circulatory Physiology, vol. 278, no. 6, pp. Communications and Its Applications, April 2011, pp. 271–276. H2039–H2049, 2000. [88] F. Versolatto and A. M. Tonello, “Plc channel characterization up to [108] P. D. A. Clifton, Machine Learning for Healthcare Technologies, 1st ed. 300 MHz: Frequency response and line impedance,” in 2012 IEEE IET, 2016. Global Communications Conference (GLOBECOM), Dec 2012, pp. [109] D. Righini and A. M. Tonello, “Automatic clustering of noise in multi- 3525–3530. conductor narrow band PLC channels,” in 2019 IEEE International [89] M. Babic, “Opera deliverable d5. pathloss as a function of frequency, Symposium on Power Line Communications and its Applications (IS- distance and network topology for various lv and mv european pow- PLC), April 2019. erline networks,” 2005. [110] B. Han, V. Stoica, C. Kaiser, N. Otterbach, and K. Dostert, “Noise [90] A. M. Tonello, F. Versolatto, and C. Tornelli, “Analysis of impulsive characterization and emulation for low-voltage power line channels uwb modulation on a real mv test network,” in 2011 IEEE International between 150 khz and 10 mhz,” Karlsruher Institut fur¨ Technologie Symposium on Power Line Communications and Its Applications, April (KIT), Tech. Rep., 2016. 2011, pp. 18–23. [111] D. Liang, H. Guo, and T. Zhang, “Impedance tracking and estimation [91] M. Mohammadi, L. Lampe, M. Lok, S. Mirabbasi, M. Mirvakili, using power line communications,” in 2019 IEEE International Sym- R. Rosales, and P. van Veen, “Measurement study and transmission posium on Power Line Communications and its Applications (ISPLC), for in-vehicle power line communication,” in 2009 IEEE International April 2019. Symposium on Power Line Communications and Its Applications, [112] M. D. Piante and A. M. Tonello, “Channel and admittance dependence March 2009, pp. 73–78. analysis in PLC with implications to power loading/savings,” in 2019 [92] M. Antoniali, M. De Piante, and A. M. Tonello, “Plc noise and IEEE International Symposium on Power Line Communications and channel characterization in a compact electrical car,” in 2013 IEEE its Applications (ISPLC), April 2019. 17th International Symposium on Power Line Communications and Its [113] F. Passerini and A. M. Tonello, “Physical layer key generation Applications, March 2013, pp. 29–34. for secure power line communications,” CoRR, vol. abs/1809.09439, [93] M. Antoniali, A. M. Tonello, M. Lenardon, and A. Qualizza, “Mea- 2018. [Online]. Available: http://arxiv.org/abs/1809.09439 surements and analysis of plc channels in a cruise ship,” in 2011 [114] A. Tonello and F. Versolatto, “Bottom-up statistical PLC channel IEEE International Symposium on Power Line Communications and modelling - part I: Random topology model and efficient transfer Its Applications, April 2011, pp. 102–107. function computation,” IEEE Transactions on Power Delivery, vol. 26, [94] P. Pagani, R. Hashmat, A. Schwager, D. Schneider, and W. Baschlin, no. 2, pp. 891–898, Apr. 2011. “European MIMO PLC field measurements: Noise analysis,” in 2012 [115] ——, “Bottom-up statistical PLC channel modelling - part II: Inferring IEEE International Symposium on Power Line Communications and the statistics,” IEEE Transactions on Power Delivery, vol. 25, no. 4, Its Applications, March 2012, pp. 310–315. pp. 2356–2363, Oct. 2010. [95] A. Pittolo, M. De Piante, F. Versolatto, and A. M. Tonello, “In-vehicle [116] M. Zimmermann and K. Dostert, “A multipath model for the powerline power line communication: Differences and similarities among the in- channel,” IEEE Transactions on Communications, vol. 50, no. 4, pp. car and the in-ship scenarios,” IEEE Vehicular Technology Magazine, 553–559, 2002. vol. 11, no. 2, pp. 43–51, June 2016. [117] A. Pittolo and A. M. Tonello, “A synthetic statistical mimo plc [96] S. Galli, “A simplified model for the indoor power line channel,” channel model applied to an in-home scenario,” IEEE Transactions in IEEE International Symposium on Power Line Communications, on Communications, vol. 65, no. 6, pp. 2543–2553, June 2017. Dresden (DE), 2009. et al. Electromagnetic Compatibility [97] A. Tonello, F. Versolatto, and A. Pittolo, “In-home power line com- [118] H. Hirsch , . John Wiley munication channel: Statistical characterization,” IEEE Transactions on & Sons, Ltd, 2016, ch. 3, pp. 178–222. [Online]. Available: Communications, vol. 62, no. 6, pp. 2096–2106, 2014. https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118676684.ch3 [98] F. J. C. Corripio, J. A. C. Arrabal, L. D. del Rio, and J. T. E. [119] D. Righini, F. Passerini, and A. M. Tonello, “Modeling transmis- Munoz, “Analysis of the cyclic short-term variation of indoor power sion and radiation effects when exploiting power line networks for line channels,” IEEE Journal on Selected Areas in Communications, communication,” IEEE Transactions on Electromagnetic Compatibility, vol. 24, no. 7, pp. 1327–1338, July 2006. vol. 60, no. 1, pp. 59–67, Feb 2018. [99] L. T. Berger, A. Schwager, P. Pagani, and D. M. Schneider, “Mimo [120] S. Katar, B. Mashburn, K. Afkhamie, H. Latchman, and R. Newman, power line communications,” IEEE Communications Surveys Tutorials, “Channel adaptation based on cyclo-stationary noise characteristics in vol. 17, no. 1, pp. 106–124, Firstquarter 2015. plc systems,” in 2006 IEEE International Symposium on Power Line [100] D. Righini and A. M. Tonello, “Noise determinism in multi-conductor Communications and Its Applications, March 2006, pp. 16–21. narrow band plc channels,” in 2018 IEEE International Symposium on [121] J.-Y. Baudais and M. Crussiere,` “Linear precoding for multicarrier and Power Line Communications and its Applications (ISPLC), April 2018, multicast PLC,” in MIMO Power Line Communications: Narrow and pp. 1–6. Broadband Standards, EMC, and Advanced Processing, ser. Devices, [101] M. Zimmermann and K. Dostert, “Analysis and modeling of impulsive Circuits, and Systems, L. Berger, A. Schwager, P. Pagani, and noise in broad-band powerline communications,” IEEE Transactions D. Schneider, Eds. CRC Press, Feb. 2014, pp. 493–530. [Online]. on Electromagnetic Compatibility, vol. 44, no. 1, pp. 249–258, Feb Available: https://hal.archives-ouvertes.fr/hal-00927043 2002. [122] M. J. Rahman and L. H.-J. Lampe, “Interference alignment for mimo [102] N. Shlezinger and R. Dabora, “On the capacity of narrowband plc power line communications,” 2015 IEEE International Symposium on channels,” IEEE Transactions on Communications, vol. 63, no. 4, pp. Power Line Communications and Its Applications (ISPLC), pp. 71–76, 1191–1201, April 2015. 2015. [103] A. Pittolo, A. M. Tonello, and F. Versolatto, “Performance of MIMO [123] M. Sayed, G. Sebaali, B. L. Evans, and N. Al-Dhahir, “Efficient PLC in measured channels affected by correlated noise,” in 18th diversity technique for hybrid narrowband-powerline/wireless smart IEEE International Symposium on Power Line Communications and grid communications,” in 2015 IEEE International Conference on Its Applications, Glasgow, Scottland, March 2014, pp. 261–265. Smart Grid Communications (SmartGridComm), Nov 2015, pp. 1–6. [104] M. Antoniali, F. Versolatto, and A. M. Tonello, “An experimental [124] M. Abadi et al., “Tensorflow: Large-scale machine learning on characterization of the PLC noise at the source,” IEEE Transactions heterogeneous distributed systems,” CoRR, vol. abs/1603.04467, 2016. on Power Delivery, vol. 31, no. 3, pp. 1068–1075, June 2016. [Online]. Available: http://arxiv.org/abs/1603.04467 [105] M. Elgenedy, M. Sayed, N. Al-Dhahir, and R. C. Chabaan, “Cyclo- [125] A. Tonello, Wide band impulse modulation and receiver algorithms for stationary noise mitigation for simo powerline communications,” IEEE multiuser power line communications. EURASIP Journal on Advances Access, vol. 6, pp. 5460–5484, 2018. in Signal Processing, 2007. [106] G. J. Szekely,´ M. L. Rizzo, and N. K. Bakirov, “Measuring and [126] G. Bumiller, L. Lampe, and H. Hrasnica, “Power line communication testing dependence by correlation of distances,” The Annals of networks for large-scale control and automation systems,” IEEE Com- Statistics, vol. 35, no. 6, pp. 2769–2794, 12 2007. [Online]. Available: munications Magazine, vol. 48, no. 4, pp. 106–113, April 2010. https://doi.org/10.1214/009053607000000505 [127] S. D’Alessandro and A. M. Tonello, “On rate improvements and [107] J. S. Richman and J. R. Moorman, “Physiological time-series analysis power saving with opportunistic relaying in home power line using approximate entropy and sample entropy,” American Journal networks,” EURASIP Journal on Advances in Signal Processing,

18 vol. 2012, no. 1, p. 194, Sep 2012. [Online]. Available: https: on Systems, Communications and Coding, Feb 2017, pp. 1–6. //doi.org/10.1186/1687-6180-2012-194 [136] ——, “Artificial-intelligence-based performance enhancement of the [128] A. Bradai, K. Singh, T. Ahmed, and T. Rasheed, “Cellular software g3-plc loadng routing protocol for sensor networks,” in 2019 IEEE defined networking: a framework,” IEEE Communications Magazine, International Symposium on Power Line Communications and its vol. 53, no. 6, pp. 36–43, June 2015. Applications (ISPLC), April 2019. [129] S. Khodayari and M. J. Yazdanpanah, “Network routing based on [137] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and reinforcement learning in dynamically changing networks,” in 17th R. Salakhutdinov, “Dropout: A simple way to prevent neural networks IEEE International Conference on Tools with Artificial Intelligence from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929– (ICTAI’05), Nov 2005, pp. 5 pp.–366. 1958, Jan. 2014. [Online]. Available: http://dl.acm.org/citation.cfm? [130] Y. Lv and Y. Zhu, “Routing algorithm based on swarm intelligence,” id=2627435.2670313 Proceedings of the 10th World Congress on Intelligent Control and in [138] M. O. Ahmed and L. Lampe, “Power line communications for low- Automation , July 2012, pp. 47–50. voltage power grid tomography,” IEEE Transactions on Communica- [131] G. Kumar and J. Singh, “Energy efficient clustering scheme based tions, vol. 61, no. 12, pp. 5163–5175, Dec. 2013. on grid optimization using genetic algorithm for wireless sensor networks,” in 2013 Fourth International Conference on Computing, [139] F. Passerini and A. M. Tonello, “On the exploitation of admittance IEEE Transac- Communications and Networking Technologies (ICCCNT), July 2013, measurements for wired network topology derivation,” tions on Instrumentation and Measurement pp. 1–5. , vol. 66, no. 3, pp. 374–382, [132] N. Guizani and J. Fayn, “Framework for intelligent message routing March 2017. policy adaptation,” in 2015 IEEE International Conference on Auto- [140] ——, “Smart grid monitoring using power line modems: Anomaly nomic Computing, July 2015, pp. 235–238. detection and localization,” IEEE Transactions on Smart Grid, pp. 1–1, [133] K.-L. Yau, H. G. Goh, D. Chieng, and K. Hsiang Kwong, “Application 2019. of reinforcement learning to wireless sensor networks: models and [141] ——, “Smart grid monitoring using power line modems: Effect of algorithms,” Computing, vol. 97, 11 2015. anomalies on signal propagation,” IEEE Access, vol. 7, pp. 27 302– [134] F. Marcuzzi and A. M. Tonello, “Artificial intelligence based routing in 27 312, 2019. plc networks,” in 2018 IEEE International Symposium on Power Line [142] L. Forstel and L. Lampe, “Grid diagnostics: Monitoring cable aging Communications and its Applications (ISPLC), April 2018, pp. 1–6. using power line transmission,” in 2017 IEEE International Symposium [135] ——, “Stochastic geometry for the analysis of small radio cells and on Power Line Communications and its Applications (ISPLC), April plc back-hauling,” in SCC 2017; 11th International ITG Conference 2017, pp. 1–6.

19