
Published in IEEE Transactions on Signal Processing, 2017

Learning Power Spectrum Maps from Quantized Power Measurements

Daniel Romero, Member, IEEE, Seung-Jun Kim, Senior Member, IEEE, Georgios B. Giannakis, Fellow, IEEE, and Roberto López-Valcarce, Member, IEEE

Abstract—Power spectral density (PSD) maps providing the distribution of RF power across space and frequency are constructed using power measurements collected by a network of low-cost sensors. By introducing linear compression and quantization to a small number of bits, sensor measurements can be communicated to the fusion center with minimal bandwidth requirements. Strengths of data- and model-driven approaches are combined to develop estimators capable of incorporating multiple forms of spectral and propagation prior information while fitting the rapid variations of shadow fading across space. To this end, novel nonparametric and semiparametric formulations are investigated. It is shown that PSD maps can be obtained using support vector machine-type solvers. In addition to batch approaches, an online algorithm attuned to real-time operation is developed. Numerical tests assess the performance of the novel algorithms.

I. INTRODUCTION

Power spectral density (PSD) cartography relies on sensor measurements to map the radiofrequency (RF) signal power distribution over a geographical region. The resulting maps are instrumental for various network management tasks, such as power control, interference mitigation, and planning [21], [12]. For instance, PSD maps benefit wireless network planning by revealing the location of crowded regions and areas of weak coverage, thereby suggesting where new base stations should be deployed. Because they characterize how RF power distributes per channel across space, PSD maps are also useful to increase frequency reuse and mitigate interference in cellular systems. In addition, PSD maps enable opportunistic transmission in cognitive systems by unveiling underutilized "white spaces" in time, space, and frequency [5], [42]. Different from conventional spectrum sensing techniques, which assume a common spectrum occupancy over the entire sensed region [28], [3], [25], PSD cartography accounts for variability across space and therefore enables a more aggressive spatial reuse.

A number of interpolation and regression techniques have been applied to construct RF power maps from power measurements. Examples include kriging [1], compressive sampling [19], dictionary learning [23], [22], sparse Bayesian learning [18], and matrix completion [15]. These maps describe how power distributes over space but not over frequency. To accommodate variability along the frequency dimension as well, a basis expansion model was introduced in [6], [13], [8] to create PSD maps from periodograms. To alleviate the high power consumption and bandwidth needs that stem from obtaining and communicating these periodograms, [25] proposed a low-overhead sensing scheme based on single-bit data along the lines of [29]. However, this scheme assumes that the PSD is constant across space.

To summarize, existing spectrum cartography approaches either construct power maps from power measurements, or PSD maps from PSD measurements. In contrast, the main contribution of this paper is to present algorithms capable of estimating PSD maps from power measurements, thus attaining a more efficient extraction of the information contained in the observations than existing methods. Therefore, the proposed approach enables the estimation of the RF power distribution over frequency and space using low-cost, low-power sensors, since only power measurements are required.

To facilitate practical implementations with sensor networks, where the communication bandwidth is limited, overhead is reduced by adopting two measures. First, sensor measurements are quantized to a small number of bits. Second, the available prior information is efficiently captured in both frequency and spatial domains, thus affording strong quantization while minimally sacrificing the quality of map estimates.

Specifically, a great deal of frequency-domain prior information about impinging communication waveforms can be collected from spectrum regulations and standards, which specify bandwidth, carrier frequencies, transmission masks, roll-off factors, number of sub-carriers, and so forth [38], [31]. To exploit this information, a basis expansion model is adopted, which allows the estimation of the power of each sub-channel and background noise as a byproduct. The resulting estimates can be employed to construct signal-to-noise ratio (SNR) maps, which reveal weak coverage areas, and alleviate the well-known noise uncertainty problem in cognitive radio [36].

D. Romero and G. B. Giannakis were supported by ARO grant W911NF-15-1-0492 and NSF grant 1343248. S.-J. Kim was supported by NSF grant 1547347. D. Romero and R. López-Valcarce were supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (ERDF) (projects TEC2013-47020-C2-1-R, TEC2016-76409-C2-2-R and TEC2015-69648-REDC), and by the Galician Government and ERDF (projects GRC2013/009, R2014/037 and ED431G/04).

D. Romero is with the Dept. of Information and Communication Technology, Univ. of Agder, Norway. S.-J. Kim is with the Dept. of Computer Sci. & Electrical Eng., Univ. of Maryland, Baltimore County, USA. D. Romero was, and G. B. Giannakis is, with the Dept. of ECE and Digital Tech. Center, Univ. of Minnesota, USA. D. Romero was, and R. López-Valcarce is, with the Dept. of Signal Theory and Communications, Univ. of Vigo, Spain. E-mails: [email protected], [email protected], [email protected], [email protected].

Parts of this work have been presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing, Brisbane (Australia), 2015; at the Conference on Information Sciences and Systems, Baltimore (Maryland), 2015; and at the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, Cancún (Mexico), 2015.

To incorporate varying degrees of prior information in the spatial domain, nonparametric and semiparametric estimators are developed within the framework of kernel-based learning for vector-valued functions [26]. Nonparametric estimates are well suited for scenarios where no prior knowledge about the propagation environment is available due to their ability to approximate any spatial field with arbitrarily high accuracy [11]. In many cases, however, one may approximately know the transmitter locations, the path loss exponent, or even the locations of obstacles or reflectors. The proposed semiparametric estimators capture these forms of prior information through a basis expansion in the spatial domain.

Although the proposed estimators can be efficiently implemented in batch mode, limited computational resources may hinder real-time operation if an extensive set of measurements is to be processed. This issue is mitigated here through an online nonparametric estimation algorithm based on stochastic gradient descent. Remarkably, the proposed algorithm can also be applied to (vector-valued) function estimation setups besides spectrum cartography.

The present paper also contains two theoretical contributions to machine learning. First, a neat connection is established between robust (possibly vector-valued) function estimation from quantized measurements and support vector machines (SVMs) [32], [34], [35]. Through this link, theoretical guarantees and efficient implementations of SVMs permeate to the proposed methods. Second, the theory of kernel-based learning for vector-valued functions is extended to accommodate semiparametric estimation. The resulting methods are of independent interest since they can be applied beyond the present spectrum cartography context and subsume, as a special case, thin-plate splines regression [40], [8].

The rest of the paper is organized as follows. Sec. II presents the system model and formulates the problem. Sec. III proposes nonparametric and semiparametric PSD map estimation algorithms operating in batch mode, whereas Sec. IV develops an online solver. Finally, Sec. V presents some numerical tests and Sec. VI concludes the paper with closing remarks and research directions.

Notation. The cardinality of set A is denoted by |A|. Scalars are denoted by lowercase letters, column vectors by boldface lowercase letters, and matrices by boldface uppercase letters. Superscript T stands for transposition, and H for conjugate transposition. The (i, j)-th entry (j-th column) of matrix A is denoted by a_{i,j} (a_j). The Kronecker product is represented by the symbol ⊗. The Khatri-Rao product is defined for A := [a_1, a_2, ..., a_N] ∈ C^{M_1×N} and B := [b_1, b_2, ..., b_N] ∈ C^{M_2×N} as A ⊙ B := [a_1 ⊗ b_1, ..., a_N ⊗ b_N] ∈ C^{M_1 M_2×N}. The entrywise or Hadamard product is defined as (A ∘ B)_{i,j} := a_{i,j} b_{i,j}. Vector e_{M,m} is the m-th column of the M × M identity matrix I_M, whereas 0_M and 1_M are the vectors of dimension M with all zeros and ones, respectively. Symbol E{·} denotes expectation, P{·} probability, Tr(·) trace, λ_max(·) the largest eigenvalue, and ⋆ convolution. Notation ⌈ρ⌉ (alternatively ⌊ρ⌋) represents the smallest (largest) integer z satisfying z ≥ ρ (z ≤ ρ).

Fig. 1: Sensor architectures without (top) and with (bottom) quantization: the received signal g(x_n, t) is passed through a power measurement stage producing π̂(x_n), which is either reported directly to the fusion center or first quantized to π̂_Q(x_n).

II. SYSTEM MODEL AND PROBLEM STATEMENT

Consider M − 1 transmitters located in a geographical region R ⊂ R^d, where d is typically 2 (one may set d = 1 for maps along roads or railways, or even d = 3 for applications involving aerial vehicles or urban environments). Let φ_m(f) denote the transmit-PSD of the m-th transmitter and let l_m(x) represent the gain of the channel between the m-th transmitter and location x ∈ R, which is assumed frequency flat to simplify the exposition; see Remark 1. If the M − 1 transmitted signals are uncorrelated, the PSD at location x is given by

    Γ(x, f) = Σ_{m=1}^{M} l_m(x) φ_m(f)    (1)

where l_M(x) is the noise power and φ_M(f) is the noise PSD, normalized to ∫_{−∞}^{∞} φ_M(f) df = 1. In view of (1), one can also normalize φ_m(f), m = 1, ..., M − 1, without any loss of generality to satisfy ∫_{−∞}^{∞} φ_m(f) df = 1 by absorbing any scaling factor into l_m(x). Since such a scaling factor equals the transmit power, the coefficient l_m(x) actually represents the power of the m-th signal at location x.

Often in practice, the normalized PSDs {φ_m(f)}_{m=1}^{M−1} are approximately known since transmitters typically adhere to publicly available standards and regulations, which prescribe spectral masks, bandwidths, carrier frequencies, roll-off factors, number of pilots, and so on [38], [31]. If unknown, the methods here carry over after setting {φ_m(f)}_{m=1}^{M−1} to be a frequency-domain basis expansion model [6], [8]. For this reason, the rest of the paper assumes that {φ_m(f)}_{m=1}^{M} are known.

The goal is to estimate the PSD map Γ(x, f) from the measurements gathered by N sensors with locations {x_n}_{n=1}^{N} ⊂ R. In view of (1), this task is tantamount to estimating {l_m(x)}_{m=1}^{M} at every spatial coordinate x.

To minimize hardware costs and power consumption, this paper adopts the sensor architecture in Fig. 1. The ensemble power at the output of the filter of the n-th sensor is π(x_n) = ∫_{−∞}^{∞} |G(x_n, f)|² Γ(x_n, f) df, where G(x_n, f) denotes the frequency response of the receive filter. From (1), it follows that

    π(x_n) = Σ_{m=1}^{M} l_m(x_n) Φ_m(x_n) = φᵀ(x_n) l(x_n)    (2)

where Φ_m(x_n) := ∫_{−∞}^{∞} |G(x_n, f)|² φ_m(f) df can be thought of as the contribution of the m-th transmitter per unit of l_m(x_n) to the power at the output of the receive filter of the n-th sensor; whereas φ(x_n) := [Φ_1(x_n), ..., Φ_M(x_n)]ᵀ and l(x_n) := [l_1(x_n), ..., l_M(x_n)]ᵀ are introduced to simplify notation.
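For concreteness, the following sketch (not from the paper) evaluates the model (1)–(2) numerically: it builds unit-area transmit PSDs φ_m(f), a toy spatial power field l(x), and the filtered sensor power π(x_n) together with the per-transmitter contributions Φ_m(x_n). The rectangular spectral masks, the attenuation law, and all parameter values are illustrative assumptions.

```python
import numpy as np

# --- illustrative setup; masks, gains, and values are assumptions, not from the paper ---
f = np.linspace(0.0, 1.0, 2048)                  # normalized frequency grid
df = f[1] - f[0]
bands = [(0.10, 0.30), (0.55, 0.75)]             # assumed occupied sub-bands
M = len(bands) + 1                               # transmitters + background noise

def unit_area(psd):
    """Normalize a PSD so that it integrates to one (cf. Sec. II)."""
    return psd / (np.sum(psd) * df)

# Normalized basis PSDs phi_m(f): rectangular masks for the transmitters and a
# flat PSD for the noise term (m = M).
phi = np.stack([unit_area(((f >= a) & (f <= b)).astype(float)) for a, b in bands]
               + [unit_area(np.ones_like(f))])

def l_vec(x, tx_pos=np.array([0.2, 0.8]), noise_power=0.1):
    """Toy spatial power field l(x) for d = 1: path-loss decay plus noise floor."""
    gains = 1.0 / (0.1 + np.abs(x - tx_pos)) ** 3      # assumed attenuation law
    return np.append(gains, noise_power)               # l_M(x) = noise power

def Gamma(x):
    """PSD map (1): Gamma(x, f) = sum_m l_m(x) phi_m(f), evaluated on the grid f."""
    return l_vec(x) @ phi

def sensor_power(x, G2):
    """Filtered power (2): pi(x) = phi(x)^T l(x), where
    Phi_m(x) = integral |G(x, f)|^2 phi_m(f) df is approximated by a Riemann sum."""
    Phi = (G2 * phi).sum(axis=1) * df                  # [Phi_1(x), ..., Phi_M(x)]
    return float(Phi @ l_vec(x)), Phi

G2 = ((f >= 0.0) & (f <= 0.5)).astype(float)           # assumed |G(x_n, f)|^2 (low-pass)
pi_n, Phi_n = sensor_power(x=0.35, G2=G2)
print(pi_n, Phi_n)
```

Any other normalized PSD shapes, for example spectral masks taken from a standard, can be plugged into phi without changing the rest of the computation.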

The n-th sensor obtains an estimate π̂(x_n) of π(x_n) by measuring the signal power at the output of the filter over a certain time window. In general, π̂(x_n) ≠ π(x_n) due to the finite length of this observation window.

Different from the measurement model in [6], [13], [8], where sensors obtain and communicate periodograms, the proposed scheme solely involves power measurements, thereby reducing sensor costs and bandwidth requirements. To further reduce bandwidth, π̂(x_n) can be quantized as illustrated in the bottom part of Fig. 1. When uniform quantization is used, the sensors obtain

    π̂_Q(x_n) := Q(π̂(x_n)) := ⌊π̂(x_n)/(2ε)⌋,  n = 1, ..., N    (3)

where 2ε is the quantization step; see also Remark 5. If R denotes the number of quantization levels, which depends on ε and the range of π̂(x_n), the number of bits needed is now just ⌈log₂ R⌉. Depending on how accurate π̂(x_n) is, either Q(π(x_n)) = Q(π̂(x_n)) or Q(π(x_n)) ≠ Q(π̂(x_n)). The latter event is termed measurement error and is due to the finite length of the aforementioned time window.

Finally, the sensors communicate the measurements {π̂(x_n)}_{n=1}^{N} or {π̂_Q(x_n)}_{n=1}^{N} to the fusion center. Given these measurements, together with {φ_m(f)}_{m=1}^{M} and {x_n}_{n=1}^{N}, the problem is to estimate {l_m(x)}_{m=1}^{M} at every x ∈ R. The latter can be viewed individually as functions of the spatial coordinate x, or, altogether, as a vector field l : R^d → R^M, where l(x) := [l_1(x), ..., l_M(x)]ᵀ. Thus, estimating the PSD map in (1) is in fact a problem of estimating a vector-valued function.

Remark 1 (Frequency-selective channels). If the channels are not frequency-flat, then each term in the sum of (1) can be decomposed into multiple components of smaller bandwidth in such a way that each one sees an approximately frequency-flat channel. To ensure that these components are uncorrelated, one can choose their frequency supports to be disjoint.

Remark 2. Sensors can also operate digitally. In this case, one could implement the receive filters to have pseudo-random impulse responses [25]. Selecting distinct seeds for the random number generators of different sensors yields linearly independent {φ(x_n)}_{n=1}^{N} with a high probability, which ensures identifiability of {l_m(x_n)}_{m=1}^{M} (cf. (2)).

Remark 3. If a wideband map is to be constructed, then Nyquist-rate sampling may be too demanding for low-cost sensors. In this scenario, one can replace the filter in Fig. 1 with an analog-to-information converter (A2IC) [37], [31]. To see that (2) still holds and therefore the proposed schemes still apply, let G(x_n) represent the compression matrix of the n-th sensor, which multiplies raw measurement blocks to yield compressed data blocks [31]. The ensemble power of the latter is proportional to π(x_n) := Tr(G(x_n) Σ(x_n) Gᵀ(x_n)), where Σ(x_n) = Σ_{m=1}^{M} l_m(x_n) Σ_m denotes the covariance matrix of the uncompressed data blocks, and Σ_m the covariance matrix of the blocks transmitted from the m-th transmitter. Combining both equalities yields (2) upon defining Φ_m(x_n) := Tr(G(x_n) Σ_m Gᵀ(x_n)).

III. LEARNING PSD MAPS

This section develops various PSD map estimators offering different bandwidth-performance trade-offs. First, Sec. III-A puts forth a nonparametric estimator to recover l(x) := [l_1(x), ..., l_M(x)]ᵀ from un-quantized power measurements. To reduce bandwidth requirements, this approach is extended in Sec. III-B to accommodate quantized data. The detrimental impact of strong quantization on the quality of map estimates is counteracted in Sec. III-C by leveraging propagation prior information. For simplicity, these methods are presented for the scenario where each sensor obtains a single measurement, whereas general versions accommodating multiple measurements per sensor are outlined in Sec. III-E.

A. Estimation via nonparametric regression

This section reviews the background in kernel-based learning necessary to develop the cartography tools in the rest of the paper and presents an estimator to obtain PSD maps from un-quantized measurements.

Kernel-based regression seeks estimates among a wide class of functions termed reproducing kernel Hilbert space (RKHS). In the present setting of vector-valued functions, such an RKHS is given by H := {l(x) = Σ_{n=1}^{∞} K(x, x̃_n) c̃_n : x̃_n ∈ R^d, c̃_n ∈ R^M} [26], [11], where {c̃_n}_{n=1}^{∞} are M × 1 expansion coefficient vectors and K(x, x′) is termed reproducing kernel map. The latter is any matrix-valued function K(x, x′) : R^d × R^d → R^{M×M} that is (i) symmetric, meaning that K(x, x′) = K(x′, x) for any x and x′; and (ii) positive (semi)definite, meaning that the square matrix having K(x̃_n, x̃_{n′}) as its (n, n′) block is positive semi-definite for any {x̃_1, ..., x̃_Ñ}. Remark 4 guides the selection of functions K(x, x′) qualifying as reproducing kernels.

As any Hilbert space, an RKHS has an associated inner product, not necessarily equal to the classical ⟨f, g⟩ = ∫ fᵀ(x) g(x) dx. Specifically, the inner product between two RKHS functions l(x) := Σ_{n=1}^{Ñ} K(x, x̃_n) c̃_n and l′(x) := Σ_{n=1}^{Ñ} K(x, x̃_n) c̃′_n can be obtained through the reproducing kernel as

    ⟨l, l′⟩_H = Σ_{n=1}^{Ñ} Σ_{n′=1}^{Ñ} c̃_nᵀ K(x̃_n, x̃_{n′}) c̃′_{n′}.    (4)

This expression is referred to as the reproducing property and is of paramount importance since it allows the computation of function inner products without integration. From (4), the induced RKHS norm of l can be written as ||l||²_H := ⟨l, l⟩ = Σ_{n=1}^{∞} Σ_{n′=1}^{∞} c̃_nᵀ K(x̃_n, x̃_{n′}) c̃_{n′} and is widely used as a proxy for smoothness of l.

Kernel-based methods confine their search of estimates to functions in H, which is not a limitation since an extensive class of functions, including any continuous function vanishing at infinity, can be approximated with arbitrary accuracy by a function in H for a properly selected kernel [11]. However, function estimation is challenging since (i) any finite set of samples generally admits infinitely many interpolating functions in H, and (ii) the estimate l̂ does not generally approach the estimated function l even for an arbitrarily large number of noisy samples if the latter are overfitted. To mitigate both issues, kernel regression seeks estimates minimizing the sum of two terms, where the first penalizes estimates deviating from the observations and the second promotes smoothness. Specifically, if L(e_n) denotes a loss function of the measurement error e_n := π̂(x_n) − φᵀ(x_n) l(x_n), the nonparametric kernel-based regression estimate of l is [9]

    l̂ := arg min_{l∈H} (1/N) Σ_{n=1}^{N} L(π̂(x_n) − φᵀ(x_n) l(x_n)) + λ ||l||²_H    (5)

where the user-selected scalar λ > 0 controls the tradeoff between fitting the data and smoothness of the estimate, captured by its RKHS norm.

Since H is infinite-dimensional, solving (5) directly is generally not possible with a finite number of operations. Fortunately, the so-called representer theorem (see e.g., [32], [2]) asserts that l̂ in (5) is of the form

    l̂(x) = Σ_{n=1}^{N} K(x, x_n) c_n := K(x) c    (6)

for some {c_n}_{n=1}^{N}, where K(x) := [K(x, x_1), ..., K(x, x_N)] is of size M × MN, and c := [c_1ᵀ, ..., c_Nᵀ]ᵀ is MN × 1. In words, the solution to (5) admits a kernel expansion around the sensor locations {x_n}_{n=1}^{N}. From (6), it follows that finding l̂ amounts to finding c, but the latter can easily be obtained by solving the problem that results from substituting (6) into (5):

    ĉ := arg min_c Σ_{n=1}^{N} L(π̂(x_n) − φᵀ(x_n) K(x_n) c) + λ N cᵀ K c.    (7)

In (7), the MN × MN matrix K is formed to have K(x_n, x_{n′}) as its (n, n′)-th block. Clearly, the minimizer of (5) can then be recovered as l̂(x) = K(x) ĉ.

The loss function L is typically chosen to be convex. The simplest example is the squared loss L₂(e_n) := e_n², with the resulting l̂ referred to as the kernel ridge regression estimate. Defining π̂ := [π̂(x_1), ..., π̂(x_N)]ᵀ, Φ := [φ(x_1), ..., φ(x_N)], and Φ₀ := I_N ⊙ Φ, expression (7) becomes

    ĉ = arg min_{c ∈ R^{MN}} ||π̂ − Φ₀ᵀ K c||²₂ + λ N cᵀ K c
      = (Φ₀ Φ₀ᵀ K + λ N I_{MN})⁻¹ Φ₀ π̂.    (8)

Besides its simplicity of implementation, the estimate l̂(x) = K(x) ĉ with ĉ as in (8) offers a twofold advantage over existing cartography schemes. First, existing estimators relying on power measurements can only construct power maps [1], [19], [18], [23], [22], whereas the proposed method is capable of obtaining PSD maps from the same measurements. On the other hand, existing methods for estimating PSD maps require PSD measurements, that is, every sensor must obtain and transmit periodograms to the fusion center. This necessitates a higher communication bandwidth, longer sensing time, and more costly sensors than required here [6], [13], [8].

Remark 4. The choice of the kernel considerably affects the estimation performance when the number of observations is small. Thus, it is important to choose a kernel that is well-suited to the spatial variability of the true l. To do so, one may rely on cross validation, historical data [32, Sec. 2.3], or multi-kernel approaches [7]. Although specifying matrix-valued kernels is more challenging than specifying their scalar counterparts (M = 1) [26], a simple but effective possibility is to construct a diagonal kernel as K(x, x′) = diag(k₁(x, x′), ..., k_M(x, x′)), where {k_m(x, x′)}_{m=1}^{M} are valid scalar kernels. For example, k_m(x, x′) can be the popular Gaussian kernel k_m(x, x′) = exp(−||x − x′||²/σ²_m), where σ²_m > 0 is user selected.

B. Nonparametric regression from quantized data

The scheme in Sec. III-A offers a drastic bandwidth reduction relative to competing PSD map estimators since only scalar-valued measurements need to be communicated to the fusion center. The methods in this section accomplish a further reduction by accommodating quantized measurements.

Recall from Sec. II that {π̂_Q(x_n)}_{n=1}^{N} denote the result of uniformly quantizing the power measurements {π̂(x_n)}_{n=1}^{N}; see also Remark 5. The former essentially convey interval information about the latter, since (3) implies that π̂(x_n) is contained in the interval [y(x_n) − ε, y(x_n) + ε), where y(x_n) := [2π̂_Q(x_n) + 1]ε. Note that y(x_n) is in fact the centroid of the π̂_Q(x_n)-th quantization interval.

To account for the uncertainty within such an interval, one can replace π̂(x_n) in (5) with y(x_n) as

    l̂ = arg min_{l∈H} (1/N) Σ_{n=1}^{N} L(y(x_n) − φᵀ(x_n) l(x_n)) + λ ||l||²_H    (9)

and set L to assign no cost across all candidate functions l that lead to values of φᵀ(x_n) l(x_n) = π(x_n) falling ±ε around y(x_n). In other words, such an L only penalizes functions l for which e_n = y(x_n) − φᵀ(x_n) l(x_n) falls outside of [−ε, ε). Examples of these ε-insensitive loss functions include L₁ε(e_n) := max(0, |e_n| − ε) [32], [34], and the less known L₂ε(e_n) := max(0, e_n² − ε²). Incidentally, these functions endow the proposed estimators with robustness to outliers and promote sparsity in {e_n}_{n=1}^{N}, which is a particularly well-motivated property when the number of measurement errors is small relative to N, that is, when Q(π(x_n)) = Q(π̂(x_n)) for most values of n.

The rest of this section develops solvers for (9) and establishes a link between (9) and SVMs. To this end, note that application of the representer theorem to (9) yields, as in Sec. III-A, an estimate l̂(x) = K(x) ĉ with

    ĉ = arg min_c Σ_{n=1}^{N} L(y(x_n) − φᵀ(x_n) K(x_n) c) + λ N cᵀ K c.    (10)

Now focus on L₁ε and note that L₁ε(e_n) = ξ_n + ζ_n, where ξ_n := max(0, e_n − ε) and ζ_n := max(0, −e_n − ε) respectively

quantify positive deviations of en with respect to the right Algorithm 1: Nonparametric batch PSD map estimator and left endpoints of [−, ). This implies that ξn satisfies N M 1: Input: {(xn, φ(xn), πˆQ(xn))}n=1, {φm(f)}m=1 ,  ≥ − and ≥ , whereas satisfies ≥ − − 0 ξn en  ξn 0 ζn ζn en  2: Parameters: λ, K(x, x ) and ζn ≥ 0, thus establishing the following result. 3: Φ = [φ(x1),..., φ(xN )] Proposition 1. The problem in (10) with L the -insensitive 4: Φ0 = IN Φ 0 0 loss function L1 can be expressed as 5: Form K, whose (n, n )-th block is K(xn, xn ) 6: T N K0 = Φ0 KΦ0 T X T 7: y = [y(x1), . . . , y(xN )] (cˆ, ξˆ,ζˆ) = arg min (ξn + ζn) + λNc Kc (11) c ξ ζ 8: ˆ , , n=1 Obtain (αˆ , β) from (12) T T T ˆ T 9: [cˆ1 ,..., cˆN ] = [1/(2λN)]Φ0(αˆ − β) s.to ξn ≥ y(xn) − φ (xn)K(xn)c −  , ξn ≥ 0, M 10: ˆ P ˆ T Output: Function Γ(x, f) = m=1 lm(x)φm(f), where ζn ≥ −y(xn) + φ (xn)K(xn)c − , ζn ≥ 0, ˆ ˆ T PN [l1(x),..., lM (x)] = n=1 K(x, xn)cˆn. n = 1,...,N T T where ξ := [ξ1, . . . , ξN ] and ζ := [ζ1, . . . , ζN ] . ζˆ = max(0 , −y + ΦT Kcˆ − 1 ). (13c) Problem (11) is a convex quadratic program with slack vari- N 0 N N ˆ ˆ ables {ξn, ζn}n=1. Although one can obtain (cˆ, ξ, ζ) using, Algorithm 1 lists the steps involved in the proposed nonpara- for example, an off-the-shelf interior-point solver, it will be metric estimator. Note that the primal (11) entails (M + 2)N shown that a more efficient approach is to solve the dual- variables whereas the dual (12) has just 2N. The latter can be domain version of (11). solved using sequential minimal optimization algorithms [27], To the best of our knowledge, (11) constitutes the first which here can afford simplified implementation along the application of an -insensitive loss to estimating functions lines of e.g., [20] because there is no bias term. However, for from quantized data. As expected from the choice of loss moderate problem sizes (< 5, 000), interior point solvers are function and regularizer, (11) is an SVM-type problem. How- more reliable [32, Ch. 10] while having worst-case complexity ever, different from existing SVMs, for which data comprises O(N 3.5). As a desirable byproduct, interior point methods N vector-valued noisy versions of {l(xn)}n=1 [26, Examples directly provide the Lagrange multipliers, which are useful 1 and 2], the estimate in (11) relies on noisy versions of for recovering the primal variables (cf. (23)). T N the scalars {φ (xn)l(xn)}n=1. Therefore, (11) constitutes a new class of SVM for vector-valued function estimation. As C. Semiparametric regression using quantized data a desirable consequence of this connection, the proposed esti- The nonparametric estimators in Secs. III-A and III-B are mate inherits the well-documented generalization performance universal in the sense that they can approximate wide classes of existing SVMs [26], [35], [11]. However, it is prudent of functions, including all continuous functions l vanishing at to highlight one additional notable difference between (11) infinity, with arbitrary accuracy provided that the number of and conventional SVMs that pertains to the present context measurements is large enough [11]. However, since measure- of function estimation from quantized data: whereas in the ments are limited in number and can furthermore be quantized, present setting  is determined by the quantization interval incorporating available prior knowledge is crucial to improve length, this parameter must be delicately tuned in conventional the accuracy of the estimates. 
One could therefore consider SVMs to control generalization performance. applying parametric approaches since they can readily in- The proposed estimator is nonparametric since the number corporate various forms of prior information. However, these of unknowns in (11) depends on the number of observations approaches lack the flexibility of nonparametric techniques N. Although this number of unknowns also grows with since they can only estimate functions in very limited classes. M, it is shown next that this is not the case in the dual Semiparametric alternatives offer a “sweet spot” by combining T formulation. To see this, let K0 := Φ0 KΦ0 as well as the merits of both approaches [32]. T y := [y(x1), . . . , y(xN )] . With α and β representing the This section presents semiparametric estimators capable of Langrange multipliers associated with the {ξn} and the {ζn} capturing prior information about the propagation environment constraints, the dual of (11) can be easily shown to be yet preserving the flexibility of the nonparametric estimators ˆ 1 T in Secs. III-A and III-B. To this end, an estimate of the (αˆ , β) := arg min (α − β) K0(α − β) 0 N form ˇ is pursued, where (cf. Secs. III-A α,β∈R 4Nλ l(x) = l (x) + l(x) 0 T T and III-B) the nonparametric component l (x) belongs to an − (y − 1N ) α + (y + 1N ) β RKHS H0 with kernel matrix K0; whereas the parametric s. to 0N ≤ α ≤ 1N , 0N ≤ β ≤ 1N . component is given by (12) NB From the Karush-Kuhn-Tucker (KKT) conditions, the primal X ˇl(x) = Bν (x)θν := B(x)θ (14) variables can be recovered from the dual ones using ν=1 1 with collecting user- Φ − ˆ B(x) := [B1(x),..., BNB (x)] NB cˆ = 0(αˆ β) (13a) M×M 2λN selected basis matrix functions Bν (x): R → R , ˆ 0 − ΦT − 1 (13b) , and T T T . ξ = max( N , y 0 Kcˆ  N ) ν = 1,...,NB θ := [θ1 ,..., θNB ] PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 6

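As a concrete illustration of the parametric component in (14), the sketch below builds diagonal basis matrices B_ν(x) from approximately known transmitter locations with a path-loss-type attenuation, and evaluates ľ(x) = B(x)θ. The transmitter locations, the attenuation law, and all parameter values are assumptions of this sketch; any other user-selected basis matrices can be used in exactly the same way.

```python
import numpy as np

# Illustrative parametric component (14): l_check(x) = B(x) @ theta.
M = 4                                    # 3 transmitters + noise term
tx_pos = np.array([[0.2, 0.8],
                   [0.4, 0.5],
                   [0.8, 0.9]])           # assumed approximate transmitter locations
delta, gamma = 1e-2, 3.0                  # assumed path-loss parameters

def basis_matrices(x):
    """Return {B_nu(x)}: one diagonal M x M matrix per transmitter whose only
    nonzero entry models a free-space-like attenuation toward location x."""
    Bs = []
    for m, chi in enumerate(tx_pos):
        B = np.zeros((M, M))
        B[m, m] = 1.0 / (delta + np.linalg.norm(x - chi)) ** gamma
        Bs.append(B)
    return Bs

def B_of_x(x):
    """Stack [B_1(x), ..., B_NB(x)] horizontally into an M x (M * NB) matrix."""
    return np.hstack(basis_matrices(x))

def parametric_component(x, theta):
    """Evaluate l_check(x) = B(x) theta, cf. (14); theta stacks the NB vectors theta_nu."""
    return B_of_x(x) @ theta

# Example: theta placing unit weight on the "own" channel of each basis function.
theta = np.concatenate([np.eye(M)[:, m] for m in range(len(tx_pos))])
print(parametric_component(np.array([0.3, 0.6]), theta))
```

With such a basis, each entry of θ plays the role of an (approximate) transmit power, which is consistent with the interpretation given in the text when the basis captures the propagation effects accurately.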
M If the transmitter locations {χm}m=1 are approximately Algorithm 2: Semiparametric batch PSD map estimator known, the free space propagation loss can be described by N M 1: Input: {(xn, φ(xn), πˆQ(xn))}n=1, {φm(f)}m=1,  || − 0 0 NB matrix basis functions of the form Bm(x) = fm( x 2: Parameters: , , { } T λ K (x, x ) Bν (x) ν=1 χm||)eM,me , where fm(||x − χm||) is the attenuation M,m 3: Φ = [φ(x1),..., φ(xN )] between a transmitter located at χm and a receiver located at 4: Φ0 = IN Φ an arbitrary point x; see Sec. V for an example. Note that if 0 0 0 5: Form K , whose (n, n )-th block is K (xn, xn0 ) this basis accurately captures the propagation effects in R, 0 T 0 6: K0 = Φ0 K Φ0 then the m-th entry of the estimated θm is approximately T 7: y = [y(x1), . . . , y(xN )] proportional to the transmit power of the m-th transmitter. 8: Form B, whose (n, ν)-th block is Bν (xn) An immediate two-step approach to estimating l is to first T T 9: Obtain (αˆ , βˆ) from (18) and set θˆ := [θˆ ,..., θˆ ]T to fit the data with ˇl in (14), and then fit the residuals with 1 NB be the optimal Lagrange multiplier of the last constraint 0 back-fitting l as detailed in Sec. III-B. Since this so-termed T T T ˆ 10: [cˆ1 ,..., cˆN ] = [1/(2λN)]Φ0(αˆ − β) approach is known to yield sub-optimal estimates [32], this M 11: Output: Function ˆ P ˆ , where paper pursues a joint fit, which constitutes a novel approach Γ(x, f) = m=1 lm(x)φm(f) ˆ ˆ T PN 0 PNB ˆ in kernel-based learning for vector-valued functions. To this [l1(x),..., lM (x)] = n=1 K (x, xn)cˆn + ν=1 Bν (x)θν . end, define H as the space of functions l (not necessarily an RKHS) that can be written as l = l0 + ˇl, with l0 ∈ H0 and ˇl as in (14). One can thereby seek semiparametric estimates of primal vector of the nonparametric component can be readily the form obtained from the KKT conditions as N 1 ˆ 1 X T 0 2 cˆ0 = Φ (αˆ − βˆ) . (19) l = arg min L(y(xn)−φ (xn)l(xn))+λkl kH0 (15) 0 l∈H N 2λN n=1 Although θ can also be obtained from the KKT conditions, where the regularizer involves only the nonparametric com- 0 this approach is numerically unstable. It is preferable to obtain ponent through the norm || · || 0 in H . Using [2, Th. 3.1], H θ from the Lagrange multipliers of (18), which are known e.g. one can readily generalize the representer theorem in [32, if an interior-point solver is employed. Specifically, noting Th. 4.3] to the present semiparametric case. This yields that (17) is the dual of (18), it can be seen that θˆ equals ˆ 0 ˆ0 ˆ, where l(x) = K (x)c + B(x)θ the multipliers of the last constraint in (18). Algorithm 2 N summarizes the proposed semiparametric estimator. 0 X  T 0 0 (cˆ , θˆ) = arg min L y(xn) − φ (xn)[K (xn)c c0 θ , n=1  0T 0 0 D. Regression with conditionally positive definite kernels + B(xn)θ] + λNc K c . (16) So far, the kernels were required to be positive definite. Comparing (10) with (16), and replacing K(xn)c with This section extends the semiparametric estimator in Sec. III-C 0 0 K (xn)c +B(xn)θ yields the next result (cf. Proposition 1). to accommodate the wider class of conditionally positive definite (CPD) kernels. CPD kernels are natural for estimation Proposition 2. The problem in (16) with L the -insensitive problems that are invariant to translations in the data [32, p. loss function L can be expressed as 1 52], as occurs in spectrum cartography. 
Accommodating CPD N kernels also offers a generalization of thin-plate splines (TPS), 0 ˆ ˆ ˆ X 0T 0 0 (cˆ ,θ, ξ, ζ) = arg min (ξn + ζn) + λNc K c which have well-documented merits in capturing shadowing of c0,θ,ξ,ζ n=1 propagation channels [8], to operate on quantized data. T 0 0 s.to ξn ≥ y(xn) − φ (xn)[K (xn)c + B(xn)θ] − , ξn ≥ 0 Consider the following definition, which generalizes that of T 0 0 N ζn ≥ −y(xn) + φ (xn)[K (xn)c + B(xn)θ] − , ζn ≥ 0 scalar CPD kernels [32, Sec. 2.4]. Recall that, given {xn}n=1, 0 n = 1,...,N. (17) K is a matrix whose (n, n )-th block is K(xn, xn0 ). d d M×M The primal problem in (17) entails vectors of size MN, Definition 1. A kernel K(x1, x2): R × R → R is NB T which motivates solving its dual version. Upon defining B CPD with respect to {Bν (x)}ν=1 if it satisfies c Kc ≥ 0 N T for every finite set {xn} and all c such that B c = 0. as an NM × NBM matrix whose (n, ν)-th block is Bν (xn), n=1 and representing the Langrange multipliers associated with the Observe that any positive definite kernel is also CPD since {ξn} and {ζn} constraints by α and β, the dual of (17) is it satisfies cT Kc ≥ 0 for all c. To see how CPD kernels can be applied in semiparametric ˆ 1 T 0 (αˆ , β) = arg min (α − β) K0(α − β) α,β 4Nλ regression, note that (19) together with the last constraint in (18) imply that the solution to (17) satisfies BT cˆ0 = 0. − (y − 1 )T α + (y + 1 )T β N N (18) Therefore, (17) can be equivalently solved by confining the s. to 0N ≤ α ≤ 1N , 0N ≤ β ≤ 1N vectors c0 to lie in the null space of BT : BT Φ (α − β) = 0 0 0 ˆ ˆ ˆ T 0T 0 0 (cˆ ,θ, ξ, ζ) = arg min 1N (ξ + ζ) + λNc K c 0 T 0 c0,θ,ξ,ζ where K0 := Φ0 K Φ0. Except for the last constraint and the 0 T 0 0 usage of K0, (18) is identical to (12). Similar to (13a), the s.t. ξ ≥ y − Φ0 (K c + Bθ) − 1N , ξ ≥ 0N PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 7

T 0 0 0 ζ ≥ −y + Φ0 (K c + Bθ) − 1N , ζ ≥ 0N H induced by (24), which can be evaluated as in Sec. III-A, BT c0 = 0. (20) admits the equivalent form M Z The new equality constraint ensures that the objective of (20) 0 2 X 2 2 ||l ||H0 = ||∇ lm(z)||F dz 0 0 d is convex in the feasible set if K (x, x ) is CPD with respect m=1 R NB to {Bν (x)}ν=1. However, (20) is susceptible to numerical 2 0 where || · ||F denotes Frobenius norm, and ∇ the Hessian. issues because the block matrix K may not be positive 0 semidefinite. One can circumvent this difficulty by adopting Therefore, the RKHS norm of H captures the conventional notion of smoothness embedded in the magnitude of the a change of variables c0 := P ⊥c˜, where P ⊥ := I − B B MN second-order derivatives. Among other reasons, TPS are pop- B(BT B)−1BT ∈ RMN×MN is the orthogonal projector ular because they do not require parameter tuning, unlike e.g., onto the null space of BT . By doing so, (20) becomes Gaussian kernels, which need adjustment of their variance ˆ ˆ ˆ ˆ T T ⊥ 0 ⊥ parameter. The novelty here is the generalization of TPS to (c˜,θ, ξ, ζ) = arg min 1N (ξ + ζ) + λNc˜ PB K PB c˜ c˜,θ,ξ,ζ vector-valued function estimation from quantized observations. T 0 ⊥ s.t. ξ ≥ y − Φ0 (K PB c˜ + Bθ) − 1N , ξ ≥ 0N Unlike [8], which relies on un-quantized periodograms, the T 0 ⊥ proposed scheme is based on quantized power measurements. ζ ≥ −y + Φ0 (K PB c˜ + Bθ) − 1N , ζ ≥ 0N (21)

⊥ 0 ⊥ where PB K PB is guaranteed to be positive semidefinite. E. Multiple measurements per sensor A similar argument applies to the dual formulation in (18), For simplicity, it was assumed so far that each sensor where, for feasible α and β, it holds that (α − β)T K0 (α − 0 collects and reports a single measurement to the fusion center. β) ≥ 0 with K0(x, x0) CPD. To avoid numerical issues, ob- However, P > 1 measurements can be obtained per sensor by serve that any feasible α, β satisfy Φ (α−β) = P ⊥Φ (α− 0 B 0 changing the filter impulse response between measurements, β), and thus (18) can be equivalently expressed as or, by appropriately modifying the compression matrix in their 1 A2ICs; cf. Remark 3. A naive approach would be to regard (αˆ , βˆ) = arg min (α − β)T K˜ (α − β) α,β 4Nλ the P measurements per sensor as measurements from P different sensors at the same location. However, this increases − (y − 1 )T α + (y + 1 )T β N N (22) the problem size by a factor of P , and one has to deal with T s. to B Φ0(α − β) = 0 a rank deficient kernel matrix K of dimension MNP , which 0N ≤ α ≤ 1N , 0N ≤ β ≤ 1N renders the solutions of (7), (10) and (16) non-unique. A more efficient means of accommodating P measurements 0 where K0 has been replaced with the positive semidefinite per sensor in the un-quantized scenario is to reformulate (5) as ˜ T ⊥ 0 ⊥ matrix K := Φ0 PB K PB Φ0. With this reformulation, although the optimal 0 can still be recovered from (19), ˆ N P cˆ θ ˆ 1 X X T 2 l = arg min L(ˆπp(xn)−φp (xn)l(xn))+λklkH can no longer be obtained as the Lagrange multiplier µ of l∈H NP n=1 p=1 the equality constraint in (22). This is due to the change of (26) variables, which alters (22) from being the dual of (21). As where πˆp(xn) and φp(xn) correspond to the p-th mea- derived in Appendix A, θˆ can instead be recovered as surement reported by the sensor at location xn. Collect T −1 0 0 all observations in π¯ˆ := [ˆπ (x ), πˆ (x ),..., πˆ (x )] , θˆ = µ − (BT B) BT K cˆ . (23) 1 1 2 1 P N let Φ¯ := [φ1(x1), φ2(x1),..., φP (xN )], and let Φ¯ 0 := T ¯ MN×NP Broadening the scope of semiparametric regression to in- (IN ⊗ 1P ) Φ ∈ R . As before, the representer clude CPD kernels leads also to generalizations of TPS – theorem implies that the minimizer of (26) is given by ˆ PN ˆ ˆ arguably the most popular semiparametric interpolator, which l(x) = n=1 K(x, xn)c¯n := K(x)c¯. If L is the square derives its name because the TPS estimate mimics the shape loss L2 (see Sec. III-A), then of a thin metal plate that minimizes the bending energy when ˆ ¯ ¯ T −1 ¯ ˆ T c¯ = (Φ0Φ0 K + λNIMN ) Φ0π¯ (27) anchored to the data points [10]. With x := [x1, . . . , xd] , TPS adopts the parametric basis B1(x) = IM , B2(x) = which generalizes (8) to P ≥ 1. Note however that c¯ has x1IM ,..., B1+d(x) = xdIM , and the diagonal matrix kernel dimension MN, whereas the aforementioned naive approach would result in MNP . Likewise, K is of size MN × MN. K0(x , x ) = r(||x − x ||2)I (24) 1 2 1 2 2 M If πˆp(xn) is replaced with yp(xn) in (26), the resulting expression extends (9) to multiple measurements per sensor. If where r(z) denotes the radial basis function L is the -insensitive cost L1, such an expression is minimized ( ˆ ˆ z2s−d log(z) if d is even for l(x) = K(x)c¯ with r(z) := (25) z2s−d otherwise ¯ˆ ¯ˆ T ¯ ¯ T (c¯ˆ, ξ, ζ) = arg min 1NP (ξ + ζ) + λNP c¯ Kc¯ c¯,ξ¯,ζ¯ for a positive integer typically set to [39, eq. s s = 2 s. to ξ¯ ≥ y¯ − Φ¯ T Kc¯ − 1 , ξ¯ ≥ 0 (2.4.9)],[8]. 
The kernel in (24) can be shown to be CPD with 0 NP NP ¯ ¯ T ¯ 1+d ζ ≥ −y¯ + Φ0 Kc¯ − 1NP , ζ ≥ 0NP (28) respect to {Bν (x)}ν=1 [39, p. 32]. The norm in the RKHS PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 8

T where y¯ := [y1(x1), y2(x1), . . . , yP (xN )] . Note that (28) solving the dual of (28) through interior point methods takes reduces to (11) if P = 1. The dual formulation is the same as O((NP )3.5) iterations. A similar level of complexity is in- (12), except that α, β, Φ0, and N are replaced with α¯ , β¯, Φ¯ 0, curred by its semiparametric counterpart. Albeit polynomial, and NP , respectively. The primal solution can be recovered this complexity may be prohibitive in real-time applications −1 ˆ as c¯ˆ = (2λNP ) Φ¯ 0(α¯ˆ − β¯); cf. (13a). with limited computational capabilities if the number of The multi-measurement version of the semiparametric esti- measurements NP is large. For such scenarios, an online mator in (15) is algorithm with linear complexity is proposed in Sec. IV, which N P is guaranteed to converge to the nonparametric estimate (10). ˆ 1 X X T 0 2 An online algorithm for semiparametric estimation can be l = arg min L(yp(xn)−φp (xn)l(xn))+λkl kH0 l∈H NP n=1 p=1 found in [30]. (29) and the counterpart to (17) is obtained by replacing ξ, ζ, y, φ(xn) and N, with ξ¯, ζ¯, y¯, φp(xn) and NP , respectively. Remark 8. In real applications, low SNR conditions, trans- Likewise, the dual formulation is obtained by substituting α, mission , and the hidden terminal problem may β, Φ0, N, and K˜ in (22) with α¯ , β¯, Φ¯ 0, NP , and limit the quality of PSD map estimates relying on short obser- vation windows. Thus, highly accurate estimates may require K¯˜ := Φ¯ T P ⊥K0P ⊥Φ¯ (30) 0 B B 0 longer observation intervals to average out the undesirable respectively. The primal variables are recovered again as effects of noise and small-scale fading. This constitutes a −1 ˆ T −1 T 0 c¯ˆ = (2λNP ) Φ¯ 0(α¯ˆ − β¯) and θˆ = µ¯ − (B B) B K c¯ˆ, tradeoff between estimation accuracy and temporal resolution where µ¯ is the Lagrange multiplier vector associated with the of PSD maps that is inherent to the spectrum cartography equality constraints in the dual problem; cf. (23). problem. This tradeoff may not pose a difficulty in applications such as in TV networks, where transmitters remain active or Remark 5 (Non-uniform quantization). Unless is uni- πˆ(xn) inactive for long time intervals, typically months or years, and formly distributed, non-uniform quantization may be prefer- therefore a high temporal resolution is unnecessary. In other able over the uniform quantization adopted so far. With scenarios, it may suffice to know whether the primary users are quantization regions specified by the boundaries R 0 = active or inactive, in which case a high estimation accuracy is , the quantized measurements are τ0 < τ1 < . . . < τR not needed, and therefore short observation windows may be for ∈ . The general πQ(xn) := Q(ˆπ(xn)) = i πˆ(xn) [τi, τi+1) enough. However, in those scenarios where both high accuracy formulations (26) and (29) can accommodate non-uniformly and fine temporal resolution are required, one would need to quantized observations by replacing in all relevant opti-  deploy more sensors. The present paper alleviates the negative mization problems with (x ) := (τ − τ )/2; n πQ(xn)+1 πQ(xn) impact of the aforementioned tradeoff by reducing the required and likewise modifying the centroid expression from y(xn) := communication bandwidth. This brings a threefold benefit: [2π (x ) + 1] to y(x ) := (τ + τ )/2. Q n n πQ(xn)+1 πQ(xn) (i) in a fixed time interval, each sensor may report more measurements to the fusion center, thus improving averaging Remark 6 (Enforcing nonnegativity). 
Since all M entries of and therefore the accuracy of the estimates; (ii), a larger vector l(x) represent power, they are inherently nonnegative. number of sensors can be deployed; and (iii), the latency of the To exploit this information, at least partially, one can enforce communication between sensors and fusion center is reduced. non-negativity of l(x) at all sensor locations by introducing the constraint K(xn)c ≥ 0M for n = 1,...,N in (11). Another approach is to include M “virtual measurements” IV. ONLINE ALGORITHM for every sensor location x to promote estimates l(x) n The algorithms proposed so far operate in batch mode, satisfying 0 ≤ l (x ) < τ for all m. In this way, the m n R thus requiring all observations before they can commence. estimation algorithm no longer uses the set of “real” mea- Moreover, their computational complexity increases faster than surements {y (x ), φ (x ),  (x )}P for every sensor p n p n p n p=1 linearly in NP , which may be prohibitive if the number of n = 1,...,N, where  (x ) is the quantization interval of p n measurements is large relative to the available computational the p-th measurement obtained by the n-th sensor. Instead, it resources. These considerations motivate the development of uses its extended version {y (x ), φ (x ),  (x )}P +M , p n p n p n p=1 online algorithms, which can both approximate the solution where φ (x ) = e , y (x ) := (τ + τ )/2, P +m n M,m P +m n 0 R of the batch problem with complexity O(NP ) and update  (x ) := (τ − τ )/2 for m = 1,...,M. The mea- P +m n R 0 ˆl(x) as new measurements arrive at the fusion center. Online surements p = P + 1,...,P + M are termed “virtual” since algorithms are further motivated by their ability to track a they are appended to the “real” measurements by the fusion time-varying l. center, but are not acquired by the sensors. Note that this Although online algorithms can be easily constructed by it- approach does not constrain l to be entry-wise non-negative eratively applying batch algorithms over sliding windows [33], at the sensor locations, it just promotes estimates satisfying online strategies with instantaneous updates are preferred [14]. this condition. An elegant approach for kernel-based learning relying on Remark 7 (Computational complexity). With un-quantized stochastic gradient descent (SGD) in the RKHS is developed data, the estimate (27) requires O(M 3N 3 + PM 2N) oper- in [24] for scalar kernel machines. Its counterpart for vector- ations. With nonparametric estimation from quantized data, valued functions is described in [4], but it is not directly PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 9

(t) applicable to the present setup since it requires differentiable amplitudes of the entries in ci are shrunk by a factor objectives and vectors φ(xν ) that do not depend on ν. This |1 − 2µtλ| < 1. This justifies truncating (36) as section extends the algorithm in [4] to accommodate the t−1 present scenario. (t) X (t) l (x) = K(x, xi)ci (37) For a selected L, consider the instantaneous regularized i=max(1,t−I) cost, defined for generic l ∈ H, x , φ(x ), and y(x ) as ν ν ν for some I > 1. On the other hand, if the fusion center T 2 processes multiple observations per sensor, {K(x, x )}t C(l, φ(xν ), xν , y(xν )) := L(y(xν ) − φ (xν )l(xν )) + λklkH. ni i=1 (31) are no longer linearly independent. In such a case, up to N N kernels {K(x, xn)}n=1 are linearly independent, which yields Note that (5) indeed minimizes the sample average of C. N Suppose that at time index t = 1, 2,... the fusion center ( ) X ( ) l t (x) = K(x, x )c t . (38) processes one measurement from sensor n ∈ {1,...,N}. n n t n=1 If the fusion center uses multiple observations, say P , from 0 0 After receiving the observation (x , y(x ), φ(x )) at time the n -th sensor, then n = n = ... = n = n , where nt nt nt t1 t2 tP (t+1) N t, it follows from (38) that one must obtain {cn } as {t1, . . . , tP } depends on the fusion center schedule. n=1 Upon processing the t-th measurement, the SGD update is  (t) (1 − 2µtλ)cn if n =6 nt (t+1)  (t) (t+1) (t) (t) 0 cn = (1 − 2µtλ)cnt + µtL y(xnt ) l (x) = l (x) − µt∂lC(l , φ(xnt ), xnt , y(xnt ))(x)  T (t)  (32) −φ (xnt )l (xnt ) φ(xnt ) if n = nt. Convergence of these recursions is characterized by the next where l(t) is the estimate at time t before y(x ) has been nt result, which adapts [4, Thm. 1] to the proposed setup. received; µt > 0 is the learning rate; and ∂l denotes subgra- 2 dient with respect to l. In general, µt can be replaced with a Theorem 1. If λmax(K(x, x)) < λ¯ < ∞ for all x, C −1/2 matrix Mt to increase flexibility. For as in (31), it can be ||φ(xnt )||2 ≤ ϕ¯ for all t, and µt := µt with µλ < 1, shown using [4, eq. (2)] that then the iterates in (35) with L = L1 satisfy T ∂lC(l, φ(x ), x , y(x ))(x) (33) 1 X ν ν ν C(l(t), φ(x ), x , y(x )) (39) 0 T  T nt nt nt = −L y(xν ) − φ (xν )l(xν ) K(x, xν )φ(xν ) + 2λl(x) t=1 " T # 0 1 a1 a2 and L is a subgradient of L, which for L1(eν ) := X ≤ inf C(l, φ(xn ), xn , y(xn )) + √ + l∈H T t t t T max(0, |eν | − ) is e.g.: t=1 T 2 2 2 2 2 sgn(e − ) + sgn(e + ) where a2 := λ¯ ϕ¯ /(8λ µ) and a1 := 4(λ¯ ϕ¯ µ + a2). L0 (e ) = ν ν . (34) 1 ν 2 Proof: See Appendix C.  Substituting (33) into (32) yields In words, Theorem 1 establishes that the averaged instan- (t+1) (t) l (x) = (1 − 2µtλ)l (x) (35) taneous error from the online algorithm converges to the 0 T (t)  regularized empirical error of the batch solution. + µtL y(xnt ) − φ (xnt )l (xnt ) K(x, xnt )φ(xnt ).

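The following sketch (an illustration under stated assumptions, not the authors' code) implements the online recursions around (35)–(38) with the ε-insensitive loss subgradient (34): at every time index, all coefficient vectors are shrunk by (1 − 2µ_tλ) and only the coefficient of the reporting sensor receives a correction along φ(x_{n_t}). The diagonal Gaussian kernel, the O(1/√t) learning rate, and the synthetic data are assumptions made for this sketch.

```python
import numpy as np

# Minimal sketch of the online update: one M x 1 coefficient vector per sensor.
rng = np.random.default_rng(0)
M, N, lam, eps, sigma2 = 3, 20, 1e-3, 0.05, 0.1
X = rng.uniform(size=(N, 2))                     # sensor locations
C = np.zeros((N, M))                             # coefficients c_n^(t), one row per sensor

def kernel(x, xp):
    """Diagonal Gaussian kernel matrix K(x, x'), cf. Remark 4."""
    return np.exp(-np.sum((x - xp) ** 2) / sigma2) * np.eye(M)

def predict(x):
    """l^(t)(x) = sum_n K(x, x_n) c_n, cf. (38)."""
    return sum(kernel(x, X[n]) @ C[n] for n in range(N))

def subgrad_L1eps(e):
    """Subgradient (34) of the eps-insensitive loss max(0, |e| - eps)."""
    return 0.5 * (np.sign(e - eps) + np.sign(e + eps))

def online_step(t, n_t, phi_t, y_t):
    """One stochastic-gradient step: shrink all coefficients, correct c_{n_t}."""
    mu_t = 0.1 / np.sqrt(t)                      # assumed O(1/sqrt(t)) learning rate
    e_t = y_t - phi_t @ predict(X[n_t])          # residual w.r.t. interval centroid y
    C[:] *= (1.0 - 2.0 * mu_t * lam)             # shrinkage of every c_n
    C[n_t] += mu_t * subgrad_L1eps(e_t) * phi_t  # correction at the reporting sensor

for t in range(1, 200):
    n_t = rng.integers(N)                        # sensor reporting at time t
    phi_t = rng.uniform(size=M)                  # its measurement vector phi(x_{n_t})
    y_t = phi_t @ np.array([1.0, 0.5, 0.2])      # toy "centroid" measurement
    online_step(t, n_t, phi_t, y_t)
```

Because the coefficients are geometrically shrunk at every step, truncating the expansion to the most recent terms, as in (37), changes the estimate only marginally while keeping memory bounded.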
Upon setting l(1) = 0, it follows that V. NUMERICAL TESTS In this section, the proposed algorithms are validated t−1 (t) X (t) through numerical experiments. Following [17], a corre- l (x) = K(x, xn )c (36) i i lated shadow fading model was adopted, where, for m = i=1 1,...,M − 1, for some c(t), i = 1, . . . , t − 1. Interestingly, although the i 10 log l (x) = 10 log A − γ log(δ + kx − χ k) + s (x). representer theorem [26, Thm. 5] has not been invoked, the m m m m (40) estimates l(t)(x) here have the form of those in Sec. III. If measurements come from sensors at different locations, Here, δ > 0 is a small constant ensuring that the argument of the logarithm does not vanish, γ = 3 is the pathloss exponent, the functions K(x, xni ), i = 1, . . . , t, are linearly independent for properly selected kernels, and substituting (36) into (35) and the parameters Am and χm denote the power and location results in the following update rule: of the m-th transmitter, respectively. The random shadowing component sm(x) is generated as a zero-mean Gaussian ( (t) 0 2 −||x−x0|| (t+1) (1 − 2µtλ)ci if i = 1, . . . , t − 1 random variable with E {sm(x)sm(x )} = σs ρ , c = 2 i 0 T (t) where σ = 2 and ρ = 0.8. The noise power was set to µtL [y(xnt ) − φ (xnt )l (xnt )]φ(xnt ) if i = t. s lM (x) = 0.75. This equation reveals that the number of coefficients main- The N sensors, deployed uniformly at random over tained increases linearly in t. This is the so-called curse the region of interest, report P quantized measurements T of kernelization [41]. However, if µtλ ∈ (0, 1), then the {πˆQ,p(xn) = Q(|πp(xn) + ηp(xn)|) = Q(|φp (xn)l(xn) + PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 10

Fig. 2: True and estimated functions l_m(x) with N = 50, P = 8, 5-bit quantization with CPQ, λ = 10⁻⁹, and nonnegativity enforced through virtual sensors. Each column corresponds to the power of a channel. White crosses denote sensor locations. The PSD at any location can be reconstructed by substituting the values of these functions into (1).
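To clarify how the simulated field of Sec. V is generated, the sketch below draws one realization of the shadowing model (40): log-distance path loss plus zero-mean Gaussian shadowing with spatial covariance σ_s² ρ^{||x−x′||}, as in [17]. Base-10 logarithms, the one-dimensional evaluation grid, and the Cholesky-based sampler are assumptions of this sketch.

```python
import numpy as np

# Sketch of the simulated spatial power field l_m(x), cf. (40).
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 64)[:, None]        # d = 1 grid of evaluation points
A_m, chi_m = 0.8, 0.1                            # power and location of transmitter m
gamma, delta = 3.0, 1e-2                         # path-loss exponent and offset
sigma_s2, rho = 2.0, 0.8                         # shadowing variance and correlation

# Covariance of the shadowing s_m(x) on the grid and one correlated realization.
dist = np.abs(grid - grid.T)
cov = sigma_s2 * rho ** dist
s_m = np.linalg.cholesky(cov + 1e-9 * np.eye(len(grid))) @ rng.standard_normal(len(grid))

# Spatial power field l_m(x) in linear scale, following (40) with base-10 logs assumed.
l_m_dB = 10.0 * np.log10(A_m) - gamma * np.log10(delta + np.abs(grid[:, 0] - chi_m)) + s_m
l_m = 10.0 ** (l_m_dB / 10.0)
print(l_m[:5])
```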

P 2 ηp(xn)|)}p=1 to the fusion center, where ηp(xn) ∼ N (0, ση) well with the true ones for all transmitters. simulates noise due to finite sample estimation effects. The To quantify the estimation performance, the normalized entries of φp(xn) were generated uniformly over the interval mean-square error (NMSE), defined as [0, 1] for all xn and p. n o kl(x) − ˆl(x)k2 Two quantization schemes were implemented. Under uni- E 2 NMSE := 2 (41) form quantization (UQ), the range of πp(x) was first deter- E {kl(x)k2} mined using Monte Carlo runs and the quantization region was employed, where the expectation is taken with respect to boundaries τ0 < τ1 < . . . < τR were set such that τi+1 −τi = the uniformly distributed x, measurement vectors {φp(xn)}, 2 ∀i, where  was such that the probability of clipping and measurement noise . To focus on quantization −3 b ηp(xn) Pr{πp(x) > τR} was approximately 10 and R := 2 , effects, the region of interest was the one-dimensional (d = 1) with b the number of bits per measurement. Under constant 1 interval [0, 1] ⊂ R , and M − 1 = 4 transmitters with probability quantization (CPQ), these boundaries were chosen parameters χ1 = 0.1, χ2 = 0.2, χ3 = 0.4, χ4 = 0.8,A1 = such that Pr{τi ≤ πp(x) < τi+1} was approximately constant 0.8,A2 = 0.9,A3 = 0.8 and A4 = 0.7, were considered. for all i. To illustrate the effects of quantization, Fig. 3 depicts the The true PSD map in the region [0, 1] × [0, 1] ⊂ R2 (d = 2) NMSE as a function of N for different values of b using both was created with M −1 = 3 transmitters, A1 = 0.9, A2 = 0.8, nonparametric and semiparametric estimators. To capture only A3 = 0.7, χ1 = (0.2, 0.8), χ2 = (0.4, 0.5), and χ3 = quantization effects, no measurement errors were introduced 2 2 (0.8, 0.9). Using CPQ and enforcing nonnegativity through (ση = 0), uniform quantization was used, and σs was set to M virtual measurements per sensor, the batch semiparametric zero. The nonparametric approach used a diagonal Gaussian estimate from (29) was computed. The nonparametric part kernel (GK) as in Fig. 2. It can be seen that the proposed adopts a diagonal Gaussian kernel matrix K(xn, xn0 ) with estimators are consistent in N. Although, for this particular 2 2 km(xn, xn0 ) = exp(−kxn − xn0 k /σm) on its m-th diagonal case, TPS mostly outperforms GK-based regression, this need 2 entry (cf. Remark 4), where σm = 0.12 for m = 1, 2, 3, 4. not hold in different scenarios since which kernel leads to the The parametric part is spanned by a basis with NB = 1 and best performance depends on the propagation environment as B1(x) a diagonal matrix whose (m, m)-th entry is given by well as on the field characteristics. γ 1/(δ+kx−χmk ) if m = 1,...,M −1; and 0 if m = M. The Fig. 4 depicts the NMSE for N = 40 sensors with a 2 variance ση was set such that about 15% of the measurements varying number of measurements P per sensor in different contain errors. Fig. 2 shows the true and estimated maps for a settings. As expected, the estimates are seen to be inconsistent particular realization of sensor locations {xn} (represented by as P grows since the number of sensors is fixed and there is crosses), measurement vectors {φp(xn)}, and measurement no way to accurately estimate the PSD map far from their noise ηp(xn). Each sensor reports P = 8 measurements vicinity. It is also seen that TPS regression benefits more quantized to b = 5 bits. 
Although every sensor transmits only from incorporating non-negativity constraints than the GK- 5 bytes, it is observed that the reconstructed PSD maps match based schemes. Finally, it is observed that CPQ outperforms PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 11

Fig. 3: NMSE versus N (number of sensors) for variable UQ resolution, with curves for 1, 2, and 4 bits/measurement under both GK and TPS (M = 5, P = 5, σ_η² = 0, λ = 10⁻⁶, and nonnegativity not enforced).

Fig. 5: Measurement noise effects: NMSE versus the measurement noise variance (×10⁻³) for UQ and CPQ with 2, 4, and 6 bits/measurement (M = 5, N = 40, P = 5, λ = 10⁻⁶, GK, and nonnegativity not enforced).
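The uniform (UQ) and constant-probability (CPQ) quantizers compared in these figures can be designed from Monte Carlo samples of the received power as sketched below; the synthetic power distribution and parameter values are placeholders, and only the boundary and centroid construction mirrors the description in Sec. V and Remark 5.

```python
import numpy as np

# Sketch of the two quantizer designs: UQ with clipping probability ~1e-3 and
# CPQ with (approximately) equiprobable cells.
rng = np.random.default_rng(2)
b = 4                                    # bits per measurement
R = 2 ** b                               # number of quantization levels
samples = rng.gamma(shape=2.0, scale=0.3, size=100_000)   # stand-in for pi_p(x)

# UQ: tau_0 < ... < tau_R equally spaced, tau_R chosen so Pr{pi > tau_R} ~ 1e-3.
tau_R = np.quantile(samples, 1.0 - 1e-3)
uq_bounds = np.linspace(0.0, tau_R, R + 1)

# CPQ: boundaries at empirical quantiles, so each cell is (roughly) equiprobable.
cpq_bounds = np.quantile(samples, np.linspace(0.0, 1.0, R + 1))

def quantize(pi, bounds):
    """Return the cell index and the cell centroid y used in place of pi in (9)."""
    i = int(np.clip(np.searchsorted(bounds, pi, side="right") - 1, 0, R - 1))
    return i, 0.5 * (bounds[i] + bounds[i + 1])

print(quantize(0.7, uq_bounds), quantize(0.7, cpq_bounds))
```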

Fig. 4 (legend): UQ and CPQ, each with and without nonnegativity constraints, for GK and TPS; y-axis: NMSE. Fig. 6 (legend): offline benchmark versus online with learning rates κ_t ∈ {0.11, 0.05, 0.025}; top panel: regularized risk versus time slot, bottom panel: NMSE versus time slot.

NMSE online = 0.025 0 t 5 10 15 20 0.2 P (measurements per sensor) 0 0 5 10 15 20 25 30 35 40 45 50 Fig. 4: Performance of different quantizers with and with- Time Slot out nonnegativity constraints (M = 5, N = 40, b = 2 bits/measurement, 2 , and −6). Fig. 6: Performance of the online algorithm (M = 5, N = ση = 0 λ = 10 2 −6 30, b = 4 bits/measurement, ση = 0, λ = 10 , GK, and nonnegativity not enforced). UQ. However, as demonstrated by Fig. 5, UQ leads to a better performance than CPQ for this simulation scenario if the noise 2 different degrees of prior information without sacrificing flexi- variance ση is sufficiently large. Fig. 5 further shows that the effect of measurement noise is more pronounced for larger bility, nonparametric and semiparametric estimators have been b. This is intuitive since a given measurement noise variance proposed by leveraging the framework of kernel-based learn- leads to more measurement error events. Thus, πp(xn) must ing. Existing semiparametric regression techniques have been be estimated more accurately under finer quantization. generalized to vector-valued function estimation and shown Fig. 6 depicts the performance of the online algorithm using to subsume thin-plate spline regression as a special case. the representation in (38). As a benchmark, the offline (batch) Extensions to multiple measurements per sensor, non-uniform algorithm was run per time slot with all the data received up to quantization and non-negativity constraints have also been that time slot. The top panel in Fig. 6 shows the regularized introduced. Both batch and online approaches were developed, empirical risks (evaluated per time slot using the entire set thereby offering a performance complexity trade-off. of observations) for different learning rates µt. Common to Future work will be devoted to kernel selection [16], gradient methods with constant step size, a larger µt speeds up quantizer design, and alternative types of spectral cartography convergence, but also increases the residual error. The bottom formats including construction of channel-gain maps. panel depicts the NMSE evolution over time. Using NMSE as figure of merit favors greater learning rates over smaller ones. APPENDIX A DERIVATION OF (23) VI.CONCLUSIONS This paper introduced a family of methods for nonpara- This appendix describes how to obtain c˜ˆ and θˆ in (21) from metric and semiparametric estimation of PSD maps using (αˆ , βˆ) in (22). Because of the change of variable, (22) is a set of linearly compressed and possibly quantized power not the dual of (21); as a result c˜ˆ and θˆ are not, in general, measurements collected by distributed sensors. To capture the Lagrange multipliers of (22). As shown next, they can be PUBLISHED IN IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017 12 recovered after relating (22) to the dual of (21). The latter is: Thus, upon defining Kˇ ∈ RN×N with (n, n0)-entry equal N×N to 0 , and ˇ ∈ B with -entry equal to ˇ ˇ T ⊥ 0 ⊥ K(xn, xn ) B R (n, ν) (c˜, αˇ ,β) = arg min λNc˜ PB K PB c˜ c˜,α,β Bν (xn), matrices K and B can be written as T T − (y − 1N ) α + (y + 1N ) β K = Kˇ ⊗ IM (49) ⊥ 0 ⊥ ⊥ 0 s. to 2λNPB K PB c˜ − PB K Φ0(α − β) = 0MN B = Bˇ ⊗ IM . (50) T Φ − 0 B 0(α β) = MNB Then, the computation of certain matrices involved in the α − 1N ≤ 0N , −α ≤ 0N , β − 1N ≤ 0N , −β ≤ 0N . proposed algorithms can be done efficiently as described next. 
VI. CONCLUSIONS

This paper introduced a family of methods for nonparametric and semiparametric estimation of PSD maps using a set of linearly compressed and possibly quantized power measurements collected by distributed sensors. To capture the different degrees of prior information without sacrificing flexibility, nonparametric and semiparametric estimators have been proposed by leveraging the framework of kernel-based learning. Existing semiparametric regression techniques have been generalized to vector-valued function estimation and shown to subsume thin-plate spline regression as a special case. Extensions to multiple measurements per sensor, non-uniform quantization, and non-negativity constraints have also been introduced. Both batch and online approaches were developed, thereby offering a performance-complexity trade-off.

Future work will be devoted to kernel selection [16], quantizer design, and alternative spectral cartography formats, including the construction of channel-gain maps.

APPENDIX A
DERIVATION OF (23)

This appendix describes how to obtain $\hat{\tilde{c}}$ and $\hat{\theta}$ in (21) from $(\hat{\alpha}, \hat{\beta})$ in (22). Because of the change of variable, (22) is not the dual of (21); as a result, $\hat{\tilde{c}}$ and $\hat{\theta}$ are not, in general, Lagrange multipliers of (22). As shown next, they can be recovered after relating (22) to the dual of (21). The latter is

$(\check{\tilde{c}}, \check{\alpha}, \check{\beta}) = \arg\min_{\tilde{c}, \alpha, \beta} \; \lambda N \tilde{c}^T P_B^\perp K' P_B^\perp \tilde{c} - (y - 1_N)^T \alpha + (y + 1_N)^T \beta$
s. to $\; 2\lambda N P_B^\perp K' P_B^\perp \tilde{c} - P_B^\perp K' \Phi_0(\alpha - \beta) = 0_{MN}$
$\quad\;\; B^T \Phi_0(\alpha - \beta) = 0_{MN_B}$
$\quad\;\; \alpha - 1_N \leq 0_N, \;\; -\alpha \leq 0_N, \;\; \beta - 1_N \leq 0_N, \;\; -\beta \leq 0_N.$    (42)

The presence of $\tilde{c}$ in both (21) and (42) is owing to the fact that $P_B^\perp K' P_B^\perp$ is not invertible. Taking into account that (21) is indeed the dual of (42), it can be shown that $\hat{\theta}$ is the Lagrange multiplier $\check{\mu}_2$ associated with the second equality constraint of (42), whereas $\hat{\tilde{c}} = \check{\tilde{c}}$. Thus, it remains only to express $\check{\mu}_2$ and $\check{\tilde{c}}$ in terms of the solution to (22).

From the second equality constraint in (42), it holds for $(\alpha, \beta)$ feasible that $\Phi_0(\alpha - \beta) = P_B^\perp \Phi_0(\alpha - \beta)$. Then, the first equality constraint in (42) can be replaced with

$2\lambda N P_B^\perp K' P_B^\perp \tilde{c} - P_B^\perp K' P_B^\perp \Phi_0(\alpha - \beta) = 0_{MN}.$    (43)

Solving (43) for $\tilde{c}$ produces

$\tilde{c} = \frac{1}{2\lambda N} \Phi_0(\alpha - \beta) + \nu(P_B^\perp K' P_B^\perp)$    (44)

where $\nu(A)$ denotes any vector in the null space of $A$. Substituting (44) into (42), one recovers (22).

Clearly, problems (42) and (22) are equivalent in the sense that if $(\hat{\alpha}, \hat{\beta})$ solves the latter, then

$\check{\tilde{c}} = \frac{1}{2\lambda N} \Phi_0(\hat{\alpha} - \hat{\beta}), \quad \check{\alpha} = \hat{\alpha}, \quad \check{\beta} = \hat{\beta}$    (45)

solve the former. However, their Lagrange multipliers differ due to the transformation introduced. A possible means of establishing their relation is to compare the KKT conditions of both problems. In particular, let $\hat{\mu}_2, \hat{\nu}_1, \hat{\nu}_2, \hat{\nu}_3, \hat{\nu}_4$ denote the Lagrange multipliers corresponding to $(\hat{\alpha}, \hat{\beta})$, associated with the constraints of (22) in the same order listed here; then the multipliers

$\check{\mu}_1 = -\check{\tilde{c}}, \quad \check{\mu}_2 = \hat{\mu}_2 - (B^T B)^{-1} B^T K' \check{\tilde{c}},$
$\check{\nu}_1 = \hat{\nu}_1, \quad \check{\nu}_2 = \hat{\nu}_2, \quad \check{\nu}_3 = \hat{\nu}_3, \quad \check{\nu}_4 = \hat{\nu}_4$    (46)

correspond to the solution of (42) given by (45). Recalling that $\hat{\theta} = \check{\mu}_2$, one readily arrives at (23).

APPENDIX B
EFFICIENT IMPLEMENTATION OF A SPECIAL CASE

In practice, one may effectively ignore the dependencies between the entries $l_m(x)$, $m = 1, \ldots, M$, by using a diagonal kernel $\mathbf{K}(x, x')$ and diagonal basis functions $\mathbf{B}_\nu(x)$. Furthermore, one may be interested in modeling all entries identically, as in TPS regression. Then, both the kernel and the basis functions become scaled identity matrices; that is, for certain scalar functions $K(x, x')$ and $B_\nu(x)$, one has

$\mathbf{K}(x, x') := K(x, x')\, I_M$    (47)
$\mathbf{B}_\nu(x) := B_\nu(x)\, I_M, \quad \nu = 1, \ldots, N_B.$    (48)

Thus, upon defining $\check{K} \in \mathbb{R}^{N \times N}$ with $(n, n')$-entry equal to $K(x_n, x_{n'})$, and $\check{B} \in \mathbb{R}^{N \times N_B}$ with $(n, \nu)$-entry equal to $B_\nu(x_n)$, matrices $\mathbf{K}$ and $\mathbf{B}$ can be written as

$\mathbf{K} = \check{K} \otimes I_M$    (49)
$\mathbf{B} = \check{B} \otimes I_M.$    (50)

Then, the computation of certain matrices involved in the proposed algorithms can be done efficiently as described next. Start by constructing a selection matrix $S$ containing ones and zeros such that $A \odot B = (A \otimes B) S$, and define $\bar{\Phi}_0 := (I_N \otimes 1_P^T) \odot \bar{\Phi}$ (cf. Sec. III-E); combining this definition with (49) yields

$\bar{\Phi}_0^T \mathbf{K} = [(I_N \otimes 1_P^T) \odot \bar{\Phi}]^T (\check{K} \otimes I_M)$
$= S^T [(I_N \otimes 1_P) \otimes \bar{\Phi}^T] (\check{K} \otimes I_M)$
$= S^T [(I_N \otimes 1_P)\check{K} \otimes \bar{\Phi}^T]$
$= S^T [(\check{K} \otimes 1_P) \otimes \bar{\Phi}^T]$
$= [(\check{K} \otimes 1_P^T) \odot \bar{\Phi}]^T.$    (51)

Likewise, one can verify that

$\bar{\Phi}_0^T \mathbf{B} = [(\check{B}^T \otimes 1_P^T) \odot \bar{\Phi}]^T.$    (52)

Using the Kronecker product properties, $P_B^\perp$ can be written as

$P_B^\perp = I_{MN} - \big(\check{B}(\check{B}^T\check{B})^{-1}\check{B}^T\big) \otimes I_M = \big(I_N - \check{B}(\check{B}^T\check{B})^{-1}\check{B}^T\big) \otimes I_M = P_{\check{B}}^\perp \otimes I_M$    (53)

where $P_{\check{B}}^\perp := I_N - \check{B}(\check{B}^T\check{B})^{-1}\check{B}^T$. Also, from (30),

$\bar{\tilde{K}} = \bar{\Phi}_0^T (P_{\check{B}}^\perp \otimes I_M)(\check{K} \otimes I_M)(P_{\check{B}}^\perp \otimes I_M) \bar{\Phi}_0$
$= \bar{\Phi}_0^T (P_{\check{B}}^\perp \check{K} P_{\check{B}}^\perp \otimes I_M) \bar{\Phi}_0$
$= \big((I_N \otimes 1_P^T) \odot \bar{\Phi}\big)^T \big(P_{\check{B}}^\perp \check{K} P_{\check{B}}^\perp \otimes I_M\big) \big((I_N \otimes 1_P^T) \odot \bar{\Phi}\big)$
$= S^T \big((I_N \otimes 1_P^T) \otimes \bar{\Phi}\big)^T \big(P_{\check{B}}^\perp \check{K} P_{\check{B}}^\perp \otimes I_M\big) \big((I_N \otimes 1_P^T) \otimes \bar{\Phi}\big) S$
$= S^T \big[(I_N \otimes 1_P) P_{\check{B}}^\perp \check{K} P_{\check{B}}^\perp (I_N \otimes 1_P^T) \otimes \bar{\Phi}^T\bar{\Phi}\big] S$
$= (I_N \otimes 1_P) P_{\check{B}}^\perp \check{K} P_{\check{B}}^\perp (I_N \otimes 1_P^T) \circ \bar{\Phi}^T\bar{\Phi}$
$= \big(P_{\check{B}}^\perp \check{K} P_{\check{B}}^\perp \otimes 1_P 1_P^T\big) \circ \bar{\Phi}^T\bar{\Phi}.$    (54)

Finally, $\theta$ can be obtained as (cf. Sec. III-E)

$\theta = \bar{\mu} - \big[(\check{B}^T\check{B})^{-1}\check{B}^T\check{K} \otimes I_M\big] \hat{\bar{c}}.$    (55)
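As a numerical sanity check on the structure exploited above, the following NumPy sketch (illustrative only; random small matrices stand in for $\check{K}$ and $\check{B}$, and the operator $\odot$ is read as the column-wise Khatri-Rao product, one plausible interpretation of the selection-matrix identity) constructs (49)-(50) and verifies the mixed-product property, the projector factorization (53), and the identity $A \odot C = (A \otimes C)S$. The point is that the $MN \times MN$ matrices never need to be formed explicitly when this structure holds.

import numpy as np

# Illustrative check of the Kronecker-structure identities; random stand-ins,
# not the actual kernel or basis matrices of the paper.
rng = np.random.default_rng(1)
M, N, N_B = 2, 4, 3
K_chk = rng.standard_normal((N, N)); K_chk = K_chk @ K_chk.T   # stand-in for K-check
B_chk = rng.standard_normal((N, N_B))                          # stand-in for B-check

K_big = np.kron(K_chk, np.eye(M))                              # cf. (49)
B_big = np.kron(B_chk, np.eye(M))                              # cf. (50)

# Mixed-product property: products of the big matrices reduce to small ones.
assert np.allclose(K_big @ B_big, np.kron(K_chk @ B_chk, np.eye(M)))

# cf. (53): the orthogonal projector factors as a Kronecker product as well.
P_perp_big = np.eye(M * N) - B_big @ np.linalg.solve(B_big.T @ B_big, B_big.T)
P_perp_chk = np.eye(N) - B_chk @ np.linalg.solve(B_chk.T @ B_chk, B_chk.T)
assert np.allclose(P_perp_big, np.kron(P_perp_chk, np.eye(M)))

# Selection-matrix identity for the column-wise (Khatri-Rao) product,
# which underlies the manipulations in (51)-(54).
n = 4
A = rng.standard_normal((5, n)); C = rng.standard_normal((3, n))
A_kr = np.column_stack([np.kron(A[:, j], C[:, j]) for j in range(n)])
S = np.zeros((n * n, n)); S[np.arange(n) * n + np.arange(n), np.arange(n)] = 1.0
assert np.allclose(A_kr, np.kron(A, C) @ S)
print("Kronecker-structure identities verified on random data.")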

APPENDIX C
PROOF OF THEOREM 1

The following proof extends that of [4, Thm. 1], which cannot accommodate the instantaneous cost from (31) when $L = L_1$, since $L_1$ is not differentiable and $\phi(x_\nu)$ is not constant over $\nu$. Expanding the norms into inner products and employing (32), one finds that

$\|l^{(t)} - g\|_{\mathcal{H}}^2 - \|l^{(t+1)} - g\|_{\mathcal{H}}^2$
$= -\|l^{(t+1)} - l^{(t)}\|_{\mathcal{H}}^2 - 2\langle l^{(t+1)} - l^{(t)}, l^{(t)} - g\rangle_{\mathcal{H}}$
$= -\mu_t^2 \|\partial_l C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t}))\|_{\mathcal{H}}^2 + 2\langle \mu_t \partial_l C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})), l^{(t)} - g\rangle_{\mathcal{H}}.$    (56)

To bound the norm in (56), apply the triangle inequality to (33) to obtain

$\|\partial_l C(l, \phi(x_\nu), x_\nu, y(x_\nu))\|_{\mathcal{H}} \leq |L_1'(y(x_\nu) - \phi^T(x_\nu) l(x_\nu))| \, \|K(\cdot, x_\nu)\phi(x_\nu)\|_{\mathcal{H}} + 2\lambda \|l\|_{\mathcal{H}}.$

Since $L_1$ is 1-Lipschitz, $L_1'(e) \leq 1$ for all $e$. Moreover, the reproducing property (cf. Sec. III-A) implies that $\|K(\cdot, x_\nu)\phi(x_\nu)\|_{\mathcal{H}} = (\phi^T(x_\nu) K(x_\nu, x_\nu) \phi(x_\nu))^{1/2} < \bar{\lambda}\|\phi(x_\nu)\|$, and hence

$\|\partial_l C(l, \phi(x_\nu), x_\nu, y(x_\nu))\|_{\mathcal{H}} \leq \bar{\lambda}\|\phi(x_\nu)\| + 2\lambda \|l\|_{\mathcal{H}}.$    (57)

Similarly, from (35) and the triangle inequality, one finds that

$\|l^{(t+1)}\|_{\mathcal{H}} \leq (1 - 2\mu_t \lambda)\|l^{(t)}\|_{\mathcal{H}} + \mu_t \bar{\lambda}\|\phi(x_\nu)\|.$

Recalling that $l^{(1)} = 0$, it is simple to show by induction that $\|l^{(t)}\|_{\mathcal{H}} \leq U := \bar{\lambda}\bar{\varphi}/(2\lambda)$ for all $t$. This fact and $\|\phi(x_{n_t})\|_2 \leq \bar{\varphi}$ applied to (57) produce

$\|\partial_l C(l, \phi(x_\nu), x_\nu, y(x_\nu))\|_{\mathcal{H}} \leq \bar{\lambda}\bar{\varphi} + 2\lambda U = 2\bar{\lambda}\bar{\varphi}$    (58)

for all $l$. On the other hand, the last term in (56) can be bounded by invoking the definition of subgradient:

$\langle \partial_l C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})), l^{(t)} - g\rangle_{\mathcal{H}} \geq C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) - C(g, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})).$    (59)

Combining (58) and (59) with (56) results in

$\|l^{(t)} - g\|_{\mathcal{H}}^2 - \|l^{(t+1)} - g\|_{\mathcal{H}}^2 \geq -4\mu_t^2 \bar{\lambda}^2 \bar{\varphi}^2 - 2\mu_t \big( C(g, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) - C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) \big).$

Adapting the proof of [4, Prop. 3.1(iii)], it can be shown that if $g \in \mathcal{H}$ equals the value of $l$ attaining the infimum on the right-hand side of (39), then $\|g\|_{\mathcal{H}} \leq U$. For such a $g$, one has

$\frac{1}{\mu_t}\|l^{(t)} - g\|_{\mathcal{H}}^2 - \frac{1}{\mu_{t+1}}\|l^{(t+1)} - g\|_{\mathcal{H}}^2$    (60)
$= \frac{1}{\mu_t}\big[\|l^{(t)} - g\|_{\mathcal{H}}^2 - \|l^{(t+1)} - g\|_{\mathcal{H}}^2\big]$    (61)
$\quad - \Big(\frac{1}{\mu_{t+1}} - \frac{1}{\mu_t}\Big)\|l^{(t+1)} - g\|_{\mathcal{H}}^2$    (62)
$\geq -4\mu_t \bar{\lambda}^2 \bar{\varphi}^2 - 2\big( C(g, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) - C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) \big) + \Big(\frac{1}{\mu_t} - \frac{1}{\mu_{t+1}}\Big) 4U^2$    (63)

since $\|l^{(t+1)} - g\|_{\mathcal{H}} \leq 2U$. Summing for $t = 1, \ldots, T$ and applying $\sum_{t=1}^{T} \mu_t \leq 2\mu\sqrt{T}$ [24] together with $\sqrt{T+1} - 1 \leq \sqrt{T}$, the result in (39) follows readily.
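For readability, the final step can be expanded as follows; this is a sketch under the assumption that the diminishing step size is $\mu_t = \mu/\sqrt{t}$, as in [24], and the constants in the actual statement of (39) may differ. Summing (60)-(63) over $t = 1, \ldots, T$ telescopes the left-hand side; using $l^{(1)} = 0$, $\|g\|_{\mathcal{H}} \leq U$, and $\|l^{(T+1)} - g\|_{\mathcal{H}} \leq 2U$ to bound the boundary terms and rearranging yields

$\sum_{t=1}^{T} \big[ C(l^{(t)}, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) - C(g, \phi(x_{n_t}), x_{n_t}, y(x_{n_t})) \big] \leq 2\bar{\lambda}^2\bar{\varphi}^2 \sum_{t=1}^{T}\mu_t + \frac{2U^2}{\mu_{T+1}} \leq 4\mu\bar{\lambda}^2\bar{\varphi}^2\sqrt{T} + \frac{2U^2}{\mu}\big(\sqrt{T}+1\big),$

so the time-averaged excess cost decays as $O(1/\sqrt{T})$, which is the behavior asserted in (39).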

REFERENCES

[1] A. Alaya-Feki, S. B. Jemaa, B. Sayrac, P. Houze, and E. Moulines, "Informed spectrum usage in cognitive radio networks: Interference cartography," in Proc. IEEE Int. Symp. Personal, Indoor Mobile Radio Commun., 2008, pp. 1–5.
[2] A. Argyriou and F. Dinuzzo, "A unifying view of representer theorems," in Proc. Int. Conf. Mach. Learning, Beijing, China, Jun. 2014.
[3] D. D. Ariananda, D. Romero, and G. Leus, "Cooperative compressive power spectrum estimation," in Proc. IEEE Sensor Array Multichannel Sig. Process. Workshop, A Corunha, Spain, Jun. 2014, pp. 97–100.
[4] J. Audiffren and H. Kadri, "Online learning with multiple operator-valued kernels," preprint at http://arXiv.org/abs/1311.0222, 2013.
[5] E. Axell, G. Leus, E. G. Larsson, and H. V. Poor, "Spectrum sensing for cognitive radio: State-of-the-art and recent advances," IEEE Sig. Process. Mag., vol. 29, no. 3, pp. 101–116, May 2012.
[6] J.-A. Bazerque and G. B. Giannakis, "Distributed spectrum sensing for cognitive radio networks by exploiting sparsity," IEEE Trans. Sig. Process., vol. 58, no. 3, pp. 1847–1862, Mar. 2010.
[7] J.-A. Bazerque and G. B. Giannakis, "Nonparametric basis pursuit via kernel-based learning," IEEE Sig. Process. Mag., vol. 28, no. 30, pp. 112–125, 2013.
[8] J.-A. Bazerque, G. Mateos, and G. B. Giannakis, "Group-lasso on splines for spectrum cartography," IEEE Trans. Sig. Process., vol. 59, no. 10, pp. 4648–4663, Oct. 2011.
[9] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 2004.
[10] F. L. Bookstein, "Principal warps: Thin-plate splines and the decomposition of deformations," IEEE Trans. Pattern Anal. Mach. Intel., vol. 11, no. 6, pp. 567–585, 1989.
[11] C. Carmeli, E. De Vito, A. Toigo, and V. Umanita, "Vector valued reproducing kernel Hilbert spaces and universality," Analysis Appl., vol. 8, no. 1, pp. 19–61, 2010.
[12] E. Dall'Anese, S.-J. Kim, G. B. Giannakis, and S. Pupolin, "Power control for cognitive radio networks under channel uncertainty," IEEE Trans. Wireless Commun., vol. 10, no. 10, pp. 3541–3551, 2011.
[13] E. Dall'Anese, J.-A. Bazerque, and G. B. Giannakis, "Group sparse lasso for cognitive network sensing robust to model uncertainties and outliers," Phy. Commun., vol. 5, no. 2, pp. 161–172, 2012.
[14] T. Diethe and M. Girolami, "Online learning with (multiple) kernels: A review," Neural Comput., vol. 25, no. 3, pp. 567–625, Mar. 2013.
[15] G. Ding, J. Wang, Q. Wu, Y. Yao, F. Song, and T. A. Tsiftsis, "Cellular-base-station-assisted device-to-device communications in TV white space," IEEE J. Sel. Areas Commun., vol. 34, no. 1, pp. 107–121, Jan. 2016.
[16] M. Gönen and E. Alpaydın, "Multiple kernel learning algorithms," J. Mach. Learn. Res., vol. 12, pp. 2211–2268, 2011.
[17] M. Gudmundson, "Correlation model for shadow fading in mobile radio systems," Electron. Letters, vol. 27, no. 23, pp. 2145–2146, 1991.
[18] D.-H. Huang, S.-H. Wu, W.-R. Wu, and P.-H. Wang, "Cooperative radio source positioning and power map reconstruction: A sparse Bayesian learning approach," IEEE Trans. Veh. Technol., vol. 64, no. 6, pp. 2318–2332, Jun. 2015.
[19] B. A. Jayawickrama, E. Dutkiewicz, I. Oppermann, G. Fang, and J. Ding, "Improved performance of spectrum cartography based on compressive sensing in cognitive radio networks," in Proc. Int. Conf. Commun., Budapest, Hungary, Jun. 2013, pp. 5657–5661.
[20] V. Kecman, M. Vogt, and T. M. Huang, "On the equality of kernel AdaTron and sequential minimal optimization in classification and regression tasks and alike algorithms for kernel machines," in Proc. European Symp. Artificial Neural Netw., Bruges, Belgium, Apr. 2003, pp. 215–222.
[21] S.-J. Kim, E. Dall'Anese, and G. B. Giannakis, "Cooperative spectrum sensing for cognitive radios using kriged Kalman filtering," IEEE J. Sel. Topics Sig. Process., vol. 5, no. 1, pp. 24–36, Feb. 2011.
[22] S.-J. Kim and G. B. Giannakis, "Cognitive radio spectrum prediction using dictionary learning," in Proc. IEEE Global Commun. Conf., Atlanta, GA, Dec. 2013.
[23] S.-J. Kim, N. Jain, G. B. Giannakis, and P. Forero, "Joint link learning and cognitive radio sensing," in Proc. Asilomar Conf. Sig., Syst., Comput., Pacific Grove, CA, Nov. 2011.
[24] J. Kivinen, A. J. Smola, and R. C. Williamson, "Online learning with kernels," IEEE Trans. Sig. Process., vol. 52, no. 8, pp. 2165–2176, 2004.
[25] O. Mehanna and N. Sidiropoulos, "Frugal sensing: Wideband power spectrum sensing from few bits," IEEE Trans. Sig. Process., vol. 61, no. 10, pp. 2693–2703, May 2013.
[26] C. A. Micchelli and M. Pontil, "On learning vector-valued functions," Neural Comput., vol. 17, no. 1, pp. 177–204, 2005.
[27] J. C. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods. Cambridge, MA: MIT Press, 1999, pp. 185–208.
[28] Z. Quan, S. Cui, H. Poor, and A. Sayed, "Collaborative wideband sensing for cognitive radios," IEEE Sig. Process. Mag., vol. 25, no. 6, pp. 60–73, 2008.
[29] A. Ribeiro and G. B. Giannakis, "Bandwidth-constrained distributed estimation for wireless sensor networks: Part I/II," IEEE Trans. Sig. Process., vol. 54, no. 3/7, pp. 1131–1143/2784–2796, 2006.
[30] D. Romero, S.-J. Kim, and G. B. Giannakis, "Stochastic semiparametric regression for spectrum cartography," in Proc. IEEE Int. Workshop Comput. Advances Multi-Sensor Adaptive Process., Cancun, Mexico, Dec. 2015, pp. 513–516.
[31] D. Romero and G. Leus, "Wideband spectrum sensing from compressed measurements using spectral prior information," IEEE Trans. Sig. Process., vol. 61, no. 24, pp. 6232–6246, 2013.
[32] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
[33] D. J. Sebald and J. A. Bucklew, "Support vector machine techniques for nonlinear equalization," IEEE Trans. Sig. Process., vol. 48, no. 11, pp. 3217–3226, 2000.
[34] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Stat. Comput., vol. 14, no. 3, pp. 199–222, 2004.
[35] A. J. Smola, B. Schölkopf, and K.-R. Müller, "The connection between regularization operators and support vector kernels," Neural Netw., vol. 11, no. 4, pp. 637–649, 1998.
[36] R. Tandra and A. Sahai, "SNR walls for signal detection," IEEE J. Sel. Topics Sig. Process., vol. 2, no. 1, pp. 4–17, 2008.
[37] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse bandlimited signals," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 520–544, Jan. 2010.
[38] G. Vázquez-Vilar and R. López-Valcarce, "Spectrum sensing exploiting guard bands and weak channels," IEEE Trans. Sig. Process., vol. 59, no. 12, pp. 6045–6057, 2011.
[39] G. Wahba, Spline Models for Observational Data, ser. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1990, vol. 59.
[40] S. Wang, W. Zhu, and Z.-P. Liang, "Shape deformation: SVM regression and application to medical image segmentation," in Proc. Int. Conf. Comput. Vis., vol. 2, Vancouver, Canada, 2001, pp. 209–216.
[41] Z. Wang, K. Crammer, and S. Vucetic, "Breaking the curse of kernelization: Budgeted stochastic gradient descent for large-scale SVM training," J. Mach. Learn. Res., vol. 13, no. 1, pp. 3103–3131, 2012.
[42] T. Yucek and H. Arslan, "A survey of spectrum sensing algorithms for cognitive radio applications," IEEE Commun. Surveys Tutorials, vol. 11, no. 1, pp. 116–130, 2009.

D. Romero (M'16) received his M.Sc. and Ph.D. degrees in Signal Theory and Communications from the University of Vigo, Spain, in 2011 and 2015, respectively. From Jul. 2015 to Nov. 2016, he was a post-doctoral researcher with the Digital Technology Center and Department of Electrical and Computer Engineering, University of Minnesota, USA. In Dec. 2017, he joined the Department of Information and Communication Technology, University of Agder, Norway, as an associate professor. His main research interests lie in the areas of signal processing, communications, and machine learning.

Seung-Jun Kim (SM'12) received his B.S. and M.S. degrees from Seoul National University in Seoul, Korea in 1996 and 1998, respectively, and his Ph.D. from the University of California at Santa Barbara in 2005, all in electrical engineering. From 2005 to 2008, he worked for NEC Laboratories America in Princeton, New Jersey, as a Research Staff Member. He was with the Digital Technology Center and Department of Electrical and Computer Engineering at the University of Minnesota during 2008-2014, where his final title was Research Associate Professor. In August 2014, he joined the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County, as an Assistant Professor. His research interests include statistical signal processing, optimization, and machine learning, with applications to wireless communication and networking, future power systems, and (big) data analytics. He is serving as an Associate Editor for IEEE Signal Processing Letters.

G. B. Giannakis (Fellow'97) received his Diploma in Electrical Engr. from the Ntl. Tech. Univ. of Athens, Greece, 1981. From 1982 to 1986 he was with the Univ. of Southern California (USC), where he received his MSc. in Electrical Engineering, 1983, MSc. in Mathematics, 1986, and Ph.D. in Electrical Engr., 1986. He was with the University of Virginia from 1987 to 1998, and since 1999 he has been a professor with the Univ. of Minnesota, where he holds an Endowed Chair in Wireless Telecommunications, a University of Minnesota McKnight Presidential Chair in ECE, and serves as director of the Digital Technology Center. His general interests span the areas of communications, networking and statistical signal processing, subjects on which he has published more than 400 journal papers, 680 conference papers, 25 book chapters, two edited books and two research monographs (h-index 123). Current research focuses on learning from Big Data, wireless cognitive radios, and network science with applications to social, brain, and power networks with renewables. He is the (co-)inventor of 28 patents issued, and the (co-)recipient of 8 best paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received Technical Achievement Awards from the SP Society (2000) and from EURASIP (2005), a Young Faculty Teaching Award, the G. W. Taylor Award for Distinguished Research from the University of Minnesota, and the IEEE Fourier Technical Field Award (2015). He is a Fellow of EURASIP, and has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SP Society.

Roberto López-Valcarce (S'95-M'01) received the Telecommunication Engineering degree from the University of Vigo, Vigo, Spain, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of Iowa, Iowa City, in 1998 and 2000, respectively. During 1995 he was a Project Engineer with Intelsis. He was a Ramón y Cajal Postdoctoral Fellow of the Spanish Ministry of Science and Technology from 2001 to 2006. During that period, he was with the Signal Theory and Communications Department, University of Vigo, where he currently is an Associate Professor. His main research interests include adaptive signal processing, digital communications, and sensor networks, having coauthored over 50 papers in leading international journals. He holds several patents in collaboration with industry. Dr. López-Valcarce was a recipient of a 2005 Best Paper Award of the IEEE Signal Processing Society. He served as an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2008 to 2011, and as a member of the IEEE Signal Processing for Communications and Networking Technical Committee from 2011 to 2013.