1 A Hilbert Space Theory of Generalized Graph Signal Processing
Feng Ji and Wee Peng Tay, Senior Member, IEEE
Abstract
Graph signal processing (GSP) has become an important tool in many areas such as image pro- cessing, networking learning and analysis of social network data. In this paper, we propose a broader framework that not only encompasses traditional GSP as a special case, but also includes a hybrid framework of graph and classical signal processing over a continuous domain. Our framework relies extensively on concepts and tools from functional analysis to generalize traditional GSP to graph signals in a separable Hilbert space with infinite dimensions. We develop a concept analogous to Fourier transform for generalized GSP and the theory of filtering and sampling such signals.
Index Terms
Graph signal proceesing, Hilbert space, generalized graph signals, F-transform, filtering, sampling
I.INTRODUCTION
Since its emergence, the theory and applications of graph signal processing (GSP) have rapidly developed (see for example, [1]–[10]). Traditional GSP theory is essentially based on a change of orthonormal basis in a finite dimensional vector space. Suppose G = (V,E) is a weighted, arXiv:1904.11655v2 [eess.SP] 16 Sep 2019 undirected graph with V the vertex set of size n and E the set of edges. Recall that a graph signal f assigns a complex number to each vertex, and hence f can be regarded as an element of n C , where C is the set of complex numbers. The heart of the theory is a shift operator AG that is
usually defined using a property of the graph. Examples of AG include the adjacency matrix and the Laplacian matrix of G. Graph shift operators are typically chosen to be symmetric. By the
This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2018-T2-2-019. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (e-mail: [email protected], [email protected]). 2
n spectral theorem, all the eigenvalues of AG are real and C has an orthonormal basis consisting of eigenvectors of AG. Therefore, one can define the notion of vertex and frequency domains, on which the rest of the theory builds upon. However, instead of assigning a complex number at each vertex, one can assign mathematical objects with richer structures to each vertex of a graph. An example of such a mathematical object comes from a Hilbert space. In particular, it would be of interest to assign an L2 function over a finite closed interval [a, b] to each vertex of a graph. Such a consideration is not just a plain generalization, as it has important practical applications in for example, sensor networks and social networks, where each node in the network is observing a time-varying continuous signal. We have the following considerations.
Example 1. (a) The case where each vertex signal is a complex number belongs to the traditional GSP framework, which has been extensively studied in [1]–[10]. (b) An extension of traditional GSP to the case where each vertex signal is a finite-length discrete time series is proposed in the time-vertex GSP framework [8]. Such a vertex signal is from a Hilbert space Cm for some m ≥ 1. The assumption here is that all the time series at different vertices share the same time indices. (c) The case where the signal at each graph vertex v ∈ V is itself a graph signal on a finite
graph Gv has not been studied in the literature, to the best of our knowledge. In this case, each vertex signal is from Cm for some m ≥ 1, but unlike the time-vertex GSP framework, the underlying graph topology can be more general (the time-vertex GSP framework is equivalent to using a path graph with m vertices). Furthermore, different vertices can have different underlying graphs for their signals, and the correspondence between the vertices
in Gv and those in Gu for v 6= u need not be one-to-one. Applications include relating the signals in one social network to another social network, or in problems where the underlying graph at each vertex of G is varying (cf. Section VI-D). For example, when studying signals on time-varying adaptive networks ( [11]–[14]), which includes social networks, biological networks and neural networks, we may let G be a path
graph representing a finite portion of the time line. The signal ft at each node t is a vector in Cm. Due to the additional local structures in the signal, it is usually advantageous to
consider these structures when performing signal analysis. Thus, we can regard ft as a 3
graph signal corresponding to time t. Furthermore, multiple graph signals ft from different networks can be related to each other, forming yet another underlying graph at each time t. (d) The case where the signal at each vertex is drawn from an infinite dimensional Hilbert space has again never been studied. An example is continuous time signals that are not bandlimited (in the time direction). In this case, from the Shannon-Nyquist Theorem [15], it is impossible to recover the full undistorted signal using a finite sampling rate (cf. Section VI-B). Thus, the time-vertex GSP framework, which requires discrete time series at all graph vertice, would introduce errors in the inference procedure. Furthermore, signals may not be sampled synchronously in a sensor network (sensor synchronization usually requires additional effort [16]–[18]) so that the time-vertex GSP framework may not be the best approach. A more general GSP framework is thus needed to address many practical engineering scenarios. In addition, a Hilbert space, such as the space of L2 functions over an appropriate domain Ω, usually has rich internal structures. The usual GSP framework takes a “snapshot picture" by looking at the graph signal at x ∈ Ω one by one. This approach may easily disregard the internal relations among the points in Ω. In this paper, the proposed framework avoids such a local consideration entirely, and fuses the graph operators with operations on L2(Ω) for signal processing to achieve minimal information loss. For example, to study continuous time graph signals mentioned above, it can be more beneficial to consider the graph signal to be a function belonging to L2(R), such that we may apply operations such as continuous Fourier transform and wavelet transforms in conjunction with graph based transforms. We shall demonstrate this by the example of information propagation on social networks in Section VI-B.
In this paper, we propose and develop a Hilbert space theory of generalized GSP that can handle all the above cases. To do that, it is inevitable that the theory is based on an abstract foundation, which requires the reader to have a good grasp of functional analysis concepts and tools. For these, we refer the reader to [19], [20]. We shall constantly refer back to Example 1 to illustrate and explain theoretical results. We shall develop the theory parallel to classical signal processing and GSP. Important notions such as convolution filters and bandlimitedness are best understood when signals are viewed in the frequency domain. Therefore, defining what constitutes the frequency domain is a hallmark of both classical signal processing and traditional GSP. In the same spirit, we first define the 4
frequency domain for the generalized GSP framework using the spectral theory of compact operators on Hilbert spaces. We then proceed to discuss filtering and sampling. A preliminary version of this paper was presented in the conference paper [21], which introduced some of the basic concepts in this paper without proof. The rest of the paper is organized as follows. In Section II, we set up the framework by defining graph signals in a separable Hilbert space. In Section III, we introduce the notion of frequency. Based on this, we develop the concepts of filtering and sampling in Section IV and Section V. We present numerical results in Section VI and conclude in Section VII. Throughout, we provide examples to highlight situations where the framework of the paper is not only applicable but also necessary. Notations. We use R and C to denote the real and complex fields, respectively, while Z denotes the set of integers. The symbol ⊗ denotes tensor product, ◦ is function composition, and ∼= √ denotes isomorphic equivalence. Id is the identity operator and i = −1. Cn is equipped with 2 2 an inner product h·, ·i n with corresponding norm k·k n . The space L (Ω) = L (Ω, F, µ) with C C R 2 (Ω, F, µ) a measure space, is the collection of functions f :Ω 7→ C such that Ω |f| dµ < ∞.
II.GENERALIZEDGRAPHSIGNALS
Let G = (V,E) be a simple finite undirected weighted graph and X be a metric space. In this paper, when we talk about a vector space, we always assume that the base field is the complex numbers C, unless otherwise stated. In traditional GSP, the signals on the vertices of G are assumed to be real or complex. We generalize this as follows.
Definition 1. Suppose H is a Hilbert space (i.e., a complete inner product space). A graph signal in H is a function f : V → H. The space of graph signals in H is denoted by S(G, H).
With a few examples below, we demonstrate why this generalized notion is versatile. In short, Hilbert spaces include both the more familiar finite dimensional cases and also infinite dimensional cases that can represent more realistic signals in practice.
Example 2. (a) Suppose that H = C, then f ∈ S(G, H) is a traditional graph signal where f(v) ∈ C for each v ∈ V . This case has been extensively studied in [1] and the references therein. (b) Suppose H = L2([a, b]), the space of complex-valued L2 functions over a finite closed interval [a, b], with the Lebesgue measure (cf. Example 1(d)). A graph signal f in H assigns 5
an L2 function on [a, b] to each vertex of G. Alternatively, we can view f as a function assigning a complex number to each point in V ×[a, b] (where × denotes the usual Cartesian product). Consequently, as G is a finite graph, S(G, H) = L2(V × [a, b]). If b > a ≥ 0, S(G, H) can be viewed as a family of traditional graph signals parametrized by a continuous time domain [a, b]. To deal with such an H, one may use finite dimensional subspaces, e.g., polynomials of bounded degree, for approximations. However, there are continuous functions that render such a strategy a failure (cf. Theorems 5.4 and 5.5 in [22]). Therefore, it is useful to have a framework that encompasses such signals from infinite dimensional Hilbert spaces. (c) Suppose H = L2(G0) with discrete measure, where G0 = (V 0,E0) is a finite graph (possibly directed). By finiteness, H is the space of finite complex graph signals on G0. We define the 0 graph structure for the product G × G as follows (see Fig. 1 for an example): (v1, u1) is 0 0 connected to (v2, u2) in V × V if either v1 = v2 and (u1, u2) ∈ E with the edge weight
the same as the weight of (u1, u2), or u1 = u2 and (v1, v2) ∈ E with the edge weight the
same as the weight of (v1, v2). Therefore, S(G, H) can be identified with complex signals on the graph G × G0. In particular, if G0 is a path graph, S(G, H) corresponds to the time- vertex GSP framework of [8]. Strictly speaking, as we may view S(G, H) as traditional GSP signals on the graph G × G0, this example remains within the framework of [1] in this respect.
v1 (v1, u2)
(v1, u1)
v 2 G
u1 u2 (v2, u1) G × G0 G0
Fig. 1. An example of G × G0.
In the following, we first review some basic facts about Hilbert spaces (cf. Chapter 6 of [19]). 6
By definition, a Hilbert space H is equipped with an inner product, denoted by h·, ·iH. A Hilbert space is separable if it has a countable orthonormal basis. Throughout this paper, we assume that H is separable, which is the case for most physical signals. Since H is separable, without loss of generality, we can assume H = L2(Ω) for some measurable space Ω with a measure µ [20, Theorem 3.4.27]. For each f ∈ S(G, H) and u ∈ V , we then have f(u) ∈ L2(Ω) and f(u)(x) ∈ C for each x ∈ Ω. A useful point of view is that we can treat f as a function of two variables and write f(u, x) for f(u)(x). Recall that for a finite dimensional Euclidean vector space Cn, we may form the tensor product n Pn n (cf. [23, Chapter IV.5]) C ⊗ H as the set of finite linear sums i=1 vi ⊗ hi, with vi ∈ C and n hi ∈ H for all i = 1, . . . , n, such that the following holds for any v1, v2, v ∈ C and any h1, h2, h ∈ H:
(a) v1 ⊗ h + v2 ⊗ h = (v1 + v2) ⊗ h;
(b) v ⊗ h1 + v ⊗ h2 = v ⊗ (h1 + h2); (c) rv ⊗ h = v ⊗ rh for r ∈ C. n The tensor product ⊗ H is equipped with an inner product h·, ·i n induced (linearly) by: C C ⊗H
hv ⊗ h , v ⊗ h i n = hv , v i n hh , h i , (1) 1 1 2 2 C ⊗H 1 2 C 1 2 H which defines a metric on Cn ⊗ H. As Cn is finite dimensional, Cn ⊗ H is complete and hence n a Hilbert space. Suppose Φ = {φi}1≤i≤n is an orthonormal basis of C and Ξ = {ξj}j≥1 is an orthonormal basis of H. Then Φ ⊗ Ξ = {φi ⊗ ξj}1≤i≤n,j≥1 forms an orthonormal basis of Cn ⊗ H. Using the tensor product construction, we have the following alternative description of S(G, H), which shows that S(G, H) is a separable Hilbert space.
Lemma 1. S(G, H) is a vector space and there is an isomorphism between S(G, H) and Cn ⊗H given by X ψ(f) = v ⊗ f(v) (2) v∈V for f ∈ S(G, H), where V is identified with the standard basis of Cn.
Proof: The vector space structure of S(G, H) comes from that of H: for f, g ∈ S(G, H), u ∈ V and r ∈ C, we have (f + g)(u) = f(u) + g(u) ∈ H and (rf)(u) = rf(u). 7
It is clear that ψ is an injective mapping from S(G, H) to Cn ⊗ H. For the inverse, each a ∈ n P −1 C ⊗ H can be written in the form a = v∈V v ⊗ hv, where hv ∈ H. We define ψ (a)(v) = hv so that ψ−1(a) ∈ S(G, H) and is the inverse map of ψ. Hence ψ is an isomorphism and the proof is complete. In the rest of this paper, we will always identify V with the standard basis of Cn. Then, Lemma 1 gives a map ψ that identifies S(G, H) with Cn ⊗ H, so that we may carry structures on Cn ⊗ H to S(G, H). In particular, S(G, H) is a Hilbert space and we can define an inner product for f, g ∈ S(G, H) as
hf, gi = hψ(f), ψ(g)i n . (3) C ⊗H ∼ n Since S(G, H) = C ⊗ H, we will often abuse notations by treating f ∈ S(G, H) as an element of Cn ⊗ H, keeping in mind the isomorphism map ψ. Furthermore, one can extend f : V 7→ H n P uniquely to a linear transformation C 7→ H, i.e., f(u, x) = v∈V u(v)f(v, x) is uniquely n n defined for all u ∈ and x ∈ Ω. Here, u(v) = hu, vi n is the v-th component of u in . C C C
III. F-TRANSFORM
To give an overview, in the same spirit as traditional GSP, we introduce the notion of Fourier transformation for the generalized GSP framework. We refer to this as simply the F-transform. In doing so, we are able to introduce the notion of a frequency domain for generalized GSP. The interplay between the graph domain and the frequency domain plays an essential role in signal processing. As signals from a Hilbert space H have their own transform space, we can decompose the F-transform into simpler pieces, called partial F-transforms, which we also introduce in this section. Fix orthonormal bases Φ of Cn and Ξ of H. We have seen that Φ ⊗ Ξ forms an orthonormal basis of S(G, H). For each f ∈ S(G, H), φ ∈ Φ and ξ ∈ Ξ, the joint F-transform is defined as:
F (φ ⊗ ξ) = hf, φ ⊗ ξi hψ(f), φ ⊗ ξi n , (4) f , C ⊗H where ψ is the isomorphism map in Lemma 1 and the inner product on the right-hand side of P 2 (4) is as defined in (1). Note that φ⊗ξ |Ff (φ ⊗ ξ)| < ∞ for all f ∈ S(G, H). n P Since we have identified V with the standard basis of C , we write f(·, x) = v∈V f(v, x)v ∈ Cn. The partial F-transforms are defined as:
H-transform: Hf (ξ)(v) = hf(v, ·), ξiH, (5)
G-transform: G (φ)(x) = hf(·, x), φi n , (6) f C 8
for every v ∈ V and x ∈ Ω. Since Φ ⊗ Ξ is an orthonormal basis, given a sequence of numbers P 2 g = (g(φ ⊗ ξ))φ,ξ such that φ,ξ |g(φ ⊗ ξ)| < ∞, the inverse F-transform is given by
−1 X Fg = g(φ ⊗ ξ) · φ ⊗ ξ. (7) φ,ξ −1 Clearly, if g = Ff , then Fg = f, and vice versa. The definitions above do not involve the graph G. It appears in the following way: the orthonormal basis Φ is usually chosen as a set of eigenbasis of a symmetric graph shift operator
AG. Common choices of AG include the adjacency matrix and the Laplacian matrix of G. In the same spirit as [10], the inner product h·, ·i n and the basis Φ (hence A ) can be chosen C G judiciously depending on the application.
Example 3.
(a) In Example 2(a), H = C, which in our terminology, can be identified with L2({0}). In this example, we simply use r ∈ C to represent each element in H so that Ξ = {1} and
Hf (1)(v) = f(v) for each v ∈ V . The H-transform is thus simply the graph signal itself. We also have G (φ) = hf, φi for each φ in the eigenbasis of a graph shift operator. f Cn The G-transform, and hence the F-transform, are both equivalent to the traditional graph Fourier transform in [1]. √ (b) In Example 2(b), H = L2([0, 2π]) (cf. Example 1(d)). Using the basis {exp(imx)/ 2π : m ∈ Z} for H, we see that the H-transform of f(v) for each v ∈ V is simply its Fourier transform. For each x ∈ [0, 2π], the G-transform of f(·, x) is the traditional graph Fourier transform of the graph signal {f(v, x): v ∈ V }. The F-transform however does not have any equivalence in traditional GSP (which deals with finite-dimensional spaces) as [0, 2π] is a continuous domain with H being infinite dimensional. For such an H, there is an abundant family of signals (e.g., rectangular pulse functions [24]) whose F-transforms cannot be described by traditional GSP Fourier transforms at finite samples. (c) In Example 2(c), G0 = (V 0,E0) is another graph and we have taken H = L2(G0). In this 2 0 case, S(G, H) = L (G × G ). Then each φ ⊗ ξ is an eigenvalue of the matrix AG ⊗ AG0 . The joint F-transform is nothing but the graph Fourier transform of signals on G × G0, which is the same as the joint Fourier transform defined in [8].
n P Since we identify V with the standard basis of C , we can write Hf (ξ) = v∈V Hf (ξ)(v)v ∈ n C for each ξ ∈ Ξ. Note also that for each φ ∈ Φ, Gf (φ) is a mapping Ω 7→ C and from the 9
Cauchy-Schwarz inequality, we have Z Z |G (φ)(x)|2dµ(x) ≤ kf(·, x)k2 dµ(x) f Cn Ω Ω Z X X Z = |f(v, x)|2dµ(x) = |f(v, x)|2dµ(x) < ∞, Ω v∈V v∈V Ω where the first equality follows from Parseval’s formula, and the last inequality is because 2 f(v, ·) ∈ H = L (Ω) and V is finite. Therefore, Gf (φ) ∈ H.
Lemma 2. For any φ ∈ Φ, ξ ∈ Ξ and f ∈ S(G, H), we have
X 0 0 Hf (ξ) = Ff (φ ⊗ ξ)φ , (8) φ0∈Φ
X 0 0 Gf (φ) = Ff (φ ⊗ ξ )ξ , (9) ξ0∈Ξ and
F (φ ⊗ ξ) = hH (ξ), φi = hG (φ), ξi . (10) f f Cn f H
Proof: For any φ ∈ Φ, we have * + X hH (ξ), φi = H (ξ)(v)v, φ f Cn f v Cn X = H (ξ)(v)hv, φi f Cn v X = hf(v), ξi hv, φi n from (5) H C v X = hv ⊗ f(v), φ ⊗ ξi n from (1) C ⊗H v = hf, φ ⊗ ξi from (2) and (4)
= Ff (φ ⊗ ξ), which proves the first equality in (10). Since Φ is an orthonormal basis for Cn, (8) follows. The other identities follow similarly and the proof is complete. P As Φ ⊗ Ξ is an orthonormal basis, each f ∈ S(G, H) can be written as f = φ⊗ξ Ff (φ ⊗
ξ) · φ ⊗ ξ. The terms Ff (φ ⊗ ξ) are square summable. On the other hand, suppose we are given a mapping L on S(G, H) into C such that L(φ ⊗ ξ) is square summable, from the inverse P F-transform, we can view L as an element of S(G, H) given by φ,ξ L(φ ⊗ ξ) · φ ⊗ ξ. An 10
immediate example is Ff can be identified with f when viewed as an element of S(G, H). This point of view is useful when we discuss convolution later on. For a general H, an important source of orthonormal basis comes from the eigenvectors of a family of bounded linear operators. More specifically, recall that a bounded linear operator A on a Hilbert space H is called compact [19, Chapter 21.1] if the image of the closed unit ball has a compact closure. It is moreover self-adjoint if hAx, yiH = hx, AyiH for any x, y ∈ H. The spectral theorem (see [19, Chapter 28]) in this case says that if A is compact and self-adjoint, then all the eigenvalues of A are real and H has an orthonormal basis consisting of eigenvectors of A. For the rest of this paper, we assume the following.
Assumption 1. The following are given:
(a) AG is a self-adjoint graph shift operator on G. (b) A is a compact, self-adjoint and injective operator on H. n (c) Φ is an orthonormal basis of C consisting of the eigenvectors of AG. Ξ is an orthonormal basis of H consisting of the eigenvectors of A.
Assumption 1(b) may be further generalized in some cases to allow the operator A to be different for different vertices of G. We discuss this generalization in Section IV-E.
For φ ∈ Φ and ξ ∈ Ξ, we use λφ and λξ to denote their corresponding eigenvalues. Now we can make the following definition.
−1 Definition 2. For f ∈ S(G, H), the frequency range of f is defined to be {(λφ, λξ ) ∈ R × R :
Ff (φ ⊗ ξ) 6= 0, φ ∈ Φ, ξ ∈ Ξ} for Φ and Ξ in Assumption 1.
−1 We use λξ in Definition 2, which is more convenient when we deal with the notion of −1 bandlimitedness in Section IV-D. We also motivate using λξ instead of λξ in Example 4(b) below.
Example 4.
(a) In the case of H = C, every operator A(a) = Aa, where A ∈ C\{0}, is compact, self- adjoint and injective. Then the frequency range of f is {(λ , 1/A): hf, φi 6= 0, φ ∈ Φ}, φ C which is equivalent to the frequency range in traditional GSP since A is a constant. (b) Suppose H is the subspace of L2([0, 2π]) consisting of f such that f(0) + f(2π) = 0 (cf. Example 1(d)). It is a closed subspace of L2([0, 2π]) and hence it is also a Hilbert 11
d space. Consider the differential operator D = −i . The eigenvectors of D given by √ dx Ξ = {ξm = exp(i(m + 1/2)x)/ 2π : m ∈ Z} forms an orthonormal basis of H where the eigenvalue of ξm is m + 1/2. Note that Ξ is a variant of the standard Fourier basis. However, D is not well-defined on all of H. To fit into our framework, we define i Z x Z 2π A(f)(x) = f(t)dt − f(t)dt , 2 0 x for all f ∈ H. It is compact, self-adjoint and injective, with eigenvectors Ξ and correspond- ing eigenvalues {(m + 1/2)−1 : m ∈ Z} because the composition A ◦ D is the identity map on the domain of D. Since Ξ is a basis, we can write
X am i(m+1/2)x f(x) = √ e , am ∈ C for all m ∈ Z. 2π m∈Z √ In traditional Fourier series, if the coefficient am of ξm = exp(i(m + 1/2)x)/ 2π is non- zero, we say m+1/2 belongs to the frequency range of f. However, m+1/2 is an eigenvalue of D and not A. Instead, due to the fact that A ◦ D is the identity map, (m + 1/2)−1 is an eigenvalue of A with the same eigenvector ξm. Therefore, since we want to use eigenvalues of the compact operator A to describe the notion of frequency, we take the reciprocal of (m+1/2)−1, i.e., m+1/2, to be in the frequency range. This makes our definition consistent −1 with definitions from traditional Fourier series. This example thus motivates our use of λξ in Definition 2. For a specific illustration, suppose that G is the undirected path graph with three ver- tices u1, u2, u3 shown in Fig. 2. Let AG be its Laplacian matrix, whose eigenvalues and eigenvectors are given by:
1 T λφ = 0, φ1 = √ [1, 1, 1] , 1 3
1 T λφ = 1, φ2 = √ [0, −1, 1] , 2 2
1 T λφ = 3, φ3 = √ [−2, 1, 1] . 3 6
Consider f ∈ S(G, H) that assigns f(u2, x) = 2 cos(x/2) to the central node, and f(u1, x) = √ √ 1 π 1 π 2 sin( 2 x − 4 ) and f(u3, x) = 2 sin( 2 x + 4 ) to the side nodes. It can be verified that 12
Ff (φ ⊗ ξ) 6= 0 only for φ ⊗ ξ ∈ {φ2 ⊗ ξ−1, φ2 ⊗ ξ0, φ3 ⊗ ξ−1, φ3 ⊗ ξ0}. For example, by Lemma 2, we obtain
Ff (φ2 ⊗ ξ−1) = hGf (φ2), ξ−1iH Z 2π 1 x √ x π 1 = √ (0 − 2 cos + 2 sin( + )) · √ eix/2dx 2 2 2 4 2π 0√ √ π π = − + i . 2 2 Hence, the frequency range of f is {(1, −1/2), (1, 1/2), (3, −1/2), (3, 1/2)}.
Fig. 2. Illustration of Example 4(b). Each red curve depicts the signal at each node of G, which is a function in L2([0, 2π]).
IV. FILTERING
In this section, we consider filtering of generalized graph signals. We focus on filters defined based on F-transforms. Filters can have several different uses, to name a few: they can be used remove noise in signals, describe inherent relations between datasets and transform a signal into a different domain that is more convenient for analysis. Our generalized GSP filtering theory have parts similar to traditional GSP and signal processing over C, while additional new features present due to the richer structure in a general H. We discuss a few general families of filters that are related with each other and illustrate them with examples. We also point out why some of them are particularly useful.
Definition 3. A filter is a bounded linear transformation L : S(G, H) → S(G, H).
Since S(G, H) is a Hilbert space, any filter L is continuous as it is bounded. From isomor- phism, we equivalently regard any filter L on S(G, H) to be a filter on Cn ⊗ H. 13
A. Shift Invariant Filters
For the two operators AG and A in Assumption 1, we can form their tensor product: AG ⊗ A
induced (linearly) by AG ⊗ A(v ⊗ h) = AG(v) ⊗ A(h). Note that AG ⊗ A is an operator on the ∼ n Hilbert space S(G, H) = C ⊗ H. As both AG and A are compact (AG is compact because all n operators on the finite dimensional space C are) and self-adjoint, so is AG ⊗A. The orthonormal basis Φ ⊗ Ξ consists of the eigenvectors of AG ⊗ A.
We can define AG ⊗ Id and Id ⊗ A, and abbreviated as AG and A if no confusion arises. If
H is also finite dimensional, then AG ⊗ A is the Kronecker product of matrices AG and A.
Definition 4. A filter L is called shift invariant if both AG ◦ L = L ◦ AG and A ◦ L = L ◦ A hold for AG and A in Assumption 1. It is weakly shift invariant if (AG ⊗ A) ◦ L = L ◦ (AG ⊗ A).
These concepts reduce to the traditional notion of shift invariance in GSP if H = C given in [1] since A is trivial in that case. The following is an example of a shift invariant filter that we will frequently refer to in the sequel.
p Example 5. Let P (x) = a0 + a1x + ... + apx be a polynomial of degree p < ∞. Then,
P (AG ⊗ A) commutes with both AG and A and is thus shift invariant. Polynomials on AG ⊗ A give an important family of shift invariant filters, which can be used to approximate other filters due to the Stone-Weierstrass Theorem.
Shift invariant filters also appear in a large array of applications such as data compression, customer behavior prediction [2] and machine learning models such as graph convolutional networks [25], [26], in which polynomials of the shift operators are used. In general, a weakly shift invariant filter is not necessarily shift invariant. For a simple example, 1 0 let A = B = . Then 0 2 1 0 0 0 1 0 0 0 0 2 0 0 0 1 0 0 A ⊗ B = and A ⊗ Id = . 0 0 2 0 0 0 2 0 0 0 0 4 0 0 0 2 It is easy to see that as long as the top row, bottom row, left-most column and right-most column of L are zero, then L commutes with A ⊗ B, i.e., L is weakly shift invariant. However, as the 14
second and third diagonal entries of A ⊗ Id are distinct, some of these Ls do not commute with A ⊗ Id. We next describe situations where these two notions are equivalent.
Proposition 1. Suppose L is a filter on S(G, H). (a) If L is shift invariant, then it is also weakly shift invariant. (b) If Φ ⊗ Ξ in Assumption 1 consists of eigenvectors of L, then L is shift invariant.
(c) The eigenspace of each eigenvalue λ 6= 0 of AG ⊗ A is of finite dimension mλ. If mλ = 1 for all λ, then a weakly shift invariant filter is shift invariant. (d) Suppose L is self-adjoint. Then L is weakly shift invariant if and only if L is shift invariant.
Proof: (a) Suppose L is shift invariant. We verify that
(AG ⊗ A) ◦ L = (AG ⊗ Id) ◦ (Id ⊗ A) ◦ L
= (AG ⊗ Id) ◦ L ◦ (Id ⊗ A)
= L ◦ (AG ⊗ Id) ◦ (Id ⊗ A)
= L ◦ (AG ⊗ A).
(b) Since Φ ⊗ Ξ is a basis, its vectors are also eigenvectors to both AG ⊗ Id and Id ⊗ A and shift invariance of L follows from Defintion 4.
(c) As AG ⊗ A is compact, from [19, Chapter 21.2, Theorem 6]), for λ 6= 0, mλ is finite.
If mλ = 1 for all λ, then all eigenvalues are non-zero and each eigenspace Vλ is one
dimensional. Suppose L is weakly shift invariant. Then each eigenspace Vλ of AG ⊗ A
is an invariant subspace of L, i.e., L(Vλ) ⊂ Vλ. As each Vλ is one dimensional for each
eigenvalue λ and the basis vectors in Φ ⊗ Ξ are the eigenvectors of AG ⊗ A, they are also the eigenvectors of L. Therefore, L is shift invariant from (b). (d) Suppose L is weakly shift invariant. Then, from [19, Chapter 28 Theorem 7], the basis Φ ⊗ Ξ contains the eigenvectors of L and (b) shows that L is shift invariant. The converse follows from (a).
If each eigenspace of A is one dimensional, we hope the same holds of AG ⊗ A so that by Proposition 1(c), all weakly shift invariant filters are shift invariant. We have the following result. 15
Proposition 2. Suppose that the graph G has at least 3 nodes. Furthermore, each edge weight of G is chosen randomly according to a distribution absolutely continuous with respect to (w.r.t.) the Lebesgue measure. Consider AG and A in Assumption 1. If each eigenspace of A is one dimensional, then the following holds with probability one:
(a) If AG is the adjacency matrix of G, then AG ⊗A is injective and each eigenspace of AG ⊗A has dimension 1.
(b) If AG is the Laplacian matrix of G, then the eigenspace corresponding to the eigenvalue 0
of AG ⊗ A is isomorphic to H. Moreover, for each eigenvalue λ 6= 0, the eigenspace has
dimension mλ = 1.
Proof: The proof of this result is technical and can be found in Appendix A.
If AG is the Laplacian matrix, the eigenspace V0 of AG⊗A corresponding to the eigenvalue 0 is infinite dimensional if H is infinite dimensional. If L is weakly shift invariant, then the orthogonal 0 complement H of V0 is an invariant subspace of L. To overcome the infinite dimensionality of 0 V0 so that Proposition 1 is still applicable, we may consider the restriction of AG ⊗ A to H . 0 In this case, from Proposition 2, each eigenspace of AG ⊗ A on H is one dimensional with probability one if the edge weights of G are chosen randomly according to a probability density function. We end this subsection by providing examples of filters when H is infinite dimensional.
0 0 Example 6. If L can be decomposed as a tensor product L = AG ⊗ A , then L is shift invariant 0 0 if and only if AG is shift invariant w.r.t. AG and A is shift invariant w.r.t. A. Similar to Example 4(b), consider H = L2([−1, 1]) and A is the operator Z x Z 1 A(f)(x) = f(t)dt − f(t)dt. −1 x Let A0 be the operator on H defined as f(x + 1) for −1 ≤ x ≤ 0 A0(f)(x) = 0 for 0 < x ≤ 1.
0 0 Although A looks like a shift, L = AG ⊗ A is not shift invariant w.r.t. AG ⊗ A. To see this, we verify that A◦A0 6= A0◦A. Consider f(x) = x on [−1, 1]. It is easy to verify that A◦A0(f)(x) > 0 for 0 ≤ x ≤ 1. However, A0 ◦ A(f)(x) = 0 for 0 ≤ x ≤ 1. 16
As explained above, polynomials on AG ⊗ A are shift invariant. However, due to infinite dimensionality, not every shift invariant filter is a polynomial filter as shown in the following example.
Example 7. Consider H a infinite dimensional Hilbert space and A is as in Assumption 1. Suppose a ∈ R is a positive real number larger than all the eigenvalues of A. Then A0 = (1 − a−1A)−1 is a bounded linear transformation. It has a convergent power series expansion 0 0 0 in A, hence A ◦ A = A ◦ A . If we let L = AG ⊗ A , then L is shift invariant. However, it is in
general not a polynomial in AG ⊗ A. This does not happen if H is finite dimensional, as 1−a−1A is invertible in the polynomial ring generated by A (i.e., A0 is a polynomial in A), due to the existence of the minimal polynomial.
B. Compact and Finite Rank Filters
Similar to the definition of a compact operator on the Hilbert space H, a filter L on S(G, H) is compact if its image of the closed unit ball has compact closure. A filter is of finite rank if its image is finite dimensional.
In Example 5, if H is infinite dimensional, the filter P (AG ⊗ A) is shift invariant but not
compact if the polynomial P has a non-zero constant term a0. If P does not have any constant
terms, then P (AG ⊗ A) is a compact filter since AG ⊗ A is compact. On the other hand, it is also easy to construct a compact filter that is non-shift invariant (see comment at beginning of Example 6).
Corollary 1. (a) If L is a finite rank filter, then it is a finite sum of compact filters. (b) If L is a compact filter, then it is the limit (in operator norm) of finite rank filters. More
precisely, suppose we give an ordering {w1, . . . , wi,...} of Φ ⊗ Ξ and define Li to be the
projection of the image of L to the finite dimensional subspace Si spanned by {w1, . . . , wi}.
Then Li converges to L in operator norm as i → ∞.
Proof: The claims (a) and (b) follow from [20, Theorem 4.8.11] and [27, Theorem 4.4], respectively. Consequently, to understand a compact filter L, we may instead study the finite rank approx-
imations Li as in Corollary 2(b). Suppose AG ⊗ A has no repeated eigenvalues and L is shift 17
invariant. Consider Si in the corollary and let Fi = AG ⊗ A Si : Si 7→ Si to be the restriction of AG ⊗ A on Si. Since Si is an invariant subspace of L, abusing notations, we use the same notation Li : Si 7→ Si for the restriction of Li to Si. Then, we have the standard observation as
[2]: Li is a polynomial in Fi with degree at most dim(Si) − 1 [28].
C. Convolution Filters
Suppose we fix a g ∈ S(G, H). For each f ∈ S(G, H), the element g ∗ f defined by Fg∗f =
FgFf is an element of S(G, H), i.e., X g ∗ f = Fg(φ ⊗ ξ)Ff (φ ⊗ ξ) · φ ⊗ ξ, (11) φ⊗ξ P 2 where φ⊗ξ |Fg(φ ⊗ ξ)| < ∞ since g ∈ S(G, H). It is easy to verify that g ∗ satisfies
g ∗ (af + h) = ag ∗ f + g ∗ h for a ∈ C, f, g ∈ S(G, H). Moreover, g ∗ : S(G, H) → S(G, H) is a bounded map (bounded by supφ⊗ξ |Fg(φ ⊗ ξ)| < ∞). Therefore, g ∗ is a filter. We call it a convolution filter. In the case H = C, we are in the situation of traditional GSP. The notion of convolution filter agrees with the one given in [1].
For each f = φ⊗ξ with φ ∈ Φ and ξ ∈ Ξ, it follows from definition that g∗f = Fg(φ⊗ξ)·φ⊗ξ.
Hence φ ⊗ ξ is an eigenvector of g ∗ with eigenvalue Fg(φ ⊗ ξ). From Proposition 1(b), g ∗ is a P 2 shift invariant operator. Moreover, since φ⊗ξ |Fg(φ ⊗ ξ)| < ∞, g ∗ is also a Hilbert-Schmidt operator [19, Chapter 30.8], which leads to the following corollary.
Corollary 2. The filter L = g ∗ is compact and is the limit (in operator norm) of finite rank filters.
Proof: As L = g ∗ is Hilbert-Schmidt, it is compact (cf. [19, Chapter 30.8, Exercise 11(g)]). Therefore, from Corollary 1(b), it is the limit of finite rank filters.
If H is infinite dimensional, a polynomial filter P (AG ⊗ A) as in Example 5 is non-compact if it has a non-zero constant term a0, and is thus not a convolution filter. This differs from traditional GSP where all shift invariant filters are convolution filters (when AG does not have repeated eigenvalues) since in traditional GSP, shift invariant filters are polynomials of AG [3] and H = C is finite dimensional. 18
D. Bandlimited Signals and Band Pass Filters
A signal f ∈ S(G, H) is said to be bandlimited if its frequency range (cf. Definition 2) is a bounded subset of R×R. For any set K ⊂ R×R, we use SK (G, H) to denote the set of signals whose frequency range belongs to K. As a special example, if G is a point and H = L2([a, b]), the notion of bandlimitedness agrees with its classical counterpart in the setting of Fourier series.
Lemma 3. (a) f is bandlimited if and only if its frequency range is a finite set.
(b) If K is bounded, then SK (G, H) is a finite dimensional subspace of S(G, H).
Proof: (a) As A is compact, the eigenvalues of A is bounded and accumulate only at 0 (cf. [19, Chapter −1 21.2 Theorem 6]). Therefore, the set {(λφ, λξ )} is a discrete subset of R × R and f is bandlimited if and only if its frequency range is a finite set. (b) The claim follows immediately from (a).
For each f ∈ S(G, H) and K ⊂ R × R, we define the band pass filter as a projection X PK (f) = Ff (φ ⊗ ξ) · φ ⊗ ξ. (12) −1 (λφ,λξ )∈K
In the F-transform domain, PK is nothing but multiplying with the characteristic function of K.
We have the following observation regarding PK . Note that a band-pass filter is not convolutional if K is not bounded.
Corollary 3. For any K ⊂ R × R, PK is shift invariant. Furthermore, it is a convolution filter if and only if K is bounded.
Proof: As each φ ⊗ ξ ∈ Φ ⊗ Ξ is an eigenvector of PK , it is shift invariant from Propo- sition 1(b). Moreover, it is a convolution filter if and only if the characteristic function of K −1 is square summable on the discrete set {(λφ, λξ ): φ ∈ Φ, ξ ∈ Ξ}, i.e., K being finite. By Lemma 3(a), K is finite if and only if it is bounded. Based on the discussions in this section, we encounter a few scenarios in which finite dimen- sional subspaces of S(G, H) are useful. This motivates the next section on sampling, in which we use a collection of points on V × Ω to describe these subspaces. 19
E. Adaptive Polynomial Filters
We devote this last subsection to GSP with signals belonging to adaptive networks (Exam- ple 1(c)) by extending Example 5 to allow different operators A for different vertices in G.
Let m > 0 be a fixed integer. Suppose that at each node u ∈ V , there is a graph Gu with m vertices. For different nodes u, the graphs Gu might be different (cf. Example 1(c)). The generalized signal at each node u is defined as a graph signal on Gu. In this case, we can let m H = C . Let Au be a filter of the graph signals on Gu, e.g., the Laplacian of Gu. Each Au can be viewed as an m × m matrix. A filter M on S(G, C) can similarly be viewed as an n × n matrix for a fixed ordering of the vertices in V .
We call a filter F an adaptive polynomial filter with respect to (w.r.t.) M and {Au}u∈V if P there are polynomials P1 and P2 such that F = u∈V P1(M)u ⊗ P2(Au), where P1(M)u has the same u-th column as P1(M) and 0 elsewhere. Here, ⊗ is the matrix Kronecker product.
Consider now the special case where M and Au are adjacency matrices or Laplacian matrices of the graphs G and Gu, respectively. Furthermore, suppose both P1 and P2 are degree 1 polynomials. Then, for any signal f ∈ S(G, H), the i-th component of F (f) at a node u takes contribution from the j-th component of the signal at node v if (u, v) is an edge in G and
(i, j) is an edge in the graph Gv at node v.
Fig. 3. Example to illustrate the action of the adaptive filter given in Example 8. The j-th component of the graph signal at node u contributes to the i-th component of the graph signal at node v, although i and j may not be connected by an edge in the graph at node v.
Example 8. Consider Fig. 3. The graph G is a path graph consisting of three nodes u1, u2, u3.
At each node ui, the graph Gui has 4 nodes with different edge connections. Suppose M is the 20
adjacency matrix of G and Aui is the Laplacian matrix of Gui for each i = 1, 2, 3. Let P1(x) = P3 4 a1x + b1, P2(x) = a2x + b2, and F = i=1 P1(M)i ⊗ P2(Aui ) as above. Let f ∈ S(G, C ). As an illustration, to evaluate F (f) at (u2, v1), we have F (f)(u2, v1) = a1 a2(f(u1, v1) − f(u1, v3)) + b2f(u1, v1) + b1 a2(f(u2, v1) − f(u2, v2)) + b2f(u2, v1) + a1 a2(f(u3, v1) − f(u3, v2) + f(u3, v1) − f(u3, v4)) + b2f(u3, v1) ,
i.e., a1 weights the contribution from Gu1 and Gu3 , b1 weights the contribution from Gu2 , a2 weights contribution of neighbors of v1 in each Gui , and b2 weights contribution of v1 in each
Gui . From this example, we see that the filter F gives a weighted average of the signals in a neighborhood of each node in a neighborhood of graphs. From the above example, we see that an adaptive polynomial filter F captures the hidden structures in C4 given by the graphs at each vertex. The modeling of such relationships is simplified by using the tools developed in Section II for representing generalized graph signals.
Note that in practice, G and each Gui can have different physical meanings and scales (e.g.,
G can be used to represent time while each Gui represents the correlations between node observations at time instant ui in Example 1(c)). It is inappropriate to then embed them in a big ambient graph and perform traditional GSP. This is an important reason for our proposed framework.
Finally, to conclude this section, we summarize the previous definitions of different filter families as a Venn diagram in Fig. 4. Convolution filters and bandlimited filters are shift invariant filters that can be approximated by finite rank filters. Thus, they are particularly useful and they can be approximated by polynomials on AG ⊗ A, which can then be learned with an appropriate optimization procedure from observed signal samples.
V. SAMPLING
In the general setting of this paper, the Hilbert space H usually consists of certain functions on a domain Ω. Sampling in this context has two stages: choose a subset of nodes V 0 of G, and for each v ∈ V 0, choose a finite subset from Ω. The second stage can be both synchronous or asynchronous, depending on whether the sample set can be decomposed into a product V 0 × Ω0 for some finite subset Ω0 ⊂ Ω. The need for asynchronous sampling is multi-folded, for example: 21
Fig. 4. Summary of various filter families described in this section.
(a) As we have explained in the introduction (cf. Example 1(d)), it is not always easy to achieve synchronous sampling. (b) In the case of adaptive networks (Examples 1(c) and 8), the Hilbert space H = Cn is fixed,
however, the coupled transform Av of H at each node v of G changes. Therefore, we may need to sample differently at different nodes of G. Our generalized GSP framework allows us to develop a sampling theory that encompasses asynchronous sampling. In this section, we make additional assumptions regarding H = L2(Ω). We assume that Ω is a compact subset of a finite dimensional Euclidean space Rr for some r ≥ 1, and whose interior is non-empty and connected (for example, a finite closed interval), and H is equipped with the ∼ n usual Lebesgue measure. As we pointed out earlier, each f ∈ S(G, H) = C ⊗H can be viewed as a function on V × Ω. A discussion of the case H = L2(R) is deferred to Appendix C.
Suppose V is a subspace of S(G, H) with finite dimension dV . We want to choose a finite subset W ⊂ V × Ω of size dV such that each f ∈ V is uniquely determined by its values at W .
We say that such a W determines V. For each v ∈ V , let kv(W ) be the number of points of ({v} × Ω) ∩ W . To determine V, W cannot be formed in a completely random way. For example, in general, we cannot expect that choosing dV points in {v} × Ω for a single v ∈ V does the job. However, 22
we have a slightly weaker statement in Theorem 1. Before stating the theorem, recall that a function g :Ω → C is analytic if g can be extended to a connected open neighborhood U of
Ω such that g has a convergent Taylor series expansion in an open neighborhood of x0 for any x0 ∈ U. A large family of common functions are analytic such as the polynomial functions, exponential functions and trigonometric functions.
Theorem 1 (Asynchronous sampling). Suppose H = L2(Ω) where Ω ⊂ Rr is compact, V = V 0 ⊗ H0 is a finite dimensional subspace of S(G, H), where V 0 ⊂ Cn, H0 ⊂ H is spanned by analytic functions, and W determines V. If W 0 is formed by randomly choosing, according to a distribution absolutely continuous w.r.t. the Lebesgue measure, kv(W ) points in {v} × Ω for each v ∈ V , then W 0 determines V with probability one.
Proof: See Appendix B. Intuitively, the theorem says that if one sampling scheme determines V, then almost every other sampling scheme with the same number of sampled points at each vertex v achieves the same effect. This observation makes particular use of the properties of Ω. An illustration is shown in Fig. 5.
Fig. 5. Consider the graph G = (V,E) with 4 nodes and 4 edges shown in (a) and suppose H = L2([a, b]). Suppose we want to construct W that determines V = span(Φ0 ⊗ Ξ0) with |Φ0| = 2 and |Ξ0| = 10. In (b), we show the domain [a, b] for each vertex together with the sampling locations. By Example 9 and Corollary 3(c), we can choose W consisting of 5 points for each v ∈ V as shown by the red disks in (b). However, according to Theorem 1, to determine V, we can choose W 0 consisting of red circles, which is a perturbation of W , and W 0 almost surely determines V. 23
Before proceeding further, we make the following definition.
n Definition 5. Consider k linearly independent vectors Ψ = {ψ1, . . . , ψk} in C with k ≤ n. |I| For each subset I ⊂ {1, . . . , n}, let ψi,I ∈ C be the vector formed by taking the components of ψi indexed by I. Moreover, use ΨI to denote {ψ1,I , . . . , ψk,I }. Define δ(Ψ) to be the largest integer t such that there is a partition {1, . . . , n} = I1 ∪ ... ∪ It into disjoint subsets, and each
ΨIj , 1 ≤ j ≤ t consists of linearly independent vectors.
Consider Φ0 ⊂ Φ of size k. Clearly, δ(Φ0) ≤ bn/kc. In fact, it can be shown using a similar proof as that of Proposition 2, if the edge weights are chosen according to a distribution absolutely continuous w.r.t. the Lebesgue measure, then for any k ≤ n and |Φ0| = k, δ(Φ0) = bn/kc with
probability one. The choice of I in Definition 5 corresponds to a choice of vertices VI from V . Therefore, δ(Φ0) is the maximum number of disjoint subsets of vertices one can form so that 0 0 ΦI has full column rank for each of the subsets I (by definition, we have |I| ≥ |Φ | for each
I), i.e, if the signals {f(v, x): v ∈ VI } at the vertices VI for x ∈ Ω are known, then f(v, x) is uniquely determined for all v ∈ V .
Example 9. Let AG be the Laplacian matrix of an unweighted, undirected cycle graph with 4 nodes. One finds an orthonormal basis Φ = {(1, 1, 1, 1), (−1, 0, 1, 0), (0, −1, 0, 1), (−1, 1, −1, 1)}. It is easy to verify the following from definition: for any subset Φ0 ⊂ Φ of size 2, δ(Φ0) = 2. For 0 example, if Φ = {(1, 1, 1, 1), (−1, 0, 1, 0)}, we can choose the partition {1, 2, 3, 4} = I1 ∪I2 with
I1 = {1, 2},I2 = {3, 4}. As a consequence, ΦI1 = {(1, 1), (−1, 0)} and ΦI2 = {(1, 1), (1, 0)}, and both consist of linearly independent vectors.
We consider the case where V is the span of Φ0 ⊗ Ξ0, with Φ0 and Ξ0 being finite subsets of Φ and Ξ respectively. To perform sampling, we have two useful extreme cases: (a) we choose a small subset of V 0 ∈ V and sample only at {v} × Ω for v ∈ V 0; (b) we sample at {v} × Ω for each v ∈ V with reduced amount of sample points. Case (a) corresponds to the situation where we make observations only at a small part of the graph; and case (b) corresponds to the situation where we make limited number of observations at each vertex. In the notation of Theorem 1, we have the following result regarding the two sampling schemes (see also Fig. 5).
Proposition 3. Assume the same conditions in Theorem 1. Suppose Φ0 and Ξ0 are finite subsets of Φ and Ξ respectively and V is the span of Φ0 ⊗ Ξ0. 24
(a) Any W that determines V has |W | ≥ |Φ0| · |Ξ0|. 0 0 0 (b) We can find V ⊂ V of size |Φ | such that a set W determines V with kv(W ) = |Ξ | for each 0 0 v ∈ V . For a fixed ordering of the vertices V = {v1, . . . , vn}, a specific choice of V is 0 given by {vi1 , . . . , vi|Φ0| }, which are the vertices corresponding to |Φ | linearly independent rows of the matrix whose columns are formed by Φ0. 0 0 (c) We can find W that determines V with kv(W ) < |Ξ |/δ(Φ ) + 1 for each v ∈ V , and P 0 v∈V kv(W ) = |Ξ |.
Proof: See Appendix B. Proposition 3 together with Theorem 1 provides a simple sampling procedure: Proposition 3(b) 0 0 tells us how to choose a sample set of graph vertices V and {kv(W ): v ∈ V } while Proposi- 0 0 tion 3(c) allows us to choose kv(W ) = b|Ξ |/δ(Φ )c + 1 for all v ∈ V (in particular, if the edge weights of G are randomly distributed according to a probability density function, we can choose 0 0 kv(W ) = b|Ξ | · |Φ |/nc+1). Then Theorem 1 allows us to randomly sample kv(W ) points from Ω for each v ∈ V 0 or V , using a probability density function. Note that Proposition 3(c) says that by utilizing the bandlimitedness of the underlying graph, we can use a sampling rate lower than the Nyquist rate to recover each graph vertex’s signal from its samples. Consider Example 4(b): if each f(v, ·), v ∈ V , is bandlimited to a frequency band [−B,B] in the classical Fourier series √ sense, then it is spanned by the set Ξ0 = {exp(i(m + 1/2)x)/ 2π : m + 1/2 ∈ [−B,B]} of eigenvectors. From the Shannon-Nyquist Theorem [15], to recover f(v, ·) for each v individually, one needs at least 2B samples in [0, 2π]. However, if we further know that {f(v, ·): v ∈ V } is bandlimited in the graph dimension and spanned by Φ0 with δ(Φ0) > 1, then a reduced number of samples (≈ 2B/δ(Φ0) for each vertex) is sufficient to recover all the vertex signals. We remark that for applications like learning the polynomial form of a finite rank filter (cf. end of Section IV-B), choosing different sample sets corresponds to a base change. Therefore, it does not affect the coefficients of the polynomial.
VI.APPLICATIONS AND NUMERICAL RESULTS
In this section, we discuss a few applications and present numerical results. A thorough discussion of each individual problem can be lengthy and thus beyond the scope of this paper. We shall make a few simplifying assumptions, focus on how to apply the framework of the paper and demonstrate why the generalized GSP framework can be useful. In all the applications, we n take A to be the adjacency matrix and h·, ·i n to be the standard dot product in . G C C 25
A. Asynchronous Sampling
Consider the asynchronous sampling given by the red circles in Fig. 5. In that example, the generalized graph signal f ∈ span(Φ0 ⊗ Ξ0) with |Φ0| = 2 and |Ξ0| = 10. We see that although there are 20 samples in total, it is impossible to recover either f(v, ·) or f(·, x) for each v ∈ V and x ∈ H individually. This makes it impossible to apply the time-vertex GSP framework as it requires uniform sampling. However, our asynchronous sampling results in Section V show that there are enough samples to recover the signal f. To do that, we require the generalized GSP framework introduced in this paper. To illustrate this, we consider signal recovery from samples picked “randomly”. More specif- ically, we choose two images (with the same size) of distinct digits. The graph G is thus the 2D-grid in which each vertex corresponds to a pixel. We take H = L2([−1, 1]) with Chebyshev polynomials of the first kind {Pj}j≥0 as the basis. A signal f ∈ S(G, H) is chosen such that: (a) The graph signals f(·, 1) and f(·, −1) correspond to the two chosen digit images. (b) For each x ∈ [−1, 1], f(·, x) is graph bandlimited to the first k = 300 eigenvalues of the graph Laplacian matrix. Furthermore, for each node v, the continuous signal f(v, ·) is in the span of the first B = 8 Chebyshev polynomials. Essentially, the signal f depicts a smooth change from the first image to the second. Furthermore, we add white Gaussian noise with SNR= 10B to obtain f˜. According to Theorem 1 and Proposition 3, we can expect a recovery by sampling 2k nodes and B/2 random positions on [−1, 1] for each node. We chose the random positions following a normal distribution with mean 0 and variance 0.5. Thus, we expect samples to concentrate near 0 and become sparse towards the end points −1 and 1. Such a random procedure means that with probability one, for each x ∈ [−1, 1], at most one sampled pixel of f˜(·, x) is observed. We divide [−1, 1] into 4 equal sub-intervals of size 0.5 each, and superimpose the sampled signals for each interval. They are shown as the first 4 images of Fig. 6. Though the digits are barely observable from these samples, we indeed see that the two middle images carry more samples.
A basis of the space from which f is chosen is {φi ⊗ Pj}i≤k,0≤j
Fig. 6. The top 4 images are obtained from superimposed signals at the sample nodes and sample positions from [−1, −0.5], [−0.5, 0], [0, 0.5], [0.5, 1] respectively. The bottom two images are the recovered digits: 0 and 6.
for some y = (yi,j)i≤k,0≤j φi(vm)Pj(xn). We recover y by solving the optimization: arg min My − f˜(W ) . 2 y=(yi,j )i≤k,0≤j B. Network Information Propagation and Spectral Analysis In this example, we illustrate the flexibility of generalized GSP over the time-vertex GSP proposed in [8]. The time-vertex GSP framework of [8] is briefly described in Example 2(c), and is equivalent to H = L2(G0), where G0 is a finite path graph, in our generalized GSP framework. This restricts signals at each vertex of G to be a time series over a discrete set of time indices. Furthermore, to use the joint time-vertex Fourier transform (denoted as TV- transform for convenience) in [8], the time index set of every vertex needs to be the same. In this example, we study graph signals generated from information propagation over a network G (cf. [29]–[34]). Various infections spreading models have been considered under the independent cascade framework depending on whether an infected node can recover and become re-infected subsequently. For the Susceptible-Infected (SI) model, any infected node has a positive probability to infect its neighbors, and remains infected indefinitely. On the other hand, in the Susceptible-Infected-Recovered (SIR) model, an infected node has a probability to recover, 27 afterwhich it cannot be infected again. Finally, in the Susceptible-Infected-Recovered-Infected (SIRI) model, any infected node can recover and become re-infected again. The difference in these spreading models results in different dynamical behaviors. For the SI model, all the nodes become infected almost surely; while for the SIR and SIRI models, it can happen that all the nodes are recovered if the recovery rate is high and infection rate is low. (a) Enron email network. (b) Facebook network. Fig. 7. The F-transform (top row) and TV-transform (bottom row) spectrum plots for information cascade on the Enron email and Facebook networks. The horizontal axis shows the graph eigenvalues and the vertical axis shows the Fourier series frequencies. The plots correspond to (from left to right): SI model (λI = 1), SIR model (λI = 1, λR = 1/5), SIRI model (λI = 1, λR = 1/2) and SIR model (λI = 1, λR = 1). A lighter color corresponds to a higher magnitude. Suppose that we use 0 to represent the uninfected status and 1 to represent the infected status 28 at each node. Then the infection status of each node v ∈ V follows a step function f(v, ·), which is L2 if we restrict to a finite observation period. The generalized GSP framework allows us to perform joint and partial F-transforms on the generalized graph signal f directly. On the other hand, to apply the time-vertex framework, one needs to perform uniform discrete sampling for each f(v, ·), which may result loss of information since each f(v, ·) is unbandlimited. Spectral analysis is a convenient mean to summarize signal features, and in the simulation below, we perform and compare spectral analysis of f using both the F-transform and TV-transform. For the simulation setup, we use all three information propagation models (SI, SIR, SIRI) over the Enron email network (500 nodes and average degree 12.6) and a Facebook1 (1034 nodes and average degree 32.4) network. The time-stamp of the occurrence of an infection or recovery event is generated using exponential distributions with means λI or λR, respectively. Let [0,T ] be the time interval during which observations are made. For each node v ∈ V , we obtain a step function f(v, ·) ∈ L2([0,T ]). We compute the joint F-transform and the TV-transform of f. For the F-transform, we first compute the H-transform (partial F-transform in the time direction), by noting that the Fourier transform of the standard rectangular function is the sinc function. For the TV-transform, we divide [0,T ] into uniform time slots and record the status of each node at the beginning of the slot as the graph signal for that time slot. Note that f(v, ·) for each v ∈ V is not a bandlimited signal in the time direction. Therefore, taking discrete samples of f(v, ·) at a finite rate cannot recover the original graph signal. We plot the results in Fig. 7a and Fig. 7b, where the horizontal axis shows the graph eigenvalues and vertical axis shows the time-direction Fourier series frequencies. For a fixed λI , it is more difficult for the infection to spread across the network if λR increases. Therefore, there is a gradual increase in difficulty in the diffusion process for the plots from left to right in Fig. 7a and Fig. 7b. If the diffusion process is fast, the initial spiky signal disappears fast in both the graph and time components. Therefore, we expect to see a smaller high intensity region. This agrees with what we see from Fig. 7a and Fig. 7b: we observe a clear spreading out of the higher energy part of the spectrum in the F-transform as we go from the leftmost to the rightmost plot. This phenomenon is less discernible for the TV-transform. Moreover, for the TV-transform, a common spectral phenomenon is less obvious for different propagation types on the two networks. This is because the the F-transform utilizes the full time series information at 1https://snap.stanford.edu/data/egonets-Facebook.html 29 each graph vertex (i.e., the infection and recovery time stamps) whereas the TV-transform uses only the aggregated information in each discrete time slot. C. Learning a Shift Invariant Filter Consider G with a time series over [0,T ] associated with each vertex. Suppose that the time interval [0,T ] can be divided into q sub-intervals of equal size T0. The graph time series has auto-regressive behavior from one sub-interval to the next via a shift invariant filter F . Therefore, under the generalized GSP framework, we consider H = L2([0,T ]), and the time series for the i-th sub-interval is denoted as f (i). Our goal is to learn the filter F , where f (i+1) = F (f (i)), for 1 ≤ i < q. We assume that the signals {f (i)} are bandlimited, i.e., there are numbers B and k such that each f (i) is bandlimited in the time direction by B, and the graph components are from the span of the first k eigenvectors of the graph adjacency matrix AG. We shall take the shift invariant filter F = AG ⊗P2(L) where P2 is a degree 2 polynomial and L is the translation by T0 operator on the interval [0,T ] (with wrap around). We assume uniform sampling where the samples are chosen from a subset of vertices V 0 according to Proposition 3(b). Let the samples in the i-th 0 (i) time sub-interval [(i − 1)T0, iT0] be denoted as a |V | × B matrix h . We further add noise to ˜(i) ˜(i) each sample to obtain h . Our objective is to learn P2 from {h }. We perform simulations on different graphs: random ER graphs (1000 nodes), the Enron Email 2 graph (500 nodes), and a synthetic company staff network (80 nodes, [35]). Let T0 = 2π and q = 5. For each experiment, we consider both B = 10 and B = 20, and set k = 0.4|V |. We (1) randomly generate coefficients of the polynomial P2 uniformly from [0, 1]. The initial signal f is generated as follows: we first generate real graph signals in the span of the first k eigenvectors of AG with uniformly randomly chosen coefficients in [0, 1]. Then, we use each component of these vectors as Fourier coefficients to generate a generalized signal at the corresponding vertex. We choose a subset of vertices V 0 satisfying Proposition 3(b) and sample B time samples from 0 the vertices in V in each sub-interval [(i − 1)T0, iT0], 1 ≤ i ≤ q. Independent Gaussian noise is then added to each sample h(i) to give a sequence of noisy (matrix) samples {h˜(i) : 1 ≤ i ≤ q}, from which we learn P2. The recovery of F preceeds in two steps (both involves an L2 optimization): 2https://snap.stanford.edu/data/email-Enron.html 30 ·10−2 9 Enron email graph (B=20) Enron email graph (B=10) 8 Staff network (B= 20) 7 Staff network (B=10) ER graph (B=20) 6 ER graph (B=10) 5 4 3 Prediction error 2 1 0 5 8 11 14 17 20 SNR (dB) Fig. 8. Plot of the prediction error against the SNR (in dB). (a) For the t-th sample time in [(i−1)T0, iT0] for 1 ≤ i ≤ q, we recover the entire graph signal at that sample time from the samples of V 0 as follows. Let M be the matrix with k columns corresponding to the sub-collection of eigenvectors of AG generating the signals. Let MV 0 be formed by taking rows corresponding to indices of V 0. The graph signal f (i)(·, t) is recovered by finding: ˜(i) ˜(i) f (·, t) = M · arg min MV 0 x − h (·, t) , x 2 where h(i)(·, t) is the t-th column of h(i). (b) Find P2 by solving (k·kF below is the Frobenius norm): 2 X ˜(i) ˜(i+1) min AG ⊗ P2(L)(f ) − f . P2 F 1≤i≤q The performance is evaluated by computing the prediction error, i.e., ˆ (m) (m+1) (AG ⊗ P2(L))(h ) − h F (m+1) , kh kF ˆ where P2 is the estimated P2. In Fig. 8, we plot the prediction error against the signal to noise ratio (SNR) in dB. We see that our procedure can learn F well, and the performance improves with less noise. 31 D. Adaptive Graph Signals In the third case study, we consider time series of graph signals with evolving graphs. In practice, a graph can evolve over time. Therefore even though a time series of graph signals belong to (the common) H = Cn, it can be inappropriate to disregard the underlying graphs that evolve over time. Application examples include sensor networks deployed in a dynamic environment like on the ocean surface, and social networks that evolve over time due to joining and leaving of users. Consider Example 8, where G = (V,E) is a path graph representing the time line with V = {1, 2, . . . , n} being the time indices, and each Gt for t ∈ V represents a graph with m vertices at time t. With an initial graph G0, we generate a sequence of graphs according to the evolution model proposed in [11]. For each t = 1, . . . , n, let At be the adjacency matrix of Gt, which is assumed to be known. At each time or vertex t, we have a graph signal f(t) ∈ H = Cm, which we assume to be generated from a known base signal g ∈ S(G, H) and a polynomial filter F , i.e., f = F (g) in P S(G, H). The filter F has the form F = 1≤t≤n P1(AG)t ⊗ P2(At), where both P1 and P2 are degree 1 polynomials. In other words, there are parameters a0, a1, b0, b1 such that f(t) = a0 (b1Atg(t) + b0g(t)) + a1 (b1At−1g(t − 1) + b0g(t − 1)) . In our experiments, we observe f˜(t) = f(t) + N(t) for odd time indices, where N(t) is an additive white Gaussian noise. Our objective is to infer F (i.e, the parameters a0, a1, b0, b1) from the noisy samples {f˜(t): t odd} by solving the optimization problem: X 2 min F (g)(t) − f˜(t) . 2 t≥3, odd The performance is evaluated by computing the recovery error of f1 at even time slots: Fˆ(g)(t) − f(t) X 2 , kf(t)k t≥2, even 2 ˆ where F is the estimated F . We perform simulations by choosing the initial graph G0 to be the synthetic company staff network (80 nodes), grid (400 nodes), and the Enron Email graph (500 nodes). We summarize the results in Fig. 9. The optimization is non-convex, and may yield larger error with more noise. However, the procedure can learn F well with less noise. 32 ·10−2 25 Enron email graph Grid 20 Staff network 15 10 Recovery error 5 0 −10 −5 0 5 10 SNR (dB) Fig. 9. Plot of the recovery error against the SNR (in dB). VII.CONCLUSION In this paper, we have introduced the notion of graph signals in a separable Hilbert space, which we called generalized graph signals. We demonstrated how to define F-transform as an analogy to the classical Fourier series and Fourier transform in traditional GSP. This leads to the notion of frequency. We developed theories on filtering and sampling, which find applications in reducing signals in infinite continuous domains to more manageable finite domains. We presented several scenarios in which the generalized GSP framework is more applicable than the traditional or time-vertex GSP frameworks. The generalized GSP framework and its corresponding theory discussed in this paper is not only mathematically elegant but practical. We have only scratched the surface of utilizing this framework in different applications due to space constraint. In future work, it would be of interest to apply our framework to real datasets and to develop adaptations to different scenarios of interest. Another future direction involves developing a statistical signal processing theory on top of the generalized GSP framework. 33 APPENDIX A PROOFOF PROPOSITION 2 (a) We first show that with probability one, AG is injective. We notice that AG is a symmetric matrix with 0’s along the diagonal. Such matrices are parametrized by the upper triangular entries θ = (AG(i, j))1≤i polynomial M is not identically 0. In view of the parametrization, the set A0 of adjacency matrices with zero determinant is a codimensional 1 closed submanifold of the space of all adjacency matrices. Hence, the measure of A0, for any measure absolutely continuous w.r.t. the Lebesgue measure, is zero. This implies that AG ⊗ A is injective with probability one since A is assumed to be injective. We next show that each eigenspace of AG ⊗ A has dimension 1 with probability one. From Assumption 1 and the proposition hypothesis, the set of eigenvalues λ(A) of A is countable and each of its eigenspaces is one dimensional. Let A be the set of injective adjacency matrices AG such that AG ⊗ A has repeated non-zero eigenvalues. It suffices to show the (product) Lebesgue measure, parametrized by the strict upper triangular entries, of A is 0. We may further decompose A = A1 ∪ A2 as follows: (i) AG is in A1 if AG itself has repeated eigenvalues. (ii) There are λ1, λ2 ∈ λ(A), λ1 6= λ2 such that λ1AG and λ2AG has one common eigenvalue. The set of such AG is denoted by A2,λ1,λ2 . The set A2 = ∪λ1,λ2 A2,λ1,λ2 is a countable union. We show that both A1 and A2 have zero Lebesgue measure. In addition to the strict upper triangular entries θ, we introduce one more parameter µ for an eigenvalue of AG. In the case of A1, if ν is a repeated eigenvalue of AG, then the parameters (θ, ν) satisfy two conditions simultaneously: (i) P (θ, ν) = 0, where P = det(νId − AG) is the characteristic polynomial of AG. (ii) P 0(θ, ν) = 0, where P 0 is the partial derivative of P w.r.t. ν. 0 As there are AG such that P and P do not vanish simultaneously, the intersection of the two locus P = 0 and P 0 = 0 defines a codimension ≥ 1 locus in the space of adjacency matrices (having µ adds one dimension and two independent relations remove two degrees of freedom). Therefore, A1 has Lebesuge measure 0. 34 For A2, we work with the countable union [ A2,λ1,λ2 . λ16=λ2∈λ(A) By countable subadditivity of measure, it suffices to show that each A2,λ1,λ2 has measure 0. In this case, we use ν to parametrize the common eigenvalue of λ1AG, λ2AG. The parameters (θ, ν) again satisfy 2 conditions: (i) P (θ, ν/λ1) = 0, where P is the characteristic polynomial. (ii) P (θ, ν/λ2) = 0. We need to show that P (θ, ν/λ1) and P (θ, ν/λ2) are independent (the locus of one is not contained in the locus of the other). As above, we need to find AG such that P (θ, ν/λ1) and P (θ, ν/λ2) do not vanish simultaneously. We shall briefly indicate how such AG is chosen and omit the details. If λ1/λ2 6= −1 and n is even, AG can be chosen with 1 at AG(i, n + 1 − i) and 0 otherwise. If λ1/λ2 6= −1 and n is odd, we let the top-left 3 × 3 block being the adjacency matrix of the size 3 complete graph and let the bottom-right n − 3 × n − 3 block be chosen as the even case above. If λ1/λ2 = −1, we choose three adjacency matrices (randomly choosing edge weights usually suffices) AG1 ,AG2 ,AG3 of size 3 × 3, 4 × 4, 5 × 5 respectively, such that the absolute value of their eigenvalues are all distinct. Now for any n ≥ 3, we can build up an n × n adjacency matrix by combining copies of AG1 ,AG2 and AG3 along the diagonal blocks. By our choice, P (θ, ν/λ1) = 0 and P (θ, ν/λ2) = 0 do not have common solutions. Therefore, by the same reasoning as above, the measure of A2,λ1,λ2 is 0. This proves the claim. (b) By the similar argument as to (a) on the injectivity part (by considering the derivative of the characteristic polynomial at 0), with probability one , the 0-eigenspace V0 of AG is one ∼ dimensional (consisting of vectors with the same entry in every row). Hence V0 ⊗ H = H. The rest is a statement on the restriction of AG ⊗ A to the orthogonal complement of V0; and the proof is similar to (a). APPENDIX B PROOFS OF RESULTS IN SECTION V A. Proof of Theorem 1 S 0 Let W = v∈V {(v, xv,t)}1≤t≤kv . Suppose we form W by randomly choosing kv = kv(W ) points in {v} × Ω, denoted by {(v, yv,t)}1≤t≤kv . Let k be the dimension of V, which is not 35 P 0 more than v kv. We may re-order and re-label W = (wl)1≤l≤k = ((vl, xl))1≤l≤k and W = 0 0 (wl)1≤l≤k = ((vl, yl))1≤l≤k such that wl and wl belong to the same {vl} × Ω. P Suppose (gi)1≤i≤k forms a basis of V, where for each i = 1, . . . , k, gi = j ei,j ⊗ fi,j, n P ei,j ∈ C , and fi,j ∈ H are analytic for all j. Then each f ∈ V takes the form 1≤i≤k aigi. The coefficient system (ai)1≤i≤k is uniquely determined if and only if the k × k square 0 matrix MW 0 = (gi(wl))1≤i,l≤k is invertible, i.e., it has non-zero determinant. For W , let MW = (gi(wl))1≤i,l≤k. 0 0 P Consider wl = (vl, yl) and gi(wl) = j ei,j(vl)fi,j(yl). The factor ei,j(vl) is common for the (i, l)-th entries of both MW and MW 0 . Therefore, the determinant det(MW 0 ) is a polynomial in {fi,j(yl)}, and hence analytic in the variables {yl}. It is known that, by analyticity, the following k holds: the subset Y ⊂ Ω of (yl)1≤l≤k making det(MW 0 ) = 0 either has zero Lebesgue measure k or Y is the entire Ω . However, the existence of {xl}1≤l≤k (coming from {wl}1≤l≤k) shows that the latter case is not possible. As a consequence, Y has zero measure for any measure absolutely continuous w.r.t. the Lebesgue measure. B. Proof of Proposition 3 P Claim (a) follows directly from f = Φ0⊗Ξ0 Ff (φ ⊗ ξ) · φ ⊗ ξ for any f ∈ V. 0 We next show claims (b) and (c). Let V = {v1, . . . , vn}, 1 ≤ t ≤ δ(Φ ), and {1, . . . , n} = I ∪ ... ∪ I Φ0 1 t a union of disjoint subsets such that each Ij consists of linearly independent |I | ≥ |Φ0| j = 1, . . . , t Φ0 column vectors (cf. Definition 5). Therefore, j for all . Since Ij viewed as 0 0 a |Ij| × |Φ | matrix contains at least |Φ | linearly independent rows, we can choose row indices 0 il ∈ Ij, l = 1,..., |Φ |, corresponding to linearly independent rows. Let VIj = {vi1 , . . . , vi|Φ0| } ⊂ 0 V be the corresponding graph vertices. Then, for any x ∈ span(Ξ ), u ∈ VIj and f ∈ V, we have X f(u, x) = φ(u)f(φ, x), φ∈Φ0 0 and {f(φ, x): φ ∈ Φ } is uniquely determined by the values {f(u, x): u ∈ VIj }. Therefore, f(v, x) is uniquely determined for all v ∈ V since f ∈ V = span(Φ0 ⊗ Ξ0). Since |Ξ0| is finite, by a standard induction argument (identical to the proof that a n × k rank k matrix has k independent rows), we can find |Ξ0| points Ω0 ⊂ Ω such that for each v ∈ V and f(v, ·) ∈ span Ξ0, f(v, ·) is uniquely determined by its values at Ω0. For any partition 0 S Ω = 1≤j≤t Ωj into t disjoint subsets, we can construct W as follows: if i ∈ Ij, then W ∩ 36 ({vi} × Ω) = {vi} × Ωj. Then, for each x ∈ Ωj, v ∈ V and f ∈ V, we have shown above that 0 f(v, x) is uniquely determined. As Ω = ∪1≤j≤tΩj, f is uniquely determined. The claims (b) and (c) correspond to t = 1 and t = δ(Φ0) respectively, where in case (c), we 0 choose |Ωj| < |Ξ |/t + 1 for each j ≤ t. APPENDIX C SQUAREINTEGRABLEFUNCTIONSOFTHELINE In this section, we present analogous discussion for the case H = L2(R). In this case, the classical Fourier transform Z ∞ f(v, x) 7→ f\(v, ·)(ω) = f(v, x)e−iωxdx (13) −∞ for f(v, ·) ∈ H can no longer be viewed as a base change as the functions e−iωx are not square integrable (a unified approach requires the duality theory of locally compact abelian groups, which we would like to avoid). However, most parts of the theory can be developed in a parallel way. We will point out the differences in due course. Same as in Assumption 1, Φ is an orthonormal basis consisting of the eigenvectors of the graph shift operator. For f ∈ S(G, H), φ ∈ Φ and ω ∈ R, we may still define the joint F-transform as X Ff (φ, ω) = f\(v, ·)(ω)φ(v). v∈V and the partial F-transforms as Hf (ω)(v) = f\(v, ·)(ω), Gf (φ)(x) = hf(·, x), φiCn . so that F (φ, ω) = hH (ω)(·), φi = G\(φ)(ω). f f Cn f We can similarly define the frequency range of f as {(λφ, ω): Ff (φ, ω) 6= 0, φ ∈ Φ}, where λφ is the eigenvalue of φ. It is bandlimited if its frequency range is bounded. We can then define a band-pass filter by removing frequency components outside a designated frequency range. To define convolution filters, we can make use of the convolution operator on H. Suppose g ∈ S(G, H) is such that g(v) ∈ L2(R)∩L1(R) for each v ∈ V (e.g., if g is compactly supported). For f ∈ S(G, H), we define the convolution g ∗ f ∈ S(G, H) be such that Fg∗f (φ, ω) = Fg(φ, ω)Ff (φ, ω) 37 for φ ∈ Φ, ω ∈ R. A discrete subset W of V ×R is of sample rate r if for each connected interval I ⊂ R of size l, the shifted interval Ix = {x + y : y ∈ I} satisfy |(V × Ix) ∩ W | = rl for all x large enough. Suppose we have a closed subspace V of S(G, H), which is bandlimited. There is a bounded subset K ⊂ C × R such that the frequency range of each f ∈ V is contained in K. We say a countable discrete subset W ⊂ V × R with positive sample rate determines V if each f ∈ V is uniquely determined by its values at W as long as the values at W are square summable. 2 Let l (W ) be the space of square-summable sequences (xw)w∈W indexed by W . We give W the discrete topology and define the (evaluation) map 2 eW : V → l (W ) 2 such that the component of the sequence eW (f) ∈ l (W ) at the index w = (v, x) ∈ W is eW (f)w = f(v, x). Suppose W determines V. Then, eW is injective. For each w ∈ W , let χw be the characteristic sequence indexed by W that has value 1 at w and 0 at w0 ∈ W \{w}. 2 −1 The standard basis of l (W ) is given by (χw)w∈W . Let fw = eW (χw). We say that W has a 0 P rapidly vanishing standard basis if for every pair t, t ∈ R and v ∈ V , the sum w∈W |fw(v, t)− 0 2 fw(v, t )| < ∞. An example of a W with a rapidly vanishing standard basis is W consists of uniform samples at each vertex so that fw, w ∈ W , are uniform translates of the sinc function. We have the following version of the asynchronous sampling theorem, which says by perturbing a finite subset of W that determines V still determines V. Changing finitely many sample points does not change the sample rate. Theorem 2 (Asynchronous sampling for H = L2(R)). Suppose V ⊂ S(G, H) is a closed subspace that is bandlimited and W determines V with a rapidly vanishing standard basis. Let W1 be any finite subset of W . Let W2 be W such that each w ∈ W1 is replaced by r(w) such that both w and r(w) belong to {v} × R for the same graph node v ∈ V , and r(w) 6= r(w0) for 0 any w 6= w . Then, W2 determines V. Proof: By the construction of W2 from W , we have a bijection W → W2, w 7→ r(w) such −1 that w = r(w) for w∈ / W1. Let r(W1) = {r(w): w ∈ W1}. By composing eW2 with eW , we obtain a linear map: −1 2 2 ψ = eW2 ◦ eW : l (W ) → l (W2). 38 To prove the theorem, it suffices to show that ψ is invertible. −1 By assumption, {fw = eW (χw): w ∈ W } is rapidly vanishing. The image ψ(χw) = 0 0 (fw(w ))w ∈W2 is a series that is 0 except at r(W1) ∪ {r(w)}. It can be verified that: 2 X 2 ψ(χw) − χr(w) 2 = |fw(r(x)) − fw(x)| . x∈W1 We then have X 2 X X 2 ψ(χw) − χr(w) 2 = |fw(r(x)) − fw(x)| . w∈W w∈W x∈W1 By our assumption, for each pair x and r(x), there is v ∈ V such that x = (v, t) and r(x) = (v, t0) with t, t0 ∈ R. Therefore, we have X 2 X X 0 2 ψ(χw) − χr(w) 2 = |fw(v, t ) − fw(v, t)| w∈W w∈W (v,t)∈W1 X X 0 2 = |fw(v, t ) − fw(v, t)| . (14) (v,t)∈W1 w∈W Therefore, the right-hand side of (14) is finite as W1 is finite. As {χr(w) : r(w) ∈ W2} forms an 2 orthonormal basis of l (W2), by a result due to Paley and Wiener [19, Chapter 22.5, Theorem 7], 2 we know that {ψ(χw): w ∈ W } forms a basis of l (W2) and the proof is complete. REFERENCES [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013. [2] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, April 2013. [3] ——, “Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, Sept 2014. [4] A. Gadde, A. Anis, and A. Ortega, “Active semi-supervised learning using sampling theory for graph signals,” in Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 492–501. [5] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning Laplacian matrix in smooth graph signal representations,” IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6160–6173, Dec 2016. [6] H. E. Egilmez, E. Pavez, and A. Ortega, “Graph learning from data under Laplacian and structural constraints,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 6, pp. 825–841, Sept 2017. [7] R. Shafipour, S. Segarra, A. G. Marques, and G. Mateos, “Network topology inference from non-stationary graph signals,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, March 2017, pp. 5870–5874. [8] F. Grassi, A. Loukas, N. Perraudin, and B. Ricaud, “A time-vertex signal processing framework: Scalable processing and meaningful representations for time-series on graphs,” IEEE Trans. Signal Process., vol. 66, no. 3, pp. 817–829, Feb 2018. 39 [9] A. Ortega, P. Frossard, J. Kovaceviˇ c,´ J. M. F. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018. [10] B. Girault, A. Ortega, and S. S. Narayanan, “Irregularity-aware graph fourier transforms,” IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5746–5761, Nov 2018. [11] J. Ito and K. Kaneko, “Spontaneous structure formation in a network of chaotic units with variable connection strengths,” Phys. Rev. Letts., vol. 88, no. 2, p. 028701, 2002. [12] A. Barrat, M. BarthÃl’lemy, and A. Vespignani, Dynamical Processes on Complex Networks. Cambridge University Press, 2008. [13] G. Zschaler, “Adaptive-network models of collective dynamics,” Eur. Phys. J. Spec. Top., vol. 211, pp. 1–101, 2012. [14] T. Gross and H. Sayama, Adaptive networks: Theory, Models and Applications, Understanding Complex Systems. Springer, 2009. [15] C. Shannon, “Communication in the presence of noise,” Proc. IRE, vol. 86, pp. 10–21, 1949. [16] F. Sivrikaya and B. Yener, “Time synchronization in sensor networks: a survey,” IEEE Netw., vol. 18, no. 4, pp. 45–50, 2004. [17] E. Olson, “A passive solution to the sensor synchronization problem,” in Proc. IEEE/RSJ Int. Conf. Intel. Robots. Syst., Taipei, 2010, pp. 1059–1064. [18] L. T. Bruscto, T. Heimfarth, and E. P. de Freitas, “Enhancing time synchronization support in wireless sensor networks,” Sensors, vol. 17, p. 2956, 2017. [19] P. Lax, Functional Analysis, 1st ed. Wiley-Interscience, 2002. [20] L. Debnath and P. Mikusinski, Introduction to Hilbert spaces with applications, 3rd ed. Burlington, MA: Elsevier Academic Press, 2005. [21] F. Ji and W. P. Tay. (2018) Generalized graph signal processing. [Online]. Available: goo.gl/hpqFFs [22] N. L. Carothers, A Short Course on Approximation Theory. Bowling Green State University, 2009. [23] T. Hungerford, Algebra (Graduate Texts in Mathematics) (v. 73), 8th ed. Springer, 2002. [24] R. Wang, Introduction to Orthogonal Transforms: With Applications in Data Processing and Analysis. Cambridge University Press, 2012. [25] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Inform. Process. Syst., USA, 2016, pp. 3844–3852. [26] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016. [27] J. B. Conway, A Course in Functional Analysis, 2nd ed. New York, NY: Springer-Verlag, 1990. [28] R. A. Horn and C. R. Johnson, Matrix Analysis, 1st ed. Cambridge, UK: Cambridge University Press, 1990. [29] D. Shah and T. Zaman, “Rumors in a network: Who’s the culprit?” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5163–5181, 2011. [30] W. Luo, W. P. Tay, and M. Leng, “Identifying infection sources and regions in large networks,” IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2850–2865, 2013. [31] W. Luo and W. P. Tay, “Finding an infection source under the SIS model,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Vancouver, Canada, May 2013, pp. 2930–2934. [32] K. Zhu and L. Ying, “Information source detection in the SIR model: A sample-path-based approach,” IEEE Trans. Networking, vol. 24, no. 1, pp. 408–421, Feb 2016. [33] F. Ji, W. P. Tay, and L. R. Varshney, “An algorithmic framework for estimating rumor sources with different start times,” IEEE Trans. Signal Process., vol. 65, no. 10, pp. 2517–2530, 2017. 40 [34] W. Tang, F. Ji, and W. P. Tay, “Estimating infection sources in networks using partial timestamps,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 2, pp. 3035 – 3049, Dec. 2018. [35] Pratibha, J. Wang, S. Aggarwal, F. Ji, and W. P. Tay, “Learning correlation graph and anomalous employee behavior for insider threat detection,” in Proc. Int. Conf. on Information Fusion, Cambridge, UK, Jul. 2018.