
DIAGONALIZABLE SHIFT AND FILTERS FOR DIRECTED GRAPHS BASED ON THE JORDAN-CHEVALLEY DECOMPOSITION

Panagiotis Misiakos∗
Electrical and Computer Engineering, NTU Athens, Greece

Chris Wendler, Markus Püschel
Department of Computer Science, ETH Zürich, Switzerland

ABSTRACT

Graph signal processing on directed graphs poses theoretical challenges, since an eigendecomposition of filters is in general not available. Instead, Fourier analysis requires a Jordan decomposition, and the frequency response is given by the Jordan normal form, whose computation is numerically unstable for large sizes. In this paper, we propose to replace a given adjacency shift $A$ by a diagonalizable shift $A_D$ obtained via the Jordan-Chevalley decomposition. As we show, $A_D$ generates the subalgebra of all diagonalizable filters and is itself a polynomial in $A$ (i.e., a filter). For several synthetic and real-world graphs, we show how $A_D$ adds and removes edges compared to $A$.

Index Terms— graph signal processing, digraphs, Jordan normal form, algebraic signal processing, diagonalizable filters

1. INTRODUCTION

There is a plethora of data that is, or can be viewed as, indexed by the vertices of graphs. Examples include biological networks, social networks, and communication networks such as the internet [1, 2]. To bring signal processing (SP) tools to such graph data, fundamental SP concepts have been generalized to the graph domain [3, 4], including shift, filters, Fourier transform, and frequency response. These generalizations form the foundation of graph signal processing (GSP). There are two basic variants of GSP. The framework in [4] builds on algebraic signal processing (ASP) [5] to derive these concepts from the definition of the shift, given by the adjacency matrix. In contrast, [3] defines the eigenbasis of the graph Laplacian as the graph Fourier basis. In ASP terms, it chooses the Laplacian matrix as the shift operator.

Undirected graphs. Both approaches yield a satisfying GSP framework for undirected graphs. Namely, since the shift operator is symmetric, a unitary Fourier basis exists. As a consequence, the shift and thus all filters (polynomials in the shift) are diagonalizable, have one-dimensional frequency responses, and satisfy Parseval's theorem. Using these GSP tools, many applications for graph signals have been developed, e.g., for compression, sampling, denoising, label propagation, outlier detection, and alias-free filtering [4, 6, 7, 8]. In addition, graph convolutions are the foundation of graph convolutional neural networks, which have been applied to supervised [9] and semisupervised learning tasks [10].

Directed graphs. Unfortunately, for directed graphs (digraphs), the GSP theory does not translate as well into practice. The reason is that the Fourier basis, given by subspaces that are invariant under filtering, is now determined by Jordan subspaces. Likewise, the frequency response is now determined by the Jordan normal form. This results in various challenges for GSP theory and applications, including:
1. Frequency components are no longer one-dimensional.
2. The Fourier basis and transform are not unitary.
3. The computation of the Jordan decomposition is numerically unstable [11, 12].

There have been various attempts to overcome these problems. Reference [13] replaces the Jordan basis with the basis corresponding to the block-diagonal Schur factorization, which factorizes a matrix $A$ into a block-diagonal matrix $T = FAF^{-1}$. Similar to the Jordan basis, this basis decomposes the signal space into filtering-invariant subspaces, but not necessarily into the irreducible ones. Reference [14] introduces a Hermitian Laplacian operator based on a generalization of the Hermitian adjacency matrix [15]. As the name suggests, the Hermitian Laplacian is Hermitian and, by construction, captures the directions of the edges of the underlying graph. The work in [16, 17] defines the directed graph Fourier transform as the orthonormal basis with either minimal directed total variation or maximum spread, respectively. Further, [18] addresses the ambiguity in the choice of Jordan basis vectors and proposes a basis-free computation of spectral components.

Contributions. In this work, we stay within the GSP framework of [4] and use the Jordan-Chevalley decomposition [19, 20] to derive a diagonalizable shift $A_D$ from a given digraph shift (adjacency matrix) $A$. We show that $A_D$ is a polynomial in $A$ (i.e., a valid filter) and that it generates the set of all diagonalizable filters. More precisely, the diagonalizable polynomials in $A$ are exactly the polynomials in $A_D$. We present prototypical experiments with synthetic and real-world graphs. They show that $A_D$ often differs from $A$ by a relatively small number of edges. This suggests that it might be possible to amend a graph given by $A$ to $A_D$ to overcome the problems with Jordan bases.

∗The first author conducted the research as a Summer Research Fellow at ETH Zürich.

Copyright 2020 IEEE. Published in the IEEE 2020 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), scheduled for 4-9 May, 2020, in Barcelona, Spain. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.
Fig. 1: (a) The adjacency matrix $A$ of our example, (b) the associated Jordan normal form $J = FAF^{-1}$ of $A$, and (c) an associated diagonalizable shift $A_D = p(A)$ derived in this paper.

2. SIGNAL PROCESSING ON DIRECTED GRAPHS

We briefly review graph signal processing for directed graphs (digraphs) as introduced in [4]. Let $G$ be a weighted digraph with vertices $V = \{v_1, \dots, v_n\}$, edges $E$, and an adjacency matrix $A \in \mathbb{C}^{n \times n}$ containing the weights of the edges.

Graph signal. A graph signal on $G$ is a signal indexed by its vertices:
$$ s : V \to \mathbb{C};\quad v \mapsto s_v. \tag{1} $$
For mathematical convenience, we fix an ordering of the vertices and write the signal as the column vector $s = (s_{v_1}, \dots, s_{v_n})^T$.

Graph shift. GSP [4] is an instantiation of algebraic signal processing theory [5] for graphs. Hence, convolution, filters, and the Fourier transform are derived from the definition of a graph shift (which we also denote by $A$):
$$ A : \mathbb{C}^n \to \mathbb{C}^n;\quad s \mapsto As. \tag{2} $$
It is worth mentioning that the standard cyclic shift used for finite time series is the graph shift on the directed circle graph.

Filters. The corresponding graph filters are linear, shift-invariant mappings given by polynomials in $A$ of the form
$$ H : \mathbb{C}^n \to \mathbb{C}^n;\quad s \mapsto \sum_{i=0}^{k} h_i A^i s. \tag{3} $$
The matrix associated with $H$ is $\sum_{i=0}^{k} h_i A^i$, which implies shift invariance: $H(As) = A H(s)$. The set of all such filters is closed under polynomial addition and multiplication and thus forms an algebra $\mathcal{A}$. The filter algebra $\mathcal{A}$ is isomorphic to the polynomial algebra $\mathbb{C}[x]/m_A(x)$, where $m_A(x)$ denotes the minimal polynomial of $A$. We write the minimal polynomial of $A$ as $m_A(x) = \prod_{i=1}^{k} (x - \lambda_i)^{d_i}$. Here the $\lambda_i$ denote the distinct eigenvalues of $A$, and the $d_i$ denote the lengths of their longest Jordan chains.

Example 1. The graph with adjacency matrix $A$ in Fig. 1a has the minimal polynomial $x^3(x + \sqrt{2})(x - \sqrt{2})$.

Fourier transform. Let $J = FAF^{-1}$ be the Jordan normal form (JNF) of $A$. Then $F$ is the Fourier transform. It decomposes the signal space $\mathbb{C}^n$ into a direct sum of the smallest subspaces that are invariant under the shift and thus under all filters:
$$ F : \mathbb{C}^n \to \bigoplus_{i=1}^{k} \bigoplus_{j=1}^{g_i} S_{ij};\quad s \mapsto Fs. $$
Each $S_{ij}$ denotes the subspace of $\mathbb{C}^n$ spanned by the Jordan chain corresponding to the $j$-th eigenvector for the $i$-th eigenvalue $\lambda_i$. The geometric multiplicity $g_i$ is the number of such chains, i.e., the dimension of the eigenspace for $\lambda_i$. Invariance means that $Hs \in S_{ij}$ for all $s \in S_{ij}$ and $H \in \mathcal{A}$.

Frequency response. The frequency response of a filter $H = h(A)$ captures its action on the pure frequencies (the columns of $F^{-1}$). Thus, it is given by
$$ FHF^{-1} = h(J). \tag{4} $$

Example 2. The frequency response of the graph shift $A$ in Fig. 1a is given by its JNF in Fig. 1b.

3. DIAGONALIZABLE DIGRAPH FILTERS

In this section we present our main contribution. For a given digraph shift $A$, we constructively derive an associated diagonalizable shift $A_D$. The new shift $A_D$ is a polynomial in $A$ (i.e., a filter) and generates the algebra of all diagonalizable filters. Further, $A_D$ can again be interpreted as a graph. As we see later in the experiments, this graph often differs from $A$ by only a small number of edges.

3.1. Diagonalizable Digraph Shift

We use the Jordan-Chevalley decomposition of algebras [19, 20], imported to the GSP setting, i.e., to algebras of the form $\mathbb{C}[x]/m_A(x)$ generated by a matrix $A$.

Theorem 1. (Jordan-Chevalley Decomposition) A matrix $A \in \mathbb{C}^{n \times n}$ can be uniquely decomposed into the sum of two matrices $A = A_D + A_N$, with $A_D, A_N \in \mathbb{C}^{n \times n}$. These matrices satisfy the following properties:
1. $A_D$ is diagonalizable,
2. $A_N$ is nilpotent (i.e., a suitable power is 0),
3. $A_D$ and $A_N$ commute, i.e., $A_D A_N = A_N A_D$,
4. $A_D$ and $A_N$ are polynomials in $A$, i.e., $A_D = p(A)$ and $A_N = A - p(A)$.
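For small matrices with exact (e.g., integer) entries, Theorem 1 can be checked directly, following the construction used in its proof: keep only the diagonal of the JNF and transform back. The sketch below makes several assumptions. It uses SymPy's `Matrix.jordan_form`, which returns $(P, J)$ with $A = PJP^{-1}$, so $P$ plays the role of $F^{-1}$. The helper name `jordan_chevalley` is hypothetical. The 4 × 4 matrix is our own illustrative digraph (a path that runs into a 2-cycle), not a graph from the paper. Exact arithmetic sidesteps the numerical instability of the JNF, but it does not scale to large graphs.

```python
import sympy as sp


def jordan_chevalley(A: sp.Matrix):
    """Split A = A_D + A_N by keeping only the diagonal of the Jordan normal form.

    SymPy's jordan_form returns (P, J) with A = P * J * P**-1, so P plays the
    role of F^{-1} in the notation J = F A F^{-1} used in the text.
    """
    P, J = A.jordan_form()
    # J_D: diagonal part of J (the eigenvalues); J - J_D is strictly upper triangular.
    J_D = sp.diag(*[J[i, i] for i in range(J.rows)])
    A_D = P * J_D * P.inv()
    A_N = A - A_D
    return A_D, A_N


# Illustrative digraph (not from the paper): nonzero entries (0,1), (1,2),
# (2,3), (3,2), i.e., a path that runs into a 2-cycle.
A = sp.Matrix([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
A_D, A_N = jordan_chevalley(A)

assert not A.is_diagonalizable()       # eigenvalue 0 has a Jordan block of size 2
assert A_D.is_diagonalizable()         # property 1
assert A_N**2 == sp.zeros(4, 4)        # property 2: A_N is nilpotent
assert A_D * A_N == A_N * A_D          # property 3: A_D and A_N commute
assert A_D == A**3                     # property 4: here A_D = p(A) with p(x) = x^3
print(A_D)
```

For this matrix, $m_A(x) = x^2(x - 1)(x + 1)$. The polynomial $p(x) = x^3$ in the last check satisfies $p(0) = p'(0) = 0$, $p(1) = 1$ and $p(-1) = -1$.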

To prove Theorem 1, we need the following lemma about the frequency response on a single Jordan subspace $S_{ij}$. The lemma was already used implicitly in [4, App. B & C].

Lemma 1. (Polynomial of Jordan block) Let $J_d(\lambda)$ be a Jordan block of size $d$ for eigenvalue $\lambda$. The polynomial $h \in \mathbb{C}[x]/m_A(x)$ evaluated at $J_d(\lambda)$ takes the form
$$ h(J_d(\lambda)) = \begin{pmatrix} h(\lambda) & \frac{h^{(1)}(\lambda)}{1!} & \cdots & \frac{h^{(d-1)}(\lambda)}{(d-1)!} \\ 0 & h(\lambda) & \cdots & \frac{h^{(d-2)}(\lambda)}{(d-2)!} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h(\lambda) \end{pmatrix}. $$

Proof. The result is obtained by considering monomials $h(x) = x^i$, $0 \le i < \deg(m_A)$, and adding the results.

Proof of Theorem 1. The existence of the desired decomposition $A = A_D + A_N$ follows from $FAF^{-1} = J = J_D + J_N$. Here $J_D$ contains all diagonal elements of $J$, and $J_N$ contains the off-diagonal elements. Setting $A_D = F^{-1} J_D F$ and $A_N = F^{-1} J_N F$, $A_D$ is diagonalizable. Further, $A_N$ is nilpotent, since $J_N$ is strictly upper triangular. Since $p(A) = F^{-1} p(J) F$, we can apply Lemma 1 to each Jordan block. This characterizes the polynomial $p \in \mathbb{C}[x]/m_A(x)$ with $p(A) = A_D$ as the unique solution to the Hermite interpolation problem [21, p. 120]
$$ p(\lambda_i) = \lambda_i,\quad p^{(1)}(\lambda_i) = 0,\ \dots,\ p^{(d_i-1)}(\lambda_i) = 0, \tag{5} $$
for $i \in \{1, \dots, k\}$. Here $\lambda_i$ denotes the $i$-th eigenvalue of $A$, and $d_i$ the size of its largest Jordan block. Thus, $A_D = p(A)$, $A_N = A - p(A)$, and $A_D A_N = A_N A_D$.

Note that computing the diagonalizable shift $A_D = p(A)$ via Hermite interpolation only requires the minimal polynomial $m_A$ of $A$.

Example 3. For our running example, $A$ in Fig. 1a, $p(x) = \frac{1}{2}x^3$, and $A_D$ is given in Fig. 1c. The graphs associated with $A$ and $A_D$ are shown in Fig. 2; the additional edges of $A_D$ are shown in red.

Fig. 2: (a) The graph with adjacency matrix $A$ in Fig. 1a, (b) the graph corresponding to the diagonalizable shift $A_D = A - A_N$. Edges with weights $\neq 1$ are labelled; new edges are red.

The characteristic polynomial $\chi_A(x) = \det(A - xI)$ of $A$ may be easier to compute than the minimal polynomial (e.g., in Matlab). Thus, we provide an alternative construction of $A_D$, which we used in our experiments.

Lemma 2. Let $p \in \mathbb{C}[x]/m_A(x)$ be the solution of (5). Let $\tilde p \in \mathbb{C}[x]/\chi_A(x)$ be the solution of the Hermite interpolation problem
$$ \tilde p(\lambda_i) = \lambda_i,\quad \tilde p^{(1)}(\lambda_i) = 0,\ \dots,\ \tilde p^{(a_i-1)}(\lambda_i) = 0, \tag{6} $$
for $i \in \{1, \dots, k\}$, where $a_i$ is the multiplicity of $\lambda_i$ in $\chi_A$. Then $\tilde p(x) \equiv p(x) \bmod m_A(x)$.

Proof. Consider the Taylor expansion of $\tilde p$ at $\lambda_i$ modulo $(x - \lambda_i)^{d_i}$. Since $d_i \le a_i$, we can apply (6) and (5) to obtain
$$ \tilde p(x) \equiv \tilde p(\lambda_i) + \sum_{l=1}^{d_i-1} \frac{\tilde p^{(l)}(\lambda_i)}{l!}(x-\lambda_i)^l \equiv p(\lambda_i) + \sum_{l=1}^{d_i-1} \frac{p^{(l)}(\lambda_i)}{l!}(x-\lambda_i)^l \equiv p(x) \pmod{(x-\lambda_i)^{d_i}}. \tag{7} $$
Repeating this argument for all $i \in \{1, \dots, k\}$ and applying the Chinese remainder theorem yields the result.

An alternative algorithm for computing the Jordan-Chevalley decomposition is proposed in [22].

Example 4. Solving (6) for our running example (Fig. 1) yields $\tilde p(x) = \frac{1}{4}x^5 \equiv \frac{1}{2}x^3 = p(x) \bmod x^3(x^2 - 2)$. Thus, $\tilde p(A) = p(A)$.

3.2. Properties of the Diagonalizable Shift

We summarize important properties of $A_D$. In particular, we show that $A_D$ generates all diagonalizable filters.

Theorem 2. (Properties of $A_D$) Let $A_D$ be the diagonalizable shift associated with $A$, given by Theorem 1. Let $\mathcal{D} \cong \mathbb{C}[x]/m_{A_D}(x)$ denote the polynomial algebra of filters generated by $A_D$, and let $FAF^{-1} = J$ be the JNF of $A$. Then the following statements about $A_D$ hold:
P1. $FA_DF^{-1}$ is the diagonal part $J_D$ of $J$,
P2. $m_{A_D}(x) = (x - \lambda_1) \cdots (x - \lambda_k)$, and
P3. $\mathcal{D} = \{H \in \mathcal{A} : FHF^{-1} \text{ is diagonal}\}$.

Proof. P1 holds by construction (see Theorem 1). P2 follows from $m_{A_D}(A_D) = F^{-1} m_{A_D}(J_D) F$ combined with P1.

It remains to prove P3. Obviously, each filter in $\mathcal{D}$ has a diagonal frequency response. Thus, we only need to show that each polynomial $h \in \mathbb{C}[x]/m_A(x)$ with diagonal frequency response $h(J)$ is an element of $\mathbb{C}[x]/m_{A_D}(x) \cong \mathcal{D}$. The mapping from a filter to its frequency response, $h \mapsto h(J)$, is an isomorphism. Therefore, it suffices to show that there exists a polynomial $r \in \mathbb{C}[x]/m_{A_D}(x)$ with $r(\lambda_i) = h(\lambda_i)$ for $i \in \{1, \dots, k\}$. Such an $r$ satisfies $r(A_D) = F^{-1} r(J_D) F = F^{-1} h(J) F = h(A)$, since $h(J)$ is diagonal with entries $h(\lambda_i)$. We have $\deg(r) \le k - 1$; thus, $r$ is the unique Lagrange interpolant for the $k$ constraints [21, p. 119].

In particular, P3 means the following: for an element $s \in S_{ij}$ of a Jordan subspace and a filter $H = h(A_D) \in \mathcal{D}$, we have $Hs = h(\lambda_i)s$. In [8], such filters are referred to as alias-free.

4. EXPERIMENTS

We compute $A_D$ for several synthetic and real-world graphs and compare it to $A$. The basic question is how $A_D$, again interpreted as a graph, differs from the original graph given by $A$.
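The experiments rely on the characteristic-polynomial construction of Lemma 2. Below is a minimal SymPy sketch of it, under the following assumptions of ours:
- The helper names `hermite_shift_poly` and `poly_at_matrix` are hypothetical.
- The Hermite problem (5)/(6) is solved as a dense linear system for the coefficients, which is only reasonable for small degrees.
- For the running example, the sketch assumes $\chi_A(x) = x^5(x^2 - 2)$. This is consistent with $\tilde p(x) = x^5/4$ in Example 4, but the matrix of Fig. 1a itself is not reproduced here.

Note that SymPy's `charpoly` computes $\det(xI - A)$, which has the same roots and multiplicities as $\det(A - xI)$.

```python
import sympy as sp

x = sp.symbols("x")


def hermite_shift_poly(multiplicities):
    """Solve the Hermite problem (5) or (6) for a dict {eigenvalue: multiplicity}.

    Returns the unique polynomial p of degree < sum(multiplicities) with
    p(lam) = lam and p^(l)(lam) = 0 for 1 <= l < multiplicity.
    """
    n = sum(multiplicities.values())
    coeffs = sp.symbols(f"c0:{n}")
    p = sum(c * x**j for j, c in enumerate(coeffs))
    equations = []
    for lam, mult in multiplicities.items():
        equations.append(sp.Eq(p.subs(x, lam), lam))
        for l in range(1, mult):
            equations.append(sp.Eq(sp.diff(p, x, l).subs(x, lam), 0))
    # Dense linear solve for the coefficients; fine for small n only.
    solution = sp.solve(equations, list(coeffs), dict=True)[0]
    return sp.expand(p.subs(solution))


def poly_at_matrix(p, A):
    """Evaluate the polynomial p(x) at the square matrix A (Horner's scheme)."""
    result = sp.zeros(A.rows, A.cols)
    for c in sp.Poly(p, x).all_coeffs():
        result = result * A + c * sp.eye(A.rows)
    return result


# Examples 3 and 4 from the root multiplicities alone (chi_A is an assumption).
r2 = sp.sqrt(2)
p = hermite_shift_poly({0: 3, r2: 1, -r2: 1})        # m_A = x^3 (x^2 - 2)
p_tilde = hermite_shift_poly({0: 5, r2: 1, -r2: 1})  # chi_A = x^5 (x^2 - 2)
print(p, p_tilde, sp.rem(p_tilde, x**3 * (x**2 - 2), x))

# Lemma 2 in practice: multiplicities a_i read off the characteristic polynomial
# of an illustrative 4 x 4 matrix (ours, not from the paper).
A = sp.Matrix([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
a = sp.roots(A.charpoly(x))           # eigenvalue -> algebraic multiplicity
A_D = poly_at_matrix(hermite_shift_poly(a), A)
# Theorem 2, P2: m_{A_D}(x) = x (x - 1)(x + 1) annihilates A_D.
assert poly_at_matrix(x * (x - 1) * (x + 1), A_D) == sp.zeros(4, 4)
```

The results match the running example:
- With the multiplicities of $m_A$, the solver returns $p(x) = x^3/2$ (Example 3).
- With those of $\chi_A$, it returns $\tilde p(x) = x^5/4$.
- The remainder computed by `sp.rem` is again $x^3/2$, i.e., $\tilde p \equiv p \bmod m_A$ as stated in Lemma 2.

For the 4 × 4 matrix, $\chi_A = m_A = x^2(x^2 - 1)$, and the construction yields $A_D = A^3$.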

Fig. 3: Panels: (a) Erdös-Renyi, (b) Pagerank, (c) Barabasi-Albert, (d) Stanford's web, (e) Wiki-Vote, (f) Arxiv HEP-PH. The first row contains a selection of adjacency matrices, i.e., graph shifts $A$. The second row shows the associated diagonalizable shifts $A_D$, again interpreted as graphs. Entries in $A_D$ (i.e., edges) that are not present in $A$ are shown in red. Weights are not shown. (a–c) are synthetic graphs; (d–f) are subgraphs of real-world graphs.
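Fig. 3 marks the entries of $A_D$ that are not present in $A$. The NumPy sketch below shows one way to count such added and removed edges. The helper name `edge_changes` and the tolerance `tol=1e-9` are our own assumptions. A tolerance is needed whenever $A_D$ comes from floating-point computation, and the right value depends on the scale of the weights. As in Fig. 3, weights are ignored and only the sparsity patterns are compared. The test matrix is an illustrative 4 × 4 digraph (not from the paper), for which solving (5) gives $p(x) = x^3$.

```python
import numpy as np


def edge_changes(A, A_D, tol=1e-9):
    """Compare the supports (entries with magnitude > tol) of A and A_D.

    Returns (added, removed): index pairs (i, j) that are nonzero only in A_D,
    and index pairs that are nonzero only in A. Weights are ignored.
    """
    support_A = np.abs(A) > tol
    support_D = np.abs(A_D) > tol
    added = [(int(i), int(j)) for i, j in np.argwhere(support_D & ~support_A)]
    removed = [(int(i), int(j)) for i, j in np.argwhere(support_A & ~support_D)]
    return added, removed


# Illustrative 4 x 4 digraph (not from the paper); for it, (5) gives p(x) = x^3.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_D = np.linalg.matrix_power(A, 3)
added, removed = edge_changes(A, A_D)
print("added:", added, "removed:", removed)   # added: [(0, 3)] removed: [(0, 1)]
```

Here $A_D$ moves one edge: the entry $(0, 1)$ disappears and $(0, 3)$ appears, while the 2-cycle between vertices 2 and 3 is unchanged.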

Fig. 3 shows this comparison for various graphs. In each case, entries (i.e., edges) in $A_D$ not present in $A$ are marked red. Weights in $A_D$ are not shown.

Synthetic graphs. We consider the following graph models:

Erdös-Renyi: In Erdös-Renyi random graphs [23], each edge is sampled independently with equal probability. For $|V| = 60$ and an edge creation probability of 0.07, about half of the randomly generated graphs are not diagonalizable. Fig. 3a shows one example where $A$ has one Jordan block of size 3, four of size 2, and the rest of size 1.

Pagerank: Pagerank graphs [24] were used by Google to obtain rankings of websites. Each website is a vertex. There is a weighted directed edge between two vertices if users transition from the start site to the target site with nonzero probability. Fig. 3b shows one example with nine Jordan blocks of size 2 and the rest of size 1.

Barabasi-Albert: The Barabasi-Albert model [25] generates scale-free graphs that mimic social networks. It does so by successively growing a graph, where new nodes are more likely to connect to existing nodes of high degree. For $|V| = 60$ and typical parameters [25], e.g., $m_0 = 5, m = 5$ or $m_0 = 10, m = 6$, these graphs are almost never diagonalizable. Fig. 3c shows one example with one Jordan block of size 9, one of size 4, two of size 3, two of size 2, and the rest of size 1.

Real-world graphs. We consider the subgraphs corresponding to the first 60 vertices of graphs from the SNAP dataset collection [1].

Wikipedia adminship: In the Wikipedia adminship graph, users are represented as vertices. Users can vote for other users to become admin; these votes are modeled as directed edges. The subgraph in Fig. 3e has one Jordan block of size 4, three of size 2, and the rest of size 1.

Stanford web: In the Stanford web graph, vertices represent pages from stanford.edu, and directed edges represent hyperlinks between them. The subgraph in Fig. 3d has one Jordan block of size 3, four of size 2, and the rest of size 1.

Arxiv citations: We consider the Arxiv-HEP-PH graph, whose vertices are high energy physics phenomenology papers. Directed edges correspond to citations. The subgraph in Fig. 3f has the following Jordan blocks: two of size 10, one of size 6, two of size 4, one of size 3, four of size 2, and the rest of size 1.

Summary. We observe that $A_D$, if again interpreted as a graph, modifies a number of edges in $A$. This number depends on the number and size of the nontrivial Jordan blocks of $A$. It is often relatively small (except for Fig. 3c). This observation is intuitive, since $A_D = A$ if all Jordan blocks are of size 1. Moreover, the larger the blocks, the larger the impact of the nilpotent part $A_N$.

5. CONCLUSION

The basic question underlying our work is how to obtain an operational GSP framework for digraphs when the adjacency matrix $A$ is not diagonalizable. Our solution uses the Jordan-Chevalley decomposition to compute a diagonalizable $A_D$ associated with $A$. Since $A_D$ is a polynomial in $A$, we stay within the framework of [4]. Further, our experiments suggest that $A_D$, if again interpreted as a graph, is sometimes even similar to $A$, i.e., relatively few edges get modified. The idea now is to replace GSP with $A$ by GSP with $A_D$. Several challenges remain before this approach can be shown to be viable:
- more exhaustive testing on graphs,
- scaling the computation of $A_D$ to large graphs in a numerically stable way, and
- comparing existing GSP applications when run with $A_D$ as the shift instead of $A$.

6. REFERENCES

[1] J. Leskovec and A. Krevl, "SNAP datasets: Stanford large network dataset collection," 2014.
[2] J. Kunegis, "Konect: the Koblenz network collection," in Proc. International Conference on World Wide Web (WWW), 2013, pp. 1343–1350.
[3] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[4] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
[5] M. Püschel and J. M. F. Moura, "Algebraic signal processing theory: Foundation and 1-D time," IEEE Trans. Signal Processing, vol. 56, no. 8, pp. 3572–3585, 2008.
[6] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015.
[7] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Trans. Signal Processing, vol. 62, no. 12, pp. 3042–3054, 2014.
[8] O. Teke and P. P. Vaidyanathan, "Extending classical multirate signal processing theory to graphs - part II: M-channel filter banks," IEEE Trans. Signal Processing, vol. 65, no. 2, pp. 423–437, 2016.
[9] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
[10] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. International Conference on Learning Representations (ICLR), 2017.
[11] T. Beelen and P. Van Dooren, "Computational aspects of the Jordan canonical form," Reliable Numerical Computation, 1990.
[12] Z.-N. Zhang and J.-N. Zhang, "On the computation of Jordan canonical form," International Journal of Pure and Applied Mathematics, vol. 78, no. 2, pp. 155–160, 2012.
[13] B. Girault, Signal processing on graphs - contributions to an emerging field, Ph.D. thesis, 2015.
[14] S. Furutani, T. Shibahara, M. Akiyama, K. Hato, and M. Aida, "Graph signal processing for directed graphs based on the Hermitian Laplacian," in Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), 2019.
[15] J. Liu and X. Li, "Hermitian-adjacency matrices and Hermitian energies of mixed graphs," Linear Algebra and its Applications, vol. 466, pp. 182–207, 2015.
[16] S. Sardellitti, S. Barbarossa, and P. Di Lorenzo, "On the graph Fourier transform for directed graphs," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 796–811, 2017.
[17] R. Shafipour, A. Khodabakhsh, G. Mateos, and E. Nikolova, "A directed graph Fourier transform with spread frequency components," IEEE Trans. Signal Processing, vol. 67, no. 4, pp. 946–960, 2018.
[18] J. A. Deri and J. M. F. Moura, "Spectral projector-based graph Fourier transforms," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 785–795, 2017.
[19] C. Chevalley, "Théorie des groupes de Lie, Tome II: groupes algébriques," vol. 1303, 1951.
[20] L. Cagliero and F. Szechtman, "Jordan-Chevalley decomposition in finite dimensional Lie algebras," Proc. American Mathematical Society, vol. 139, no. 11, pp. 3909–3913, 2011.
[21] P. A. Fuhrmann, A Polynomial Approach to Linear Algebra, Springer Science & Business Media, 2012.
[22] D. Couty, J. Esterle, and R. Zarouf, "Décomposition effective de Jordan-Chevalley et ses retombées en enseignement," arXiv preprint arXiv:1103.5020, 2011.
[23] P. Erdős and A. Rényi, "On random graphs I," Publicationes Mathematicae (Debrecen), vol. 6, pp. 290–297, 1959.
[24] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Tech. Rep., Stanford InfoLab, 1999.
[25] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, 1999.