
Introducing Hypergraph Signal Processing: Theoretical Foundation and Practical Applications

Songyang Zhang, Zhi Ding, Fellow, IEEE, and Shuguang Cui, Fellow, IEEE

Abstract—Signal processing over graphs has recently attracted significant attention for dealing with structured data. Normal graphs, however, only model pairwise relationships between nodes and are not effective in representing and capturing some high-order relationships of data samples, which are common in many applications, such as the Internet of Things (IoT). In this article, we propose a new framework of hypergraph signal processing (HGSP) based on the tensor representation to generalize the traditional graph signal processing (GSP) to tackle high-order interactions. We introduce the core concepts of HGSP and define the hypergraph Fourier space. We then study the spectrum properties of the hypergraph Fourier transform (HGFT) and explain its connection to mainstream digital signal processing. We derive the novel hypergraph sampling theory and present the fundamentals of hypergraph filter design based on the tensor framework. We present HGSP-based methods for several signal processing and data analysis applications. Our experimental results demonstrate significant performance improvement using our HGSP framework over some traditional signal processing solutions.

Index Terms—Data analysis, hypergraph, signal processing, tensor.

I. INTRODUCTION

GRAPH-theoretic tools have recently found broad applications in data science owing to their power to model complex relationships in large structured data sets [1]. Big data, such as those representing social interactions, Internet of Things (IoT) intelligence, biological connections, mobility, and traffic patterns, often exhibit complex structures that are challenging to many traditional tools [2]. Thankfully, graphs provide good models for many such data sets as well as the underlying complex relationships. A data set with N data points can be modeled as a graph of N vertices, whose internal relationships can be captured by edges. For example, subscribing users in a communication or social network can be modeled as nodes, while the physical interconnections or social relationships among users are represented as edges [3].

Taking advantage of graph models in characterizing complex data structures, graph signal processing (GSP) has emerged as an exciting and promising new tool for processing large data sets with complex structures. A typical application of GSP is in image processing, where image pixels are modeled as graph signals embedded in nodes while pairwise similarities between pixels are captured by edges [6]. By modeling images using graphs, tasks such as image segmentation can take advantage of GSP filters. Another example of GSP applications is in processing data from sensor networks [5]. Based on graph models built directly over the network structures, a graph Fourier space could be defined according to the eigenspace of a representing graph matrix, such as the Laplacian or adjacency matrix, to facilitate data processing operations such as denoising [7], filter banks [8], and compression [9].

Despite many demonstrated successes, the GSP defined over normal graphs also exhibits certain limitations. First, normal graphs cannot capture high-dimensional interactions describing multilateral relationships among multiple nodes, which are critical for many practical applications. Since each edge in a normal graph only models the pairwise interaction between two nodes, the traditional GSP can only deal with the pairwise relationships defined by such edges. In reality, however, complex relationships may exist among a cluster of nodes, for which the use of pairwise links between every two nodes cannot capture their multilateral interactions [16]. In biology, for example, a trait may be attributed to multiple interactive genes [17], as shown in Fig. 1(a), such that a quadrilateral interaction is more informative and powerful here. Another example is the social network with online social communities called folksonomies, where trilateral interactions occur among users, resources, and annotations [15], [40]. Second, a normal graph can only capture a typical single-tier relationship with its matrix representation. In complex systems and data sets, however, each node may have several traits such that there exist multiple tiers of interactions between two nodes. In a cyber-physical system, for example, each node usually contains two components, i.e., the physical component and the cyber component, for which there exist two tiers of connections between a pair of nodes. Generally, such multitier relationships can be modeled as multilayer networks, where each layer represents one tier of interactions [11]. However, normal graphs cannot model the interlayer interactions simply, and the corresponding matrix representations are unable to distinguish different tiers of relationships efficiently, since they describe the entries of all layers equivalently [10], [14].

Fig. 1. Example of multilateral relationships. (a) Example of a trait and genes: the trait t1 is triggered by three genes v1, v2, v3, and the genes may also influence each other. (b) Example of pairwise relationships: arrow links represent the influences from genes, whereas the potential interactions among genes cannot be captured. (c) Example of multilateral relationships: the solid circular line represents a quadrilateral relationship among four nodes, while the purple dashed lines represent the potential internode interactions in this quadrilateral relationship.

Thus, the traditional GSP based on matrix analysis has so far been unable to efficiently handle such complex relationships. Clearly, there is a need for a more general graph model and GSP concept to remedy the aforementioned shortcomings faced by the traditional GSP.

To find a more general model for complex data structures, we venture into the area of high-dimensional graphs known as hypergraphs. Hypergraph theory is increasingly playing an important role in data analysis, especially for analyzing high-dimensional data structures and interactions [18]. A hypergraph consists of nodes and hyperedges connecting more than two nodes [19]. As an example, Fig. 2(a) shows a hypergraph with three hyperedges and seven nodes, whereas Fig. 2(b) provides a corresponding data set modeled by this hypergraph. Indeed, a normal graph is a special case of a hypergraph, where each hyperedge degrades to a simple edge that involves exactly two nodes.

Fig. 2. Hypergraphs and applications. (a) Example of hypergraphs: the hyperedges are the overlaps covering nodes with different colors. (b) Game data set modeled by hypergraphs: each node is a specific game and each hyperedge is a set of games.

Hypergraphs have found successes in generalizing normal graphs in many applications, such as clustering [39], classification [22], and prediction [23]. Moreover, a hypergraph is an alternative representation for a multilayer network and is useful when dealing with multitier relationships [12], [13]. Thus, a hypergraph is a natural extension of a normal graph in modeling signals with high-order interactions. Presently, however, the literature provides little coverage on hypergraph signal processing (HGSP). The only known work [4] proposed an HGSP framework based on a special type of hypergraph called complexes. In [4], hypergraph signals are associated with each hyperedge, but the framework is limited to cell complexes, which cannot suitably model many real-world data sets and applications. Another shortcoming of the framework in [4] is the lack of detailed analysis and application examples to demonstrate its practicability. In addition, the attempt in [4] to extend some key concepts from the traditional GSP simply fails due to the difference in the basic setups between graph signals and hypergraph signals. In this article, we seek to establish a more general and practical HGSP framework, capable of handling arbitrary hypergraphs and naturally extending the traditional GSP concepts to handle high-dimensional interactions. We will also provide real application examples to validate the effectiveness of the proposed framework.

Compared with the traditional GSP, a generalized HGSP faces several technical challenges. The first problem lies in the mathematical representation of hypergraphs. Developing an algebraic representation of a hypergraph is the foundation of HGSP. Currently, there are two major approaches: matrix-based [32] and tensor-based [20]. The matrix-based method makes it hard to implement hypergraph signal shifting, while the tensor-based method is harder to understand conceptually. Another challenge is in defining signal shifting over a hyperedge. Signal shifting is easily defined as propagation along the link direction of a simple edge connecting two nodes in a regular graph. However, each hyperedge in a hypergraph involves more than two nodes, so modeling signal interactions over a hyperedge requires careful consideration. Other challenges include the definition and interpretation of hypergraph frequency.

To address the aforementioned challenges and generalize the traditional GSP into a more general hypergraph tool that captures high-dimensional interactions, we propose a novel tensor-based HGSP framework in this article. The main contributions of this article can be summarized as follows. Representing hypergraphs as tensors, we define a specific form of hypergraph signals and hypergraph signal shifting. We then provide an alternative definition of the hypergraph Fourier space based on the orthogonal CANDECOMP/PARAFAC (CP) tensor decomposition, together with the corresponding hypergraph Fourier transform (HGFT). To better interpret the hypergraph Fourier space, we analyze the resulting hypergraph frequency properties, including the concepts of frequency and bandlimited signals. Analogous to the traditional sampling theory, we derive the conditions and properties for perfect signal recovery from samples in HGSP. We also provide the theoretical foundation for HGSP filter designs. Beyond these, we provide several application examples of the proposed HGSP framework.

1) We introduce a signal compression method based on the new sampling theory to show the effectiveness of HGSP in describing structured signals.

2) We apply HGSP in spectral clustering to show how the HGSP spectrum space acts as a suitable spectrum for hypergraphs.
3) We introduce an HGSP method for binary classification problems to demonstrate the practical application of HGSP in data analysis.
4) We introduce a filtering approach for the denoising problem to further showcase the power of HGSP.
5) Finally, we suggest several potential application areas for HGSP, including IoT, social networks, and natural language processing.

We compare the performance of HGSP-based methods with traditional GSP-based methods and learning algorithms in all the above applications. All these features make HGSP an essential tool for future IoT applications.

We organize the rest of this article as follows. Section II first summarizes the preliminaries of the traditional GSP, tensors, and hypergraphs. In Section III, we then introduce the core definitions of HGSP, including the hypergraph signal, the signal shifting, and the hypergraph Fourier space, followed by the frequency interpretation and a description of existing works in Section IV. We present some useful HGSP-based results, such as the sampling theory and filter design, in Section V. With the proposed HGSP framework, we provide several potential applications of HGSP and demonstrate its effectiveness in Section VI, before presenting the final conclusions in Section VII.

II. PRELIMINARIES

A. Overview of Graph Signal Processing

GSP is a recent tool used to analyze signals according to graph models. Here, we briefly review the key relevant concepts of the traditional GSP [1], [2].

A data set with N data points can be modeled as a normal graph G(V, E) consisting of a set of N nodes V = {v_1, ..., v_N} and a set of edges E. Each node of the graph G is a data point, whereas the edges describe the pairwise interactions between nodes. A graph signal represents the data associated with a node. For a graph with N nodes, there are N graph signals, which are defined as a signal vector s = [s_1 s_2 ··· s_N]^T ∈ R^N.

Usually, such a graph can be described either by an adjacency matrix A_M ∈ R^{N×N}, where each entry indicates a pairwise link (or an edge), or by a Laplacian matrix L_M = D_M − A_M, where D_M ∈ R^{N×N} is the diagonal matrix of degrees. Both the Laplacian matrix and the adjacency matrix can fully represent the graph structure. For convenience, we use a general matrix F_M ∈ R^{N×N} to represent either of them. Note that since the adjacency matrix is eligible for both directed and undirected graphs, it is more common in the GSP literature. Thus, the generalized GSP is based on the adjacency matrix [2], and the representing matrix refers to the adjacency matrix in this article unless specified otherwise.

With the graph representation F_M and the signal vector s, the graph shifting is defined as

$$\mathbf{s}' = \mathbf{F}_M \mathbf{s}. \tag{1}$$

Here, the matrix F_M could be interpreted as a graph filter whose functionality is to shift the signals along the link directions. Taking the cyclic graph shown in Fig. 3 as an example, its adjacency matrix is the shifting matrix

$$\mathbf{F}_M = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{2}$$

Typically, the shifted signal over the cyclic graph is calculated as s' = F_M s = [s_N s_1 ··· s_{N−1}]^T, which shifts the signal at each node to its next node.

Fig. 3. Signal shifting over a cyclic graph.

The graph spectrum space, also called the graph Fourier space, is defined based on the eigenspace of F_M. Assume that the eigen-decomposition of F_M is

$$\mathbf{F}_M = \mathbf{V}_M^{-1}\boldsymbol{\Lambda}\mathbf{V}_M. \tag{3}$$

The frequency components are defined by the eigenvectors of F_M, and the frequencies are defined with respect to the eigenvalues. The corresponding graph Fourier transform (GFT) is defined as

$$\hat{\mathbf{s}} = \mathbf{V}_M \mathbf{s}. \tag{4}$$

With the definition of the graph Fourier space, traditional signal processing and learning tasks, such as denoising [33] and classification [73], could be solved within the GSP framework. More details on specific topics of GSP, such as frequency analysis, filter design, and spectrum representation, have been discussed in [5], [52], and [84].
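To make the GSP operations above concrete, the following minimal Python sketch builds the cyclic shift matrix of (2) and computes the graph shift (1) and the GFT (4) numerically. The graph size N = 6 and the test signal are arbitrary illustrative choices; for the cyclic graph, the GFT coincides with a DFT up to the ordering and scaling of the eigenvectors.

```python
import numpy as np

# Cyclic graph of Fig. 3: shift matrix (2), graph shift (1), and GFT (4).
N = 6
F_M = np.roll(np.eye(N), 1, axis=0)            # F_M moves the signal at node i to node i+1

s = np.arange(1.0, N + 1.0)                    # test signal [1, ..., 6]
s_shift = F_M @ s                              # (1): [s_N, s_1, ..., s_{N-1}]
print(s_shift)                                 # [6. 1. 2. 3. 4. 5.]

lam, Vinv = np.linalg.eig(F_M)                 # F_M = Vinv diag(lam) Vinv^{-1}
V = np.linalg.inv(Vinv)                        # so F_M = V^{-1} diag(lam) V as in (3)
s_hat = V @ s                                  # (4): GFT of s

# shifting multiplies each spectral coefficient by its eigenvalue
assert np.allclose(V @ s_shift, lam * s_hat)
```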
B. Introduction of Hypergraph

We begin with the definition of a hypergraph and its possible representations.

Definition 1 (Hypergraph): A general hypergraph H is a pair H = (V, E), where V = {v_1, ..., v_N} is a set of elements called vertices and E = {e_1, ..., e_K} is a set of nonempty multielement subsets of V called hyperedges. Let M = max{|e_i| : e_i ∈ E} be the maximum cardinality of the hyperedges of H, denoted by m.c.e(H) for short.

In a general hypergraph H, different hyperedges may contain different numbers of nodes. The m.c.e(H) denotes the number of vertices in the largest hyperedge. An example of a hypergraph with seven nodes, three hyperedges, and m.c.e = 3 is shown in Fig. 4.

From the definition, we see that a normal graph is a special case of a hypergraph with M = 2. The hypergraph is a natural extension of the normal graph to represent high-dimensional interactions. To represent a hypergraph mathematically, there are two major methods, based on the matrix and the tensor, respectively. In the matrix-based method, a hypergraph

is represented by a matrix G ∈ R^{N×E}, where E equals the number of hyperedges. The rows of the matrix represent the nodes, and the columns represent the hyperedges [19]. Thus, each element in the matrix indicates whether the corresponding node is involved in the particular hyperedge. Although such a matrix-based representation is simple in form, it is hard to define and implement signal processing directly as in GSP by using the matrix G. Unlike the matrix-based method, the tensor has better flexibility in describing the structures of high-dimensional graphs [42]. More specifically, a tensor can be viewed as an extension of a matrix into high-dimensional domains. The adjacency tensor, which indicates whether nodes are connected, is a natural hypergraph counterpart to the adjacency matrix in normal graph theory [51]. Thus, we prefer to represent hypergraphs using tensors. In Section III-A, we will provide more details on how to represent hypergraphs and signals in tensor forms.

Fig. 4. Hypergraph H with seven nodes, three hyperedges, and m.c.e(H) = 3, where V = {v_1, ..., v_7} and E = {e_1, e_2, e_3}. The three hyperedges are e_1 = {v_1, v_4, v_6}, e_2 = {v_2, v_3}, and e_3 = {v_5, v_6, v_7}.

C. Tensor Basics

Before we introduce our tensor-based HGSP framework, let us introduce some tensor basics to be used later. Tensors can effectively represent high-dimensional graphs [14]. Generally speaking, tensors can be interpreted as multidimensional arrays. The order of a tensor is the number of indices needed to label a component of that array [24]. For example, a third-order tensor has three indices. In fact, scalars, vectors, and matrices are all special cases of tensors: a scalar is a zeroth-order tensor; a vector is a first-order tensor; a matrix is a second-order tensor; and an M-dimensional array is an Mth-order tensor [10]. Generalizing a 2-D matrix, we represent the entry at position (i_1, i_2, ..., i_M) of an Mth-order tensor T ∈ R^{I_1×I_2×···×I_M} by t_{i_1 i_2 ··· i_M} in the rest of this article.

Below are some useful definitions and operations of tensors related to the proposed HGSP framework.

1) Symmetric and Diagonal Tensors:

1) A tensor is super-symmetric if its entries are invariant under any permutation of their indices [44]. For example, a third-order T ∈ R^{I×I×I} is super-symmetric if its entries satisfy

$$t_{ijk} = t_{ikj} = t_{jik} = t_{jki} = t_{kij} = t_{kji}, \quad i, j, k = 1, \ldots, I. \tag{5}$$

Analyses of super-symmetric tensors, which are shown to be bijectively related to homogeneous polynomials, can be found in [45] and [46].

2) A tensor T ∈ R^{I_1×I_2×···×I_N} is super-diagonal if its entries t_{i_1 i_2 ··· i_N} ≠ 0 only if i_1 = i_2 = ··· = i_N. For example, a third-order T ∈ R^{I×I×I} is super-diagonal if its entries t_{iii} ≠ 0 for i = 1, 2, ..., I, while all other entries are zero.

2) Tensor Operations: Tensor analysis is developed based on tensor operations. Some tensor operations are commonly used in our HGSP framework [48]-[50]; a short numerical sketch of these operations follows the list below.

1) The tensor outer product between a Pth-order tensor U ∈ R^{I_1×I_2×···×I_P} with entries u_{i_1···i_P} and a Qth-order tensor V ∈ R^{J_1×J_2×···×J_Q} with entries v_{j_1···j_Q} is denoted by W = U ∘ V. The result W ∈ R^{I_1×···×I_P×J_1×···×J_Q} is a (P+Q)th-order tensor, whose entries are calculated by

$$w_{i_1 \cdots i_P j_1 \cdots j_Q} = u_{i_1 \cdots i_P} \cdot v_{j_1 \cdots j_Q}. \tag{6}$$

The major use of the tensor outer product is to construct a higher order tensor from several lower order tensors. For example, the tensor outer product between the vectors a ∈ R^M and b ∈ R^N is denoted by

$$\mathbf{T} = \mathbf{a} \circ \mathbf{b} \tag{7}$$

where the result T is a matrix in R^{M×N} with entries t_{ij} = a_i · b_j for i = 1, 2, ..., M and j = 1, 2, ..., N. Now, we introduce one more vector c ∈ R^Q, where

$$\mathbf{S} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c} = \mathbf{T} \circ \mathbf{c}. \tag{8}$$

Here, the result S is a third-order tensor with entries s_{ijk} = a_i · b_j · c_k = t_{ij} · c_k for i = 1, 2, ..., M, j = 1, 2, ..., N, and k = 1, 2, ..., Q.

2) The n-mode product between a tensor U ∈ R^{I_1×I_2×···×I_P} and a matrix V ∈ R^{J×I_n} is denoted by W = U ×_n V ∈ R^{I_1×···×I_{n−1}×J×I_{n+1}×···×I_P}. Each element in W is defined as

$$w_{i_1 i_2 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_P} = \sum_{i_n=1}^{I_n} u_{i_1 \cdots i_P}\, v_{j i_n} \tag{9}$$

where the main function is to adjust the dimension of a specific order. For example, in (9), the dimension of the nth order of U is changed from I_n to J.

3) The Kronecker product of matrices U ∈ R^{I×J} and V ∈ R^{P×Q} is defined as

$$\mathbf{U} \otimes \mathbf{V} = \begin{bmatrix} u_{11}\mathbf{V} & u_{12}\mathbf{V} & \cdots & u_{1J}\mathbf{V} \\ u_{21}\mathbf{V} & u_{22}\mathbf{V} & \cdots & u_{2J}\mathbf{V} \\ \vdots & \vdots & \ddots & \vdots \\ u_{I1}\mathbf{V} & u_{I2}\mathbf{V} & \cdots & u_{IJ}\mathbf{V} \end{bmatrix} \tag{10}$$

which generates an IP × JQ matrix.

4) The Khatri-Rao product between U ∈ R^{I×K} and V ∈ R^{J×K} is defined as

$$\mathbf{U} \odot \mathbf{V} = [\mathbf{u}_1 \otimes \mathbf{v}_1 \;\; \mathbf{u}_2 \otimes \mathbf{v}_2 \;\; \cdots \;\; \mathbf{u}_K \otimes \mathbf{v}_K]. \tag{11}$$

5) The Hadamard product between U ∈ R^{P×Q} and V ∈ R^{P×Q} is defined as

$$\mathbf{U} * \mathbf{V} = \begin{bmatrix} u_{11}v_{11} & u_{12}v_{12} & \cdots & u_{1Q}v_{1Q} \\ u_{21}v_{21} & u_{22}v_{22} & \cdots & u_{2Q}v_{2Q} \\ \vdots & \vdots & \ddots & \vdots \\ u_{P1}v_{P1} & u_{P2}v_{P2} & \cdots & u_{PQ}v_{PQ} \end{bmatrix}. \tag{12}$$

3) Tensor Decomposition: Similar to the eigen-decomposition for a matrix, tensor decomposition analyzes tensors via factorization. The CP decomposition is a widely used method, which factorizes a tensor into a sum of component rank-one tensors [24], [47]. For example, a third-order tensor T ∈ R^{I×J×K} is decomposed into

$$\mathbf{T} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \tag{13}$$

where a_r ∈ R^I, b_r ∈ R^J, c_r ∈ R^K, and R is a positive integer known as the rank, which is the smallest number of rank-one tensors in the decomposition. The process of the CP decomposition of a third-order tensor is illustrated in Fig. 5.

Fig. 5. CP decomposition of a third-order tensor.

There are several extensions and alternatives of the CP decomposition. For example, the orthogonal-CP decomposition [26] decomposes the tensor using an orthogonal basis. An Mth-order N-dimension tensor T ∈ R^{N×N×···×N} (M times) can be decomposed by the orthogonal-CP decomposition as

$$\mathbf{T} = \sum_{r=1}^{R} \lambda_r \cdot \mathbf{a}_r^{(1)} \circ \cdots \circ \mathbf{a}_r^{(M)} \tag{14}$$

where λ_r ≥ 0 and the orthogonal basis is a_r^{(i)} ∈ R^N for 1 ≤ i ≤ M. More specifically, the orthogonal-CP decomposition has a form similar to the eigen-decomposition when M = 2 and T is super-symmetric.

4) Tensor Spectrum: The eigenvalues and spectral spaces of tensors are significant topics in tensor algebra. Research on the tensor spectrum has achieved great progress in recent years, and it would take a large volume to cover all the properties of the tensor spectrum. Here, we just list some helpful and relevant literature. In particular, Lim [86] developed the theories of eigenvalues, eigenvectors, singular values, and singular vectors for tensors based on a constrained variational approach, such as the Rayleigh quotient. Qi [25], [85] presented a more complete discussion of tensor eigenvalues by defining two forms of tensor eigenvalues, i.e., the E-eigenvalue and the H-eigenvalue. Chang et al. [44] further extended the work of [25] and [85]. Other works, including [87] and [88], further developed the theory of the tensor spectrum.
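To connect (14) with the spectrum definitions used later, here is a small Python check, under the assumption of an orthonormal basis and a super-symmetric target: a tensor assembled in orthogonal-CP form returns each (λ_r, f_r) pair by contraction. This builds a tensor rather than computing a decomposition, since general CP algorithms are outside the scope of this sketch.

```python
import numpy as np

# Build a super-symmetric tensor in the orthogonal-CP form (14)/(24) and
# read each (lambda_r, f_r) back by contraction (this constructs, it does
# not compute, a decomposition).
N, M = 4, 3
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))    # orthonormal columns f_r
lam = np.array([3.0, 2.0, 1.0, 0.5])

T = np.einsum('r,ir,jr,kr->ijk', lam, Q, Q, Q)      # sum_r lam_r f_r∘f_r∘f_r

for r in range(N):
    fr = Q[:, r]
    out = np.einsum('ijk,j,k->i', T, fr, fr)        # contract in M-1 = 2 modes
    assert np.allclose(out, lam[r] * fr)            # recovers lam_r f_r
```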
III. DEFINITIONS FOR HYPERGRAPH SIGNAL PROCESSING

In this section, we introduce the core definitions used in our HGSP framework.

A. Algebraic Representation of Hypergraphs

The traditional GSP mainly relies on the representing matrix of a graph. Thus, an effective algebraic representation is also helpful in developing a novel HGSP framework. As we mentioned in Section II-C, the tensor is an intuitive representation for high-dimensional graphs. In this section, we introduce the algebraic representation of hypergraphs based on tensors.

Similar to an adjacency matrix, whose 2-D entries indicate whether and how two nodes are pairwise connected by a simple edge, we adopt an adjacency tensor, whose entries indicate whether and how the corresponding subsets of M nodes are connected by hyperedges, to describe hypergraphs [20].

Definition 2 (Adjacency Tensor): A hypergraph H = (V, E) with N nodes and m.c.e(H) = M can be represented by an Mth-order N-dimension adjacency tensor A ∈ R^{N×N×···×N} (M times) defined as

$$\mathbf{A} = \big(a_{i_1 i_2 \cdots i_M}\big), \quad 1 \le i_1, i_2, \ldots, i_M \le N. \tag{15}$$

Suppose that e_l = {v_{l_1}, v_{l_2}, ..., v_{l_c}} ∈ E is a hyperedge in H with the number of vertices c ≤ M. Then, e_l is represented by all the elements a_{p_1···p_M} in A, where a subset of c indices from {p_1, p_2, ..., p_M} are exactly the same as {l_1, l_2, ..., l_c} and the other M−c indices are picked from {l_1, l_2, ..., l_c} randomly. More specifically, these elements a_{p_1···p_M} describing e_l are calculated as

$$a_{p_1 \cdots p_M} = c \left( \sum_{\substack{k_1, k_2, \ldots, k_c \ge 1 \\ \sum_{i=1}^{c} k_i = M}} \frac{M!}{k_1! k_2! \cdots k_c!} \right)^{-1}. \tag{16}$$

Meanwhile, the entries that do not correspond to any hyperedge e ∈ E are zeros.

Note that (16) enumerates all the possible combinations of c positive integers {k_1, ..., k_c} whose summation satisfies Σ_{i=1}^{c} k_i = M. Obviously, when the hypergraph degrades to a normal graph with c = M = 2, the weights of the edges are calculated as one, i.e., a_{ij} = a_{ji} = 1 for an edge e = (i, j) ∈ E. Then, the adjacency tensor is the same as the adjacency matrix.

To understand the physical meaning of the adjacency tensor and its weights, we start with the M-uniform hypergraph with N nodes, where each hyperedge has exactly M nodes [53]. Since each hyperedge has an equal number of nodes, all hyperedges follow a consistent form to describe an M-lateral relationship with m.c.e = M. Obviously, such M-lateral relationships can be represented by an Mth-order tensor A, where the entry a_{i_1 i_2 ··· i_M} indicates whether the nodes v_{i_1}, v_{i_2}, ..., v_{i_M} are in the same hyperedge, i.e., whether a hyperedge e = {v_{i_1}, v_{i_2}, ..., v_{i_M}} exists. If the weight is nonzero, the hyperedge exists; otherwise, the hyperedge does not exist. Taking the 3-uniform hypergraph in Fig. 6(a) as an example, the hyperedge e_1 is characterized by a_{146} = a_{164} = a_{461} = a_{416} = a_{614} = a_{641} ≠ 0, the hyperedge e_2 is characterized by a_{237} = a_{327} = a_{732} = a_{723} = a_{273} = a_{372} ≠ 0, and e_3 is represented by a_{567} = a_{576} = a_{657} = a_{675} = a_{756} = a_{765} ≠ 0. All other entries in A are zero. Note that all the hyperedges in an M-uniform hypergraph have the same weight. Different hyperedges are distinguished by the indices of the entries. More specifically, similar to how a_{ij} in the adjacency matrix implies the connection direction from node v_j to node v_i in GSP, an entry a_{i_1 i_2 ··· i_M} characterizes one direction of the hyperedge e = {v_{i_1}, v_{i_2}, ..., v_{i_M}}, with node v_{i_M} as the source and node v_{i_1} as the destination.

Fig. 6. Examples of hypergraphs. (a) 3-uniform hypergraph H with seven nodes, three hyperedges, and m.c.e(H) = 3, where V = {v_1, ..., v_7} and E = {e_1, e_2, e_3}. The three hyperedges are e_1 = {v_1, v_4, v_6}, e_2 = {v_2, v_3, v_7}, and e_3 = {v_5, v_6, v_7}. (b) General hypergraph H with seven nodes, three hyperedges, and m.c.e(H) = 3, where V = {v_1, ..., v_7} and E = {e_1, e_2, e_3}. The three hyperedges are e_1 = {v_1, v_4, v_6}, e_2 = {v_2, v_3}, and e_3 = {v_5, v_6, v_7}.

However, for a general hypergraph, different hyperedges may contain different numbers of nodes. For example, in the hypergraph of Fig. 6(b), the hyperedge e_2 only contains two nodes. How to represent hyperedges with fewer nodes than m.c.e = M may become an issue. To represent such a hyperedge e_l = {v_{l_1}, v_{l_2}, ..., v_{l_c}} ∈ E with the number of vertices c < M in an Mth-order tensor, we can use the entries a_{i_1 i_2 ··· i_M}, where a subset of c indices are the same as {l_1, ..., l_c} (possibly in a different order) and the other M−c indices are picked from {l_1, ..., l_c} randomly. This process can be interpreted as generalizing the hyperedge with c nodes to a hyperedge with M nodes by duplicating M−c nodes from the set {v_{l_1}, ..., v_{l_c}} randomly with possible repetitions. For example, the hyperedge e_2 = {v_2, v_3} in Fig. 6(b) can be represented by the entries a_{233} = a_{323} = a_{332} = a_{322} = a_{223} = a_{232} in the third-order tensor A, which could be interpreted as generalizing the original hyperedge with c = 2 nodes to hyperedges with M = 3 nodes, as in Fig. 7. We use (16) as a generalization coefficient of each hyperedge with respect to permutations and combinations [20]. More specifically, for the adjacency tensor of the hypergraph in Fig. 6(b), the entries are calculated as a_{146} = a_{164} = a_{461} = a_{416} = a_{614} = a_{641} = a_{567} = a_{576} = a_{657} = a_{675} = a_{756} = a_{765} = 1/2 and a_{233} = a_{323} = a_{332} = a_{322} = a_{223} = a_{232} = 1/3, where the remaining entries are set to zero. Note that the weight is smaller if the original hyperedge has fewer nodes in Fig. 6(b).

Fig. 7. Interpretation of generalizing e_2 to hyperedges with M = 3.

More generally, based on the definition of the adjacency tensor and (16), we can easily obtain the following property regarding the hyperedge weights.

Property 1: Given two hyperedges e_i = {v_1, ..., v_I} and e_j = {v_1, ..., v_J}, the edge weight w(e_i) of e_i is different from the edge weight w(e_j) of e_j in the adjacency tensor A, i.e., w(e_i) ≠ w(e_j), if I ≠ J. Moreover, w(e_i) = w(e_j) iff I = J.

This property can help identify the length of each hyperedge based on the weights in the adjacency tensor. Moreover, the edge weights of two hyperedges with the same number of nodes are the same. Different hyperedges with the same number of nodes are distinguished by the indices of their entries in the adjacency tensor. The degree d(v_i) of a vertex v_i ∈ V is the number of hyperedges containing v_i, i.e.,

$$d(v_i) = \sum_{j_1, j_2, \ldots, j_{M-1}=1}^{N} a_{i j_1 j_2 \cdots j_{M-1}}. \tag{17}$$

Then, the Laplacian tensor of the hypergraph H is defined as follows [20].

Definition 3 (Laplacian Tensor): Given a hypergraph H = (V, E) with N nodes and m.c.e(H) = M, the Laplacian tensor is defined as

$$\mathbf{L} = \mathbf{D} - \mathbf{A} \in \mathbb{R}^{N \times N \times \cdots \times N} \tag{18}$$

which is an Mth-order N-dimension tensor. Here, D = (d_{i_1 i_2 ··· i_M}) is also an Mth-order N-dimension super-diagonal tensor with nonzero elements d_{ii···i} = d(v_i).

We see that both the adjacency and Laplacian tensors of a hypergraph H are super-symmetric. Moreover, when m.c.e(H) = 2, they have similar forms to the adjacency and Laplacian matrices of undirected graphs, respectively. Similar to GSP, we use an Mth-order N-dimension tensor F as a general representation of a given hypergraph H for convenience. As the adjacency tensor is more general, the representing tensor F refers to the adjacency tensor in this article unless specified otherwise.
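The following Python sketch assembles the adjacency tensor of the Fig. 6(b) hypergraph from Definition 2 and (16), then checks the weights 1/2 and 1/3 quoted above and computes the degrees (17) and the Laplacian (18). The enumeration strategy is one straightforward reading of the definition, not the authors' reference implementation.

```python
import itertools
import math
import numpy as np

# Hypergraph of Fig. 6(b): N = 7, M = m.c.e. = 3 (0-based vertex indices).
N, M = 7, 3
hyperedges = [{0, 3, 5}, {1, 2}, {4, 5, 6}]   # e1, e2, e3

def edge_weight(c, M):
    # Denominator of (16): number of length-M index sequences covering all
    # c nodes, counted through compositions k_1 + ... + k_c = M, k_i >= 1.
    den = sum(math.factorial(M) // math.prod(math.factorial(k) for k in ks)
              for ks in itertools.product(range(1, M + 1), repeat=c)
              if sum(ks) == M)
    return c / den

A = np.zeros((N,) * M)
for e in hyperedges:
    w = edge_weight(len(e), M)                 # 1/2 if |e| = 3, 1/3 if |e| = 2
    for idx in itertools.product(sorted(e), repeat=M):
        if set(idx) == e:                      # the sequence must use every node of e
            A[idx] = w

assert np.isclose(A[0, 3, 5], 0.5) and np.isclose(A[1, 2, 2], 1.0 / 3)

degree = A.sum(axis=tuple(range(1, M)))        # (17): v6 lies in e1 and e3
assert np.allclose(degree, [1, 1, 1, 1, 1, 2, 1])

D = np.zeros_like(A)
for i in range(N):
    D[(i,) * M] = degree[i]
L = D - A                                      # (18): Laplacian tensor
```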
B. Hypergraph Signal and Signal Shifting

Based on the tensor representation of hypergraphs, we now provide definitions for the hypergraph signal. In the traditional GSP, each signal element is related to one node in the graph. Thus, the graph signal in GSP is defined as an N-length vector if there are N nodes in the graph. Recall that the representing matrix of a normal graph can be treated as a graph filter, for which the basic form of the filtered signal is defined in (1).

Fig. 8. Signals in a polynomial filter.

Thus, we could extend the definitions of the graph signal and signal shifting from the traditional GSP to HGSP based on the tensor-based filter implementation.

In HGSP, we also relate each signal element to one node in the hypergraph. Naturally, we can define the original signal as an N-length vector if there are N nodes. Similarly, as in GSP, we define the hypergraph shifting based on the representing tensor F. However, since the tensor F is of Mth order, we need an (M−1)th-order signal tensor to work with the hypergraph filter F, such that the filtered signal is also an N-length vector like the original signal. For example, for the two-step polynomial filter shown in Fig. 8, the signals s, s', s'' should all be of the same dimension and order. For the input and output signals in an HGSP system to have a consistent form, we define an alternative form of the hypergraph signal as below.

Definition 4 (Hypergraph Signal): For a hypergraph H with N nodes and m.c.e(H) = M, an alternative form of the hypergraph signal is an (M−1)th-order N-dimension tensor s^{[M−1]} obtained from the (M−1)-times outer product of the original signal s = [s_1 s_2 ··· s_N]^T, i.e.,

$$\mathbf{s}^{[M-1]} = \underbrace{\mathbf{s} \circ \cdots \circ \mathbf{s}}_{M-1 \text{ times}} \tag{19}$$

where each entry at position (i_1, i_2, ..., i_{M−1}) equals the product s_{i_1} s_{i_2} ··· s_{i_{M−1}}.

Note that the above hypergraph signal comes from the original signal. They are different forms of the same signal, which reflect the signal properties in different dimensions. For example, a second-order hypergraph signal highlights the properties of the 2-D signal components s_i s_j, while the original signal directly emphasizes the one-dimensional properties. We will discuss the relationship between the hypergraph signal and the original signal in detail in Section III-D.

With the definition of hypergraph signals, let us define the original domain of signals for convenience before we step into signal shifting. Similar to how signals lie in the time domain for DSP, we have the following definition of the hypergraph vertex domain.

Definition 5 (Hypergraph Vertex Domain): A signal lies in the hypergraph vertex domain if it resides on the structure of a hypergraph in the HGSP framework.

The hypergraph vertex domain is the counterpart of the time domain in HGSP. Signals are analyzed based on the structure among the vertices of a hypergraph.

Next, we discuss how signals shift on a given hypergraph. Recall that in GSP, signal shifting is defined by the product of the representing matrix F_M ∈ R^{N×N} and the signal vector s ∈ R^N, i.e., s' = F_M s. Similarly, we define the hypergraph signal shifting based on the representing tensor F and the hypergraph signal s^{[M−1]}.

Definition 6 (Hypergraph Shifting): The basic shifting filter of hypergraph signals is defined as the direct contraction between the representing tensor F and the hypergraph signal s^{[M−1]}, i.e.,

$$\mathbf{s}_{(1)} = \mathbf{F}\mathbf{s}^{[M-1]} \tag{20}$$

where each element of the filter output is given by

$$s_{(1)i} = \sum_{j_1, \ldots, j_{M-1}=1}^{N} f_{i j_1 \cdots j_{M-1}}\, s_{j_1} s_{j_2} \cdots s_{j_{M-1}}. \tag{21}$$

Since the hypergraph signal contracts with the representing tensor in M−1 orders, the one-time filtered signal s_{(1)} is an N-length vector, which has the same dimension as the original signal. Thus, the block diagram of a hypergraph filter with F can be shown as in Fig. 9.

Fig. 9. Diagram of hypergraph shifting.
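As a quick numerical illustration of Definitions 4 and 6, the Python sketch below forms the hypergraph signal of (19) for M = 3 and contracts it with a representing tensor as in (20) and (21). The random tensor is an assumed stand-in for any third-order adjacency tensor, such as the one constructed earlier.

```python
import numpy as np

# Hypergraph signal and one-step shift for M = 3 (a sketch; F is a random
# third-order tensor standing in for an adjacency tensor).
N, M = 7, 3
rng = np.random.default_rng(1)
F = rng.random((N, N, N))

s = rng.random(N)                               # original signal
s_tensor = np.einsum('j,k->jk', s, s)           # (19): s^[M-1] = s ∘ s
s_shift = np.einsum('ijk,jk->i', F, s_tensor)   # (20)-(21): s_(1) = F s^[M-1]

# element-wise check of (21) at node i = 0
manual = sum(F[0, j, k] * s[j] * s[k] for j in range(N) for k in range(N))
assert np.isclose(s_shift[0], manual)
```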
Let us now consider the functionality of the hypergraph filter, as well as the physical insight of hypergraph shifting. In GSP, the functionality of the filter F_M is simply to shift the signals along the link directions. However, the interactions inside a hyperedge are more complex, as each hyperedge involves more than two nodes. In (21), we see that the filtered signal at v_i equals the summation of the shifted signal components in all the hyperedges containing node v_i, where f_{i j_1 ··· j_{M−1}} is the weight of each involved hyperedge and s_{j_1}, ..., s_{j_{M−1}} are the signals in the generalized hyperedges excluding s_i. Clearly, the hypergraph shifting multiplies the signals in the same hyperedge of node v_i together before delivering the shift to node v_i. Taking the hypergraph in Fig. 6(a) as an example, node v_7 is included in two hyperedges, e_2 = {v_2, v_3, v_7} and e_3 = {v_5, v_6, v_7}. According to (21), the shifted signal at node v_7 is calculated as

$$s'_7 = f_{732}\, s_2 s_3 + f_{723}\, s_2 s_3 + f_{756}\, s_5 s_6 + f_{765}\, s_5 s_6 \tag{22}$$

where f_{732} = f_{723} is the weight of the hyperedge e_2 and f_{756} = f_{765} is the weight of the hyperedge e_3 in the adjacency tensor F.

As the entry a_{ji} in the adjacency matrix of a normal graph indicates the link direction from node v_i to node v_j, the entry f_{i_1···i_M} in the adjacency tensor similarly indicates an order of the nodes in a hyperedge as {v_{i_M}, v_{i_{M−1}}, ..., v_{i_1}}, where v_{i_1} is the destination and v_{i_M} is the source. Thus, the shifting by (22) could be interpreted as shown in Fig. 10(a).

Since there are two possible directions from nodes {v_2, v_3} to node v_7 in e_2, there are two components shifted to v_7, i.e., the first two terms in (22). Similarly, there are also two components shifted by the hyperedge e_3, i.e., the last two terms in (22). To illustrate the hypergraph shifting more explicitly, Fig. 10(b) shows a diagram of signal shifting to a certain node in an M-way hyperedge. From Fig. 10(b), we see that the graph shifting in GSP is a special case of the hypergraph shifting, where M = 2. Moreover, there are K = (M−1)! possible directions for the shifting to one specific node in an M-way hyperedge.

Fig. 10. Diagram of signal shifting. (a) Example of signal shifting to node v_7. Different colors of arrows show the different directions of shifting; "×" refers to multiplication and "+" refers to summation. (b) Diagram of signal shifting to node v_1 in an M-way hyperedge. Different colors refer to different shifting directions.

C. Hypergraph Spectrum Space

We now provide the definitions of the hypergraph Fourier space, i.e., the hypergraph spectrum space. In GSP, the graph Fourier space is defined as the eigenspace of the representing matrix [5]. Similarly, we define the Fourier space of HGSP based on the representing tensor F of a hypergraph, which characterizes the hypergraph structure and signal shifting. For an Mth-order N-dimension tensor F, we can apply the orthogonal-CP decomposition [26] to write

$$\mathbf{F} = \sum_{r=1}^{R} \lambda_r \cdot \mathbf{f}_r^{(1)} \circ \cdots \circ \mathbf{f}_r^{(M)} \tag{23}$$

with basis f_r^{(i)} ∈ R^N for 1 ≤ i ≤ M and λ_r ≥ 0. Since F is super-symmetric [25], i.e., f_r^{(1)} = f_r^{(2)} = ··· = f_r^{(M)} = f_r, we have

$$\mathbf{F} = \sum_{r=1}^{R} \lambda_r \cdot \underbrace{\mathbf{f}_r \circ \cdots \circ \mathbf{f}_r}_{M \text{ times}}. \tag{24}$$

Generally, we have the rank R ≤ N in a hypergraph. We will discuss how to construct the remaining f_i, R < i ≤ N, for the case of R < N later in Section III-F. Now, by plugging (24) into (20), the hypergraph shifting can be written with the N basis vectors f_i as

$$\mathbf{s}_{(1)} = \mathbf{F}\mathbf{s}^{[M-1]} \tag{25a}$$
$$= \left( \sum_{r=1}^{N} \lambda_r \cdot \underbrace{\mathbf{f}_r \circ \cdots \circ \mathbf{f}_r}_{M \text{ times}} \right) \left( \underbrace{\mathbf{s} \circ \cdots \circ \mathbf{s}}_{M-1 \text{ times}} \right) \tag{25b}$$
$$= \sum_{r=1}^{N} \lambda_r \mathbf{f}_r \underbrace{\langle \mathbf{f}_r, \mathbf{s}\rangle \cdots \langle \mathbf{f}_r, \mathbf{s}\rangle}_{M-1 \text{ times}} \tag{25c}$$
$$= \begin{bmatrix} \mathbf{f}_1 & \cdots & \mathbf{f}_N \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_N \end{bmatrix} \begin{bmatrix} \big(\mathbf{f}_1^T\mathbf{s}\big)^{M-1} \\ \vdots \\ \big(\mathbf{f}_N^T\mathbf{s}\big)^{M-1} \end{bmatrix} \tag{25d}$$

where ⟨f_r, s⟩ = f_r^T s is the inner product between f_r and s, and (·)^{M−1} is the (M−1)th power.

From (25d), we see that the shifted signal in HGSP has a decomposed form similar to (3) and (4) for GSP. The first two parts in (25d) work like V_M^{−1} of the GSP eigen-decomposition, which could be interpreted as the inverse Fourier transform and the filter in the Fourier space. The third part can be understood as the HGFT of the original signal. Hence, similarly as in GSP, we can define the hypergraph Fourier space and Fourier transform based on the orthogonal-CP decomposition of F.

Definition 7 (Hypergraph Fourier Space and Fourier Transform): The hypergraph Fourier space of a given hypergraph H is defined as the space consisting of all the orthogonal-CP decomposition basis vectors {f_1, f_2, ..., f_N}. The frequencies are defined with respect to the eigenvalue coefficients λ_i, 1 ≤ i ≤ N. The HGFT of hypergraph signals is defined as

$$\hat{\mathbf{s}} = \mathcal{F}_C\big(\mathbf{s}^{[M-1]}\big) \tag{26a}$$
$$= \begin{bmatrix} \big(\mathbf{f}_1^T\mathbf{s}\big)^{M-1} \\ \vdots \\ \big(\mathbf{f}_N^T\mathbf{s}\big)^{M-1} \end{bmatrix}. \tag{26b}$$
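A compact Python sketch of the HGFT in (26) follows. It assumes that {f_r} is orthonormal and that F is given in the orthogonal-CP form (24); each Fourier coefficient is then just (f_r^T s)^{M−1}, and (25d) ties the shift back to those coefficients.

```python
import numpy as np

# HGFT sketch for M = 3: F in the orthogonal-CP form (24) with orthonormal
# basis Q[:, r] and coefficients lam[r]; HGFT coefficients via (26b).
N, M = 5, 3
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.linspace(2.0, 0.5, N)
F = np.einsum('r,ir,jr,kr->ijk', lam, Q, Q, Q)

s = rng.standard_normal(N)
s_hat = (Q.T @ s) ** (M - 1)                    # (26b): each entry (f_r^T s)^(M-1)

# (25d): the shifted signal equals [f_1 ... f_N] diag(lam) s_hat
shifted = np.einsum('ijk,j,k->i', F, s, s)
assert np.allclose(shifted, Q @ (lam * s_hat))
```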

Fig. 11. Diagram of (a) HGFT and (b) iHGFT.

Compared to GSP, if M = 2, the HGFT has the same form as the traditional GFT. In addition, since the f_r form an orthonormal basis, we have

$$\mathbf{F}\mathbf{f}_r^{[M-1]} = \sum_{i=1}^{N} \lambda_i \mathbf{f}_i \big(\mathbf{f}_i^T \mathbf{f}_r\big)^{M-1} = \lambda_r \mathbf{f}_r. \tag{27}$$

According to [25], a vector x is an E-eigenvector of an Mth-order tensor A if Ax^{[M−1]} = λx holds for a constant λ. Then, we obtain the following property of the hypergraph spectrum.

Property 2: The hypergraph spectrum pair (λ_r, f_r) is an E-eigenpair of the representing tensor F.

Recall that the spectrum space of GSP is the eigenspace of the representing matrix F_M. Property 2 shows that HGSP has a definition of the spectrum space consistent with that of GSP.

D. Relationship Between Hypergraph Signal and Original Signal

With the HGFT defined, let us discuss more about the relationship between the hypergraph signal and the original signal in the Fourier space to understand the HGFT better. From (26b), the hypergraph signal in the Fourier space is written as

$$\hat{\mathbf{s}} = \begin{bmatrix} \big(\mathbf{f}_1^T\mathbf{s}\big)^{M-1} \\ \vdots \\ \big(\mathbf{f}_N^T\mathbf{s}\big)^{M-1} \end{bmatrix} \tag{28}$$

which can be further decomposed as

$$\hat{\mathbf{s}} = \underbrace{\left( \begin{bmatrix} \mathbf{f}_1^T \\ \vdots \\ \mathbf{f}_N^T \end{bmatrix}\mathbf{s} \right) * \left( \begin{bmatrix} \mathbf{f}_1^T \\ \vdots \\ \mathbf{f}_N^T \end{bmatrix}\mathbf{s} \right) * \cdots * \left( \begin{bmatrix} \mathbf{f}_1^T \\ \vdots \\ \mathbf{f}_N^T \end{bmatrix}\mathbf{s} \right)}_{M-1 \text{ times}} \tag{29}$$

where ∗ denotes the Hadamard product.

From (29), we see that the hypergraph signal in the hypergraph Fourier space is the M−1 times Hadamard product of a component consisting of the hypergraph Fourier basis and the original signal. More specifically, this component works as the original signal in the hypergraph Fourier space, which is defined as

$$\tilde{\mathbf{s}} = \mathbf{V}\mathbf{s} \tag{30}$$

where V = [f_1 f_2 ··· f_N]^T and V^T V = I.

Recalling the definitions of the hypergraph signal and the vertex domain in Section III-B, we have the following property.

Property 3: The hypergraph signal is the M−1 times tensor outer product of the original signal in the hypergraph vertex domain, and the M−1 times Hadamard product of the original signal in the hypergraph frequency domain.

Then, we could establish a connection between the original signal and the hypergraph signal in the hypergraph Fourier domain by the HGFT and the inverse HGFT (iHGFT), as shown in Fig. 11. Such a relationship leads to some interesting properties and makes the HGFT implementation more straightforward, which will be further discussed in Sections III-F and III-G, respectively.

E. Hypergraph Frequency

As we now have a better understanding of the hypergraph Fourier space and Fourier transform, we can discuss more about the hypergraph frequency and its order. In GSP, the graph frequency is defined with respect to the eigenvalues of the representing matrix F_M and ordered by the total variation [5]. Similarly, in HGSP, we define the frequency relative to the coefficients λ_i from the orthogonal-CP decomposition. We order them by the total variation of the frequency components f_i over the hypergraph. The total variation of a general signal component over a hypergraph is defined as follows.

Definition 8 (Total Variation Over Hypergraph): Given a hypergraph H with N nodes and the normalized representing tensor F^norm = (1/λ_max)F, together with the original signal s, the total variation over the hypergraph is defined as the total difference between the nodes and their corresponding neighbors from the perspective of shifting, i.e.,

$$\mathrm{TV}(\mathbf{s}) = \sum_{i=1}^{N} \left| s_i - \sum_{j_1,\ldots,j_{M-1}=1}^{N} F^{\text{norm}}_{i j_1 \cdots j_{M-1}} s_{j_1} \cdots s_{j_{M-1}} \right| \tag{31a}$$
$$= \left\| \mathbf{s} - \mathbf{F}^{\text{norm}} \mathbf{s}^{[M-1]} \right\|_1. \tag{31b}$$

We adopt the l_1-norm here only as an example of defining the total variation. Other norms may be more suitable depending on the specific application. Now, with the definition of total variation over hypergraphs, the frequency in HGSP is ordered by the total variation of the corresponding frequency component f_r, i.e.,

$$\mathrm{TV}(\mathbf{f}_r) = \left\| \mathbf{f}_r - \mathbf{f}_{r(1)}^{\text{norm}} \right\|_1 \tag{32}$$

where f_{r(1)}^{norm} is the output of the one-time shifting of f_r over the normalized representing tensor.

From (31a), we see that the total variation describes how much the signal component changes from a node to its neighbors over the hypergraph shifting. Thus, we have the following definition of hypergraph frequency.

Definition 9 (Hypergraph Frequency): Hypergraph frequency describes how oscillatory the signal component is with respect to the given hypergraph. A frequency component f_r is associated with a higher frequency if the total variation of this frequency component is larger.

Note that the physical meaning of graph frequency was stated in GSP [2]. Generally, the graph frequency is highly related to the total variation of the corresponding frequency component. Similarly, the hypergraph frequency also relates to the corresponding total variation. We will discuss more about the interpretation of the hypergraph frequency and its relationships with DSP and GSP later in Section IV-A, to further consolidate our hypergraph frequency definition.

Based on the definition of total variation, we describe one important property of TV(f_r) in the following theorem.

Theorem 1: Define a supporting matrix

$$\mathbf{P}_s = \frac{1}{\lambda_{\max}} \begin{bmatrix} \mathbf{f}_1 & \cdots & \mathbf{f}_N \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_N \end{bmatrix} \begin{bmatrix} \mathbf{f}_1^T \\ \vdots \\ \mathbf{f}_N^T \end{bmatrix}. \tag{33}$$

With the normalized representing tensor F^norm = (1/λ_max)F, the total variation of the hypergraph spectrum component f_r is calculated as

$$\mathrm{TV}(\mathbf{f}_r) = \left\| \mathbf{f}_r - \mathbf{f}_{r(1)}^{\text{norm}} \right\|_1 \tag{34a}$$
$$= \left\| \mathbf{f}_r - \mathbf{P}_s \mathbf{f}_r \right\|_1 \tag{34b}$$
$$= 1 - \frac{\lambda_r}{\lambda_{\max}}. \tag{34c}$$

Moreover, TV(f_i) > TV(f_j) iff λ_i < λ_j.

Proof: For hypergraph signals, the output of the one-time shifting of f_r is calculated as

$$\mathbf{f}_{r(1)} = \sum_{i=1}^{N} \lambda_i \mathbf{f}_i \big(\mathbf{f}_i^T \mathbf{f}_r\big)^{M-1} = \lambda_r \mathbf{f}_r. \tag{35}$$

Based on the normalized F^norm, we have f^norm_{r(1)} = (λ_r/λ_max) f_r. It is therefore easy to obtain (34c) from (34a). To obtain (34b), we have

$$\mathbf{P}_s \mathbf{f}_r = \frac{1}{\lambda_{\max}} \sum_{i=1}^{N} \lambda_i \mathbf{f}_i \mathbf{f}_i^T \mathbf{f}_r = \frac{\lambda_r}{\lambda_{\max}} \mathbf{f}_r. \tag{36}$$

It is clear that (34b) is the same as (34c). Since the λ's are real and non-negative, we have

$$\mathrm{TV}(\mathbf{f}_i) - \mathrm{TV}\big(\mathbf{f}_j\big) = \frac{\lambda_j - \lambda_i}{\lambda_{\max}}. \tag{37}$$

Obviously, TV(f_i) > TV(f_j) iff λ_i < λ_j.

Theorem 1 shows that the supporting matrix P_s can help us apply the total variation more efficiently in some real applications. Moreover, it provides the order of the frequencies according to the coefficients λ_i, with the following property.

Property 4: A smaller λ is related to a higher frequency in the hypergraph Fourier space, and its corresponding spectrum basis vector is called a high-frequency component.
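The identity in (34) is easy to verify numerically. The sketch below, using the same toy orthogonal-CP construction as before (an assumption, not the paper's data), checks that the one-time shift of f_r scales it by λ_r/λ_max, so that TV(f_r) = (1 − λ_r/λ_max)·||f_r||_1, which matches (34c) when f_r is normalized to unit l_1 norm.

```python
import numpy as np

# Numerical check of Theorem 1 on a toy orthogonal-CP tensor (M = 3).
N, M = 5, 3
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.array([2.0, 1.5, 1.0, 0.4, 0.1])       # lam[0] = lam_max
F_norm = np.einsum('r,ir,jr,kr->ijk', lam / lam.max(), Q, Q, Q)

for r in range(N):
    fr = Q[:, r]
    fr_shift = np.einsum('ijk,j,k->i', F_norm, fr, fr)   # one-time shift, (35)
    tv = np.abs(fr - fr_shift).sum()                     # (34a) with the l1 norm
    # (34b)/(34c): the shift scales fr by lam_r / lam_max, hence
    # TV(f_r) = (1 - lam_r / lam_max) * ||f_r||_1
    assert np.isclose(tv, (1 - lam[r] / lam.max()) * np.abs(fr).sum())
```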
F. Signals With Limited Spectrum Support

With the order of frequency, we define the bandlimited signals as follows.

Definition 10 (Bandlimited Signal): Order the coefficients as λ = [λ_1 ··· λ_N], where λ_1 ≥ ··· ≥ λ_N ≥ 0, together with their corresponding f_r's. A hypergraph signal s^{[M−1]} is defined as K-bandlimited if the HGFT-transformed signal ŝ = [ŝ_1, ..., ŝ_N]^T has ŝ_i = 0 for all i ≥ K, where K ∈ {1, 2, ..., N}. The smallest such K is defined as the bandwidth, and the corresponding boundary is defined as W = λ_K.

Note that a larger λ_i corresponds to a lower frequency, as we mentioned in Property 4. Then, the frequencies are ordered from low to high in the definition above. Moreover, we use the index K instead of the coefficient value λ to define the bandwidth for the following reasons.

1) Identical λ's in two different hypergraphs do not refer to the same frequency. Since each hypergraph has its own adjacency tensor and spectrum space, the comparison of multiple spectrum pairs (λ_i, f_i) is only meaningful within the same hypergraph. Moreover, there exists a normalization issue in the decomposition of different adjacency tensors. Thus, it is not meaningful to compare λ_k's across two different hypergraphs.

2) Since the λ_k values are not continuous over k, different frequency cutoffs of λ may lead to the same bandlimited space. For example, suppose that λ_k = 0.8 and λ_{k+1} = 0.5. Then, λ = 0.6 and λ = 0.7 would lead to the same cutoff in the frequency space, which makes the bandwidth definition nonunique.

As we discussed in Section III-D, the hypergraph signal is the Hadamard product of the original signal in the frequency domain. Then, we have the following property of the bandwidth.

Property 5: The bandwidth K is the same whether it is based on the HGFT of the hypergraph signal ŝ or on that of the original signal s̃.

This property allows us to analyze the spectrum support of the hypergraph signal by looking into the original signal with lower complexity. Recall that we can add basis vectors f_i with zero coefficients λ_i when R < N, as mentioned in Section III-C. The added basis should not affect the HGFT signals in the Fourier space. According to the structure of bandlimited signals, the added f_i should meet the following conditions: 1) f_i ⊥ f_p for p ≠ i; 2) f_i^T s = 0; and 3) |f_i| = 1.

G. Implementation and Complexity

We now consider the implementation and complexity issues of the HGFT. Similar to the GFT, the process of HGFT consists of two steps: 1) decomposition and 2) execution. The decomposition is to calculate the hypergraph spectrum basis, and the execution transforms signals from the hypergraph vertex domain into the spectrum domain.

1) The calculation of the spectrum basis by the orthogonal-CP decomposition is an important preparation step for the HGFT.

A straightforward algorithm would decompose the representing tensor F into the spectrum basis f_i's and coefficients λ_i's as in (24). Efficient tensor decomposition is an active topic in both the fields of mathematics and engineering, and there are a number of methods for CP decomposition in the literature. In [54] and [58], motivated by the spectral theorem for real symmetric matrices, orthogonal-CP decomposition algorithms for symmetric tensors are developed based on polynomial equations. Afshar et al. [26] proposed a more general decomposition algorithm for spatio-temporal data. Other works, including [55]-[57], tried to develop faster decomposition methods for signal processing and big data applications. The rapid development of tensor decomposition and the advancement of computational ability will benefit the efficient derivation of the hypergraph spectrum.

2) The execution of the HGFT with a known spectrum basis is defined in (26b). According to (29), the HGFT of a hypergraph signal is the M−1 times Hadamard product of the original signal in the hypergraph spectrum space. This relationship can help execute the HGFT and iHGFT of hypergraph signals more efficiently by applying matrix operations to the original signals. Clearly, the complexity of calculating the original signal in the frequency domain, s̃ = Vs, is O(N²). In addition, since the computational complexity of the power function x^{(M−1)} could be O(log(M−1)) and each vector has N entries, the complexity of calculating the M−1 times Hadamard product is O(N log(M−1)). Thus, the complexity of the general HGFT implementation is O(N² + N log(M−1)).

In DSP, the frequency index n of an N-point signal takes the values −(N/2)+1, ..., −1, 0, 1, 2, ..., (N/2)−1, (N/2). The highest and lowest frequencies correspond to n = N/2 and n = 0, respectively. Note that n varies from −(N/2)+1 to (N/2) here. Since e^{−j2πkn/N} = e^{−j2πk(n+N)/N}, we can let n vary from 0 to N−1 and cover the complete period. Now, n varies in exact correspondence to ν_n, and the aforementioned conclusions are drawn. The highest frequency occurs at n = (N/2).

The total variation in DSP is defined as the differences among the signals over time [59], i.e.,

$$\mathrm{TV}(\mathbf{s}) = \sum_{n=0}^{N-1} \left| s_n - s_{(n-1) \bmod N} \right| \tag{38a}$$
$$= \| \mathbf{s} - \mathbf{C}_N \mathbf{s} \|_1 \tag{38b}$$

where

$$\mathbf{C}_N = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{39}$$

When we perform the eigen-decomposition of C_N, we see that the eigenvalues are λ_n = e^{−j(2πn/N)} with eigenvectors f_n, 0 ≤ n ≤ N−1. More specifically, the total variation of the frequency component f_n is calculated as

$$\mathrm{TV}(\mathbf{f}_n) = \left| 1 - e^{j\frac{2\pi n}{N}} \right| \tag{40}$$

which increases with n for n ≤ (N/2) before decreasing with n for (N/2) < n ≤ N−1.
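The DSP statements in (38)-(40) can be confirmed in a few lines of Python: the DFT vectors diagonalize the cyclic shift C_N, and TV(f_n) = |1 − e^{j2πn/N}| = 2|sin(πn/N)| peaks at n = N/2 (N = 8 below is an arbitrary choice).

```python
import numpy as np

# DSP check of (38)-(40): DFT vectors diagonalize the cyclic shift C_N.
N = 8
C = np.roll(np.eye(N), 1, axis=0)               # C_N of (39): (C s)_n = s_{n-1 mod N}

n = np.arange(N)
F_dft = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # columns f_n
assert np.allclose(C @ F_dft, F_dft * np.exp(-2j * np.pi * n / N))  # lam_n

tv = np.abs(1 - np.exp(2j * np.pi * n / N))     # (40), equals 2|sin(pi n / N)|
print(np.round(tv, 3))                          # rises until n = N/2, then falls
```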

Fig. 12. Example of frequency order in GSP for complex eigenvalues.

Fig. 13. Frequency components in a hypergraph with nine nodes and five hyperedges. The left panel shows the frequency components before shifting, and the right panel shows the frequency components after shifting. The values of the signals are illustrated by colors. Higher frequency components imply larger changes across the two panels.

In GSP, when the total variation of a frequency component is larger, the change of signals over the graph between neighboring vertices is faster, which indicates a higher graph frequency. Note that once the graph is undirected, i.e., the eigenvalues are real numbers, the frequency decreases with the increase of the eigenvalue, similar to HGSP in Section III-E; otherwise, if the graph is directed, i.e., the eigenvalues are complex, the frequency changes as shown in Fig. 12, which is consistent with the changing pattern of the DSP frequency [5].

We now turn to our HGSP framework. Like GSP, HGSP analyzes signals in the hypergraph vertex domain. Different from normal graphs, each hyperedge in HGSP connects more than two nodes. The neighbors of a vertex v_i include all the nodes in the hyperedges containing v_i. For example, if there exists a hyperedge e_1 = {v_1, v_2, v_3}, nodes v_2 and v_3 are both neighbors of node v_1. As we mentioned in Section III-E, the total variation of HGSP is defined as the difference between continuous signals over the hypergraph, i.e., the difference between the signal components and their respective shifted versions

$$\mathrm{TV}(\mathbf{s}) = \sum_{i=1}^{N} \left| s_i - \sum_{j_1,\ldots,j_{M-1}} F^{\text{norm}}_{i j_1 \cdots j_{M-1}} s_{j_1} \cdots s_{j_{M-1}} \right| \tag{42a}$$
$$= \left\| \mathbf{s} - \mathbf{F}^{\text{norm}} \mathbf{s}^{[M-1]} \right\|_1 \tag{42b}$$

where F^norm = (1/λ_max)F. Similar to DSP and GSP, the pairs (λ_i, f_i) in (24) characterize the hypergraph spectrum space. A spectrum component with a larger total variation represents a higher frequency component, which indicates faster changes over the hypergraph. Note that, as we mentioned in Section III-E, the total variation is larger and the frequency is higher if the corresponding λ is smaller, because we usually consider undirected hypergraphs, for which the λ's are real in the tensor decomposition. To illustrate this more clearly, we consider a hypergraph with nine nodes, five hyperedges, and m.c.e = 3 as an example, shown in Fig. 13. As we mentioned before, a smaller λ indicates a higher frequency in HGSP. Hence, we see that the signals have more changes on each vertex if the frequency is higher.

B. Connections to Other Existing Works

We now discuss the relationships between HGSP and other existing works.

1) Graph Signal Processing: One of the motivations for developing HGSP is to build a more general framework for signal processing on high-dimensional graphs. Thus, GSP should be a special case of HGSP. We illustrate the GSP-HGSP relationship as follows.

1) Graphical Models: GSP is based on normal graphs [2], where each simple edge connects exactly two nodes; HGSP is based on hypergraphs, where each hyperedge can connect more than two nodes. Clearly, the normal graph is a special case of the hypergraph, for which the m.c.e equals two. More specifically, a normal graph is a 2-uniform hypergraph [60]. The hypergraph provides a more general model for multilateral relationships, while normal graphs are only able to model bilateral relationships. For example, a 3-uniform hypergraph is able to model the trilateral interactions among users in a social network [61]. As the hypergraph is a more general model for high-dimensional interactions, HGSP is also more powerful for high-dimensional signals.

2) Algebraic Models: HGSP relies on tensors, while GSP relies on matrices, which are second-order tensors. Benefiting from the generality of tensors, HGSP is broadly applicable in high-dimensional data analysis.

3) Signals and Signal Shifting: In HGSP, we define the hypergraph signal as the M−1 times tensor outer product of the original signal. More specifically, the hypergraph signal is the original signal if M = 2. Basically, the hypergraph signal is the same as the graph signal if each hyperedge has exactly two nodes. Also, as shown in Fig. 10(b) of Section III-C, graph shifting is a special case of hypergraph shifting when M = 2.

4) Spectrum Properties: In HGSP, the spectrum space is defined over the orthogonal-CP decomposition in terms of the basis and coefficients, which are also the E-eigenpairs of the representing tensor [62], as shown in (27). In GSP, the spectrum space is defined as the matrix eigenspace.

Since the tensor algebra is an extension of the matrix algebra, the HGSP spectrum is also an extension of the GSP spectrum. For example, as discussed in Section III, the GFT is the same as the HGFT when M = 2.

Overall, HGSP is an extension of GSP that is both more general and novel. The purpose of developing the HGSP framework is to facilitate more interesting signal processing tasks that involve high-dimensional signal interactions.

2) Higher Order Statistics: Higher order statistics (HOS) have been effectively applied in signal processing [63], [64]; they can analyze the multilateral interactions of signal samples and have found successes in many applications, such as blind feature detection [65], decision [66], and signal classification [67]. In HOS, the kth-order cumulant of random variables x = [x_1, ..., x_k]^T is defined [68] based on the coefficients of v = [v_1, ..., v_k]^T in the Taylor series expansion of the cumulant-generating function, i.e.,

$$K(\mathbf{v}) = \ln E\big[\exp\big(j\mathbf{v}^T\mathbf{x}\big)\big]. \tag{43}$$

It is easy to see that HGSP and HOS are both related to high-dimensional signal processing, and both can be represented by tensors. For example, in the multichannel problems of [69], the third-order cumulant C = {C_{y_i,y_j,y_z}(t, t_1, t_2)} of zero-mean signals can be represented as a multilinear array, e.g.,

$$C_{y_i,y_j,y_z}(t, t_1, t_2) = E\big[y_i(t)\, y_j(t+t_1)\, y_z(t+t_2)\big] \tag{44}$$

which is essentially a third-order tensor. More specifically, if there are k samples, the cumulant C can be represented as a pk-element vector, which is the flattened signal tensor, similar to the n-mode flattening of HGSP signals.

Although both HOS and HGSP are high-dimensional signal processing tools, they focus on complementary aspects of the signals. Specifically, HGSP aims to analyze signals over the high-dimensional vertex domain, while HOS focuses on the statistical domain. In addition, the forms of signal combination are also different: HGSP signals are combined based on the hypergraph shifting defined in (21), whereas HOS cumulants are based on the statistical average of shifted signal products.
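For concreteness, the following Python sketch estimates a single-channel slice of the third-order cumulant in (44) by sample averaging over lags; for a zero-mean signal, the third-order cumulant equals this third moment, and the lag grid already forms a slice of a tensor. The signal model and lag range are arbitrary illustrative choices, not tied to the data of [69].

```python
import numpy as np

# Sample-average estimate of a single-channel slice of (44); for a
# zero-mean signal, the third-order cumulant equals this third moment.
rng = np.random.default_rng(4)
T, max_lag = 10000, 4
y = rng.standard_normal(T) ** 3                 # a skewed, non-Gaussian signal
y -= y.mean()                                   # enforce (approximately) zero mean

C3 = np.zeros((max_lag, max_lag))
L = T - max_lag
for t1 in range(max_lag):
    for t2 in range(max_lag):
        C3[t1, t2] = np.mean(y[:L] * y[t1:t1 + L] * y[t2:t2 + L])
print(np.round(C3, 2))                          # C3[0, 0] estimates E[y^3]
```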
3) Learning Over Hypergraphs: Hypergraph learning is another tool for handling structured data and sometimes uses techniques similar to HGSP. For example, Hein et al. [70] proposed an alternative definition of hypergraph total variation and designed algorithms accordingly for classification and clustering problems. In addition, hypergraph learning has its own definition of the hypergraph spectrum space. For example, [39] and [40] represented hypergraphs using a graph-like similarity matrix and defined a spectrum space as the eigenspace of this similarity matrix. Other works considered different aspects of the hypergraph, including the hypergraph Laplacian [71] and hypergraph lifting [21].

The HGSP framework exhibits features different from hypergraph learning.
1) HGSP defines a framework that generalizes classical digital signal processing and traditional GSP.
2) HGSP applies different definitions of hypergraph characteristics, such as the total variation, spectrum space, and Laplacian.
3) HGSP cares more about the spectrum space, while learning focuses more on data.
4) As HGSP is an extension of DSP and GSP, it is more suitable for detailed tasks, such as compression, denoising, and detection.
All these features make HGSP a technical concept distinct from hypergraph learning.

V. TOOLS FOR HYPERGRAPH SIGNAL PROCESSING

In this section, we introduce several useful tools built within the framework of HGSP.

A. Sampling Theory

Sampling is an important tool in data analysis that selects a subset of individual data points to estimate the characteristics of the whole population [89]. Sampling plays an important role in applications such as compression [27] and storage [90]. Similar to sampling signals in time, an HGSP sampling theory can be developed to sample signals over the vertex domain. We now introduce the basics of HGSP sampling theory for lossless signal dimension reduction.

To reduce the size of a hypergraph signal s^[M−1], there are two main approaches: 1) reduce the dimension of each order and 2) reduce the number of orders. Since reducing the number of orders breaks the structure of the hypergraph and cannot always guarantee perfect recovery, we adopt the dimension reduction of each order. To change the dimension of a certain order, we can use the n-mode product. Since each order of the hypergraph signal is equivalent, the n-mode product operators of all orders are the same. The sampling operation of the hypergraph signal is then defined as follows (a short sketch of these operations is given after Definition 12).

Definition 11 (Sampling and Interpolation): Suppose that Q is the dimension of each sampled order. The sampling operation is defined as

    s_Q^[M−1] = s^[M−1] ×_1 U ×_2 U × ··· ×_{M−1} U              (45)

where the sampling operator U ∈ R^{Q×N} is to be defined later, and the sampled signal is s_Q^[M−1] ∈ R^{Q×Q×···×Q} (M−1 times). The interpolation operation is defined by

    s^[M−1] = s_Q^[M−1] ×_1 T ×_2 T × ··· ×_{M−1} T              (46)

where the interpolation operator T ∈ R^{N×Q} is to be defined later.

As presented in Section III, the hypergraph signal and the original signal are different forms of the same data and share similar structural properties. To derive the sampling theory for perfect signal recovery efficiently, we first consider the sampling operations of the original signal.

Definition 12 (Sampling Original Signal): Suppose an original K-bandlimited signal s ∈ R^N is to be sampled into s_Q ∈ R^Q, where q = {q_1, ..., q_Q} denotes the sequence of sampled indices and q_i ∈ {1, 2, ..., N}. The sampling operator U ∈ R^{Q×N} is a linear mapping from R^N to R^Q, defined by

    U_ij = 1 if j = q_i, and U_ij = 0 otherwise                  (47)

and the interpolation operator T ∈ R^{N×Q} is a linear mapping from R^Q to R^N. Then, the sampling operation is defined by

    s_Q = U · s                                                  (48)

and the interpolation operation is defined by

    s' = T · s_Q.                                                (49)
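To make Definitions 11 and 12 concrete, here is a minimal numpy sketch (assuming M = 3 and illustrative sizes of our own choosing) that applies the same sampling operator U to every order of s^[M−1] via n-mode products, with U built as in (47):

```python
import numpy as np

# A sketch of Definition 11 with hypothetical sizes: sample every order of
# the hypergraph signal s^[M-1] with the same operator U (n-mode products).
# Here M = 3, so s^[M-1] is an N x N array.
def mode_product(tensor, matrix, mode):
    """n-mode product: contract `matrix` (Q x N) with `tensor` along `mode`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

N, Q = 5, 3
q = np.array([0, 2, 4])                          # sampled vertex indices (0-based)
U = np.zeros((Q, N)); U[np.arange(Q), q] = 1.0   # U_ij = 1 iff j = q_i, as in (47)

s = np.arange(1.0, N + 1.0)                      # original signal
S = np.multiply.outer(s, s)                      # hypergraph signal s^[2] = s ∘ s

S_Q = mode_product(mode_product(S, U, 0), U, 1)  # eq (45): sample each order
# The sampled tensor is the outer product of the sampled original signal:
assert np.allclose(S_Q, np.multiply.outer(s[q], s[q]))
```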
Analyzing the structure of the sampling operations, we have the following properties.

Theorem 2: The hypergraph signal s^[M−1] shares the same sampling operator U ∈ R^{Q×N} and interpolation operator T ∈ R^{N×Q} with the original signal s.

Proof: We first examine one of the orders in the n-mode product of the hypergraph signal, i.e., the nth order of s^[M−1], 1 ≤ n ≤ N, as

    (s^[M−1] ×_n U)_{i_1 ··· i_{n−1} j i_{n+1} ··· i_{M−1}} = Σ_{i_n=1}^{N} s_{i_1} s_{i_2} ··· s_{i_{M−1}} U_{j i_n}.    (50)

Since all elements in s_Q^[M−1] should also be elements of s^[M−1] after sampling, only one U_{j i_n} = 1 exists for each j according to (50), i.e., only one term in the summation remains for each j on the right-hand side of (50). Moreover, since U samples over all the orders, U_{p i_n} = 1 and U_{j i_n} = 1 cannot hold at the same time for p ≠ j, so that all the entries of s_Q^[M−1] are also entries of s^[M−1]. Suppose q = {q_1, q_2, ..., q_Q} marks the positions of the nonzero U_{j q_j}'s; we have

    s_Q^[M−1](i_1, i_2, ..., i_Q) = s_{i_{q_1}} s_{i_{q_2}} ··· s_{i_{q_Q}}.    (51)

As a result, we have U_{ji} = δ[i − q_j], which is the same as the sampling operator for the original signal. For the interpolation operator, the proof is similar and hence omitted.

Given Theorem 2, we only need to analyze the operations of the original signal in the sampling theory. Next, we discuss the conditions for perfect recovery. For the original signal, we have the following property.

Lemma 1: Suppose that s ∈ R^N is a K-bandlimited signal. Then, we have

    s = F_[K]^T s̃_[K]                                            (52)

where F_[K]^T = [f_1, ..., f_K] and s̃_[K] ∈ R^K consists of the first K elements of the original signal in the frequency domain, i.e., the first K entries of s̃.

Proof: Since s is K-bandlimited, s̃_i = f_i^T s = 0 when i > K. Then, according to (30), we have

    s = V^T V s = Σ_{i=1}^{K} f_i f_i^T s + Σ_{i=K+1}^{N} f_i f_i^T s = Σ_{i=1}^{K} f_i f_i^T s + 0 = F_[K]^T s̃_[K]    (53)

where V = [f_1, ..., f_N]^T.

This lemma implies that the first K frequency components carry all the information of the original signal. Since the hypergraph signal and the original signal share the same sampling operators, we can reach a conclusion on perfect recovery similar to [27] and [28], given in the following theorem.

Theorem 3: Define the sampling operator U ∈ R^{Q×N} according to U_{ji} = δ[i − q_j], where 1 ≤ q_i ≤ N, i = 1, ..., Q. By choosing Q ≥ K and the interpolation operator T = F_[K]^T Z ∈ R^{N×Q} with Z U F_[K]^T = I_K and F_[K]^T = [f_1, ..., f_K], we can achieve a perfect recovery, i.e., s = TUs, for every K-bandlimited original signal s and the corresponding hypergraph signal s^[M−1].

Proof: To prove the theorem, we show that TU is a projection operator and that T spans the space of the first K eigenvectors. From Lemma 1 and s = T s_Q, we have

    s = F_[K]^T s̃_[K] = F_[K]^T Z s_Q.                           (54)

As a result, rank(Z s_Q) = rank(s̃_[K]) = K. Hence, we conclude that K ≤ Q.

Next, we show that TU is a projection by proving that TU · TU = TU. Since Q ≥ K, we have

    Z U F_[K]^T = I_K.                                           (55)

We then have

    TU · TU = F_[K]^T Z U F_[K]^T Z U                            (56a)
            = F_[K]^T Z U = TU.                                  (56b)

Hence, TU is a projection operator. For the spanning part, the proof is the same as that in [27].

Theorem 3 shows that a perfect recovery is possible for a bandlimited hypergraph signal. We now examine some interesting properties of the sampled signal.

From the previous discussion, we have s̃_[K] = Z s_Q, which has a form similar to the HGFT, where Z can be treated as the Fourier transform operator. Suppose that Q = K and Z = [z_1 ··· z_K]^T. We have the following first-order difference property.

Theorem 4: Define a new hypergraph by F_K = Σ_{i=1}^{K} λ_i · z_i ∘ ··· ∘ z_i. Then, for every K-bandlimited signal s^[M−1] ∈ R^{N×N×···×N} ((M−1) times), it holds that

    s_[K] − F_K s_[K]^[M−1] = U(s − F s^[M−1]).                  (57)

Proof: Let the diagonal matrix Λ_[K] consist of the first K coefficients {λ_1, ..., λ_K}. Since Z U F_[K]^T = I_K, we have

    F_K s_[K]^[M−1] = Z^{−1} Λ_[K] s̃_[K]                         (58a)
                    = U F_[K]^T Λ_[K] s̃_[K]                      (58b)
                    = U[(Σ_{i=1}^{K} λ_i f_i ∘ ··· ∘ f_i (M times))(s ∘ ··· ∘ s (M−1 times)) + 0]    (58c)
                    = U F s^[M−1].                               (58d)

Since s_[K] = Us, it therefore holds that s_[K] − F_K s_[K]^[M−1] = U(s − F s^[M−1]).
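Before moving on, the perfect-recovery condition in Theorem 3 can be checked numerically in the original-signal domain. A sketch under illustrative assumptions (a random orthonormal basis standing in for the hypergraph Fourier basis, and a sample set chosen so that U F_[K]^T is invertible):

```python
import numpy as np

# A numerical sketch of Theorem 3: build a K-bandlimited s from an
# orthonormal basis, sample Q = K entries, set Z = (U F_K)^(-1) so that
# Z U F_K = I_K as in (55), and verify s = TUs. Sizes are illustrative.
rng = np.random.default_rng(2)
N, K = 8, 3
V, _ = np.linalg.qr(rng.standard_normal((N, N)))  # columns f_1, ..., f_N
F_K = V[:, :K]                                    # F_[K]^T = [f_1, ..., f_K]

s = F_K @ rng.standard_normal(K)                  # K-bandlimited signal, as in (52)

q = np.array([1, 4, 6])                           # Q = K sampled indices
U = np.zeros((K, N)); U[np.arange(K), q] = 1.0    # sampling operator (47)

# Generically invertible for a random basis; a degenerate sample set would not be.
Z = np.linalg.inv(U @ F_K)                        # enforces Z U F_[K]^T = I_K
T = F_K @ Z                                       # interpolation operator T = F_[K]^T Z

assert np.allclose(T @ (U @ s), s)                # perfect recovery s = TUs
```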
Theorem 4 shows that the sampled signals form a new hypergraph that preserves the information of the one-time shifting filter over the original hypergraph. The left-hand side of (57) represents the difference between the sampled signal and its one-time shifted version in the new hypergraph. The right-hand side of (57) is the difference between a signal and its one-time shifted version in the original hypergraph, together with the sampling operator. That is, the sampled result of the one-time shifting differences in the original hypergraph equals the one-time shifting differences in the new sampled hypergraph.

B. Filter Design

The filter is an important tool in signal processing applications, such as denoising, feature enhancement, smoothing, and classification. In GSP, the basic filtering is defined as s' = F_M s, where F_M is the representing matrix [2]. In HGSP, the basic hypergraph filtering is defined in Section III-C as s_(1) = F s^[M−1], which is designed according to the tensor contraction. The HGSP filter is a multilinear mapping [72]. The high dimensionality of tensors provides more flexibility in designing HGSP filters.

1) Polynomial Filter Based on the Representing Tensor: The polynomial filter is one basic form of HGSP filter, with which signals are shifted several times over the hypergraph. An example of a polynomial filter is given as Fig. 8 in Section III-B. A k-time shifting filter is defined as

    s_(k) = F s_(k−1)^[M−1]                                      (59a)
          = F(F ··· (F s^[M−1]))  (k times).                     (59b)

More generally, a polynomial filter is designed as

    s' = Σ_{k=1}^{a} α_k s_(k)                                   (60)

where {α_k} are the filter coefficients. Such HGSP filters are based on the multilinear tensor contraction and can be adapted to different signal processing tasks by selecting the specific parameters a and {α_i}.

In addition to the general polynomial filter based on hypergraph signals, we provide another specific form of polynomial filter based on the original signals. As mentioned in Section III-E, the supporting matrix P_s in (33) captures all the information of the frequency space. For example, the un-normalized supporting matrix P = λ_max P_s is calculated as

    P = [f_1 ··· f_N] diag(λ_1, ..., λ_N) [f_1 ··· f_N]^T.       (61)

Obviously, each hypergraph spectrum pair (λ_r, f_r) is an eigenpair of the supporting matrix P. Moreover, Theorem 1 shows that the total variation of a frequency component equals a function of P, i.e.,

    TV(f_r) = || f_r − (1/λ_max) P f_r ||_1.                     (62)

From (62), P can be interpreted as a shifting matrix for the original signal. Accordingly, we can design a polynomial filter for the original signal based on the supporting matrix, whose kth-order term is defined as

    s_<k> = P^k s.                                               (63)

The ath-order polynomial filter is then simply given as

    s' = Σ_{k=1}^{a} α_k P^k s.                                  (64)

A polynomial filter over the original signal can thus be determined by specific choices of a and α.

Let us consider some interesting properties of the polynomial filter for the original signal. First, for the kth-order term, we have the following property as Lemma 2.

Lemma 2:

    s_<k> = Σ_{r=1}^{N} λ_r^k (f_r^T s) f_r.                     (65)

Proof: Let V = [f_1, ..., f_N]^T and Λ = diag([λ_1, ..., λ_N]^T). Since V^T V = I, we have

    P^k = V^T Λ V V^T Λ V ··· V^T Λ V  (k times)                 (66a)
        = V^T Λ^k V.                                             (66b)

Therefore, the kth-order term is given as

    s_<k> = V^T Λ^k V s                                          (67a)
          = Σ_{r=1}^{N} λ_r^k (f_r^T s) f_r.                     (67b)

From Lemma 2, we obtain the following property of the polynomial filter for the original signal.

Theorem 5: Let h(·) be a polynomial function. For the polynomial filter H = h(P) over the original signal, the filtered signal satisfies

    Hs = Σ_{k=1}^{a} α_k s_<k> = Σ_{r=1}^{N} h(λ_r) f_r f_r^T s.    (68)

This theorem serves as the invariance property of the exponentials in HGSP, similar to those in GSP and DSP [2]. Equations (60) and (63) provide a variety of choices for HGSP polynomial filters in signal processing and data analysis. We will give specific examples of practical applications in Section VI.

2) General Filter Design Based on Optimization: In GSP, some filters are designed via optimization formulations [2], [73], [74]. Similarly, general HGSP filters can also be designed via optimization. Assume that y is the observed signal before shifting and that s' = h(F, y) is the signal shifted by an HGSP filter h(·) designed for a specific application. Then, the filter design can be formulated as

    min_h ||s' − y||_2^2 + γ f(F, s')                            (69)

where F is the representing tensor of the hypergraph and f(·) is a penalty function designed for the specific problem. For example, the total variation could be used as a penalty function for the purpose of smoothness. Alternative penalty functions include the label rank, Laplacian regularization, and spectrum-based penalties. In Section VI, we shall provide some filter design examples.
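The spectral identity in Lemma 2 and Theorem 5 suggests a direct implementation route: filter in the spectrum domain instead of repeatedly shifting. A hedged sketch with a stand-in spectrum (all sizes and values are illustrative):

```python
import numpy as np

# A sketch of the polynomial filter (64) over the original signal, using the
# spectral form of Lemma 2: P^k s = V^T Λ^k V s, so
# h(P) s = Σ_r h(λ_r) (f_r^T s) f_r, as in Theorem 5.
rng = np.random.default_rng(3)
N = 6
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
V = Q.T                                           # rows of V are f_1^T, ..., f_N^T
lam = np.linspace(1.0, 0.1, N)                    # illustrative spectrum λ_1..λ_N
P = V.T @ np.diag(lam) @ V                        # supporting matrix as in (61)

s = rng.standard_normal(N)
alpha = np.array([0.5, 0.3, 0.2])                 # filter taps α_1..α_a

direct = sum(a * np.linalg.matrix_power(P, k + 1) @ s
             for k, a in enumerate(alpha))        # eq (64): Σ_k α_k P^k s
h_lam = sum(a * lam ** (k + 1) for k, a in enumerate(alpha))
spectral = V.T @ (h_lam * (V @ s))                # Theorem 5: Σ_r h(λ_r) f_r f_r^T s

assert np.allclose(direct, spectral)
```

The spectral form costs one basis transform each way rather than repeated matrix powers, which is the practical payoff of Theorem 5.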

TABLE I
COMPRESSION RATIO OF DIFFERENT METHODS

VI. APPLICATION EXAMPLES

In this section, we consider several application examples for our newly proposed HGSP framework. These examples illustrate the practical use of HGSP in some traditional tasks, such as filter design and efficient data representation. We also consider problems in data analysis, such as classification and clustering.

A. Data Compression

Efficient representation of signals is important in data analysis and signal processing. Among many applications, data compression attracts significant interest for efficient storage and transmission [75]–[77]. Projecting signals onto a suitable orthonormal basis is a widely used compression method [5]. Within the proposed HGSP framework, we propose a data compression method based on the HGFT: we represent N signal points in the original domain with C frequency coefficients in the hypergraph spectrum domain. More specifically, with the help of the sampling theory in Section V, we can losslessly compress a K-bandlimited signal of N signal points with K spectrum coefficients.

To test the performance of our HGSP compression and to demonstrate that hypergraphs may be a better representation of structured signals than normal graphs, we compare the results of image compression with those from the GSP-based compression method [5]. We test over seven small-size 16 × 16 icon images and three 256 × 256 photograph images, shown in Fig. 14.

Fig. 14. Test set of images.

The HGSP-based image compression method is described as follows. Given an image, we first model it as a hypergraph with the image adaptive neighborhood hypergraph (IANH) model [30]. To reduce complexity, we pick the three closest neighbors in each hyperedge to construct a third-order adjacency tensor. Next, we calculate the Fourier basis of the adjacency tensor as well as the bandwidth K of the hypergraph signals. Finally, we represent the original images using C spectrum coefficients with C = K. For a large image, we may first cut it into smaller blocks before applying HGSP compression to improve speed.

For the GSP-based method in [5], we represent the images as graphs with: 1) the 4-connected neighbor model [31] and 2) the distance-based model, in which an edge exists only if the spatial distance is below α and the pixel distance is below β. The graph Fourier space and the corresponding coefficients in the frequency domain are then calculated to represent the original image. We use the compression ratio CR = N/C to measure the efficiency of different compression methods; a larger CR implies higher compression efficiency. The results are summarized in Table I, from which we can see that our HGSP-based compression method achieves higher efficiency than the GSP-based compression methods.

In addition to the image data sets, we also test the efficiency of HGSP spectrum compression over the MovieLens data set [83], where each movie data point has rating scores and tags from viewers. Here, we treat the scores of movies as signals and construct graph models based on the tag relationships. Similar to the game data set shown in Fig. 2(b), two movies are connected in a normal graph if they have similar tags; for example, if two movies are labeled with "love" by users, they are connected by an edge. To model the data set as a hypergraph, we include movies in one hyperedge if they have similar tags. To balance convenience and complexity, we set m.c.e. = 3. With the graph and hypergraph models, we compress the signals using the sampling method discussed earlier. For lossless compression, our HGSP method is able to use only 11.5% of the samples of the original signals to recover the original data set by choosing a suitable additional basis (see Section III-F). In contrast, the GSP method requires 98.6% of the samples. We also test the error between the recovered and original signals for varying numbers of samples. As shown in Fig. 15, the recovery error naturally decreases with more samples. Note that our HGSP method achieves much better performance once it obtains a sufficient number of samples, while the GSP error drops slowly. This is because the first few key HGSP spectrum basis elements carry most of the original information, thereby leading to a more efficient representation of structured data sets.

Fig. 15. Errors of compression methods over the MovieLens data set: the y-axis shows the error between the original and recovered signals, and the x-axis is the number of samples.

Overall, hypergraphs and HGSP lead to more efficient descriptions of structured data in most applications. With more suitable hypergraph models and further developed methods, the HGSP framework could become an important new tool in data compression.
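For intuition, the following sketch mimics the compression pipeline with a stand-in orthonormal basis in place of the hypergraph Fourier basis: a K-bandlimited signal is stored with C = K coefficients and recovered exactly, giving CR = N/C. All names and sizes are illustrative:

```python
import numpy as np

# A sketch of the HGFT-style compression pipeline: transform, keep the
# C = K leading spectrum coefficients of a K-bandlimited signal, invert.
rng = np.random.default_rng(4)
N, K = 64, 8
V, _ = np.linalg.qr(rng.standard_normal((N, N)))  # stand-in Fourier basis

s = V[:, :K] @ rng.standard_normal(K)             # K-bandlimited signal
coeffs = V.T @ s                                  # forward transform
compressed = coeffs[:K]                           # store only C = K coefficients

recovered = V[:, :K] @ compressed                 # inverse from K coefficients
assert np.allclose(recovered, s)                  # lossless for bandlimited s
print("CR = N/C =", N / K)
```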

B. Spectral Clustering

Clustering is widely used in a variety of applications, such as machine learning, computer vision, and communications. Among many methods, spectral clustering is an efficient clustering method [6], [37]. By modeling the data set with a normal graph before clustering the data spectrally, significant improvement is possible for structured data [91]. However, such standard spectral clustering methods only exploit pairwise interactions. For applications where the interactions involve more than two nodes, hypergraph spectral clustering should be a more natural choice.

In hypergraph spectral clustering, one of the most important issues is how to define a suitable spectral space. Li et al. [39] and Ahn et al. [40] introduced the hypergraph similarity spectrum for spectral clustering. Before clustering, they first modeled the hypergraph structure as a graph-like similarity matrix and then defined the hypergraph spectrum based on the eigenspace of that similarity matrix. However, since modeling a hypergraph with a similarity matrix may lose some of the inherent information, a more efficient spectral space defined directly over the hypergraph is desirable, as introduced in our HGSP framework. Since the hypergraph Fourier space obtained from the adjacency tensor has a form similar to the spectral space obtained from the adjacency matrix in GSP, we can develop a spectral clustering method based on the hypergraph Fourier space, as in Algorithm 1; a code sketch follows at the end of this subsection.

Algorithm 1 HGSP Fourier Spectral Clustering
1: Input: Data set modeled as a hypergraph H and the number of clusters k.
2: Construct the adjacency tensor A in Eq. (15) from the hypergraph H.
3: Apply orthogonal decomposition to A and compute the Fourier basis f_i together with the Fourier frequency coefficients λ_i using Eq. (24).
4: Find the first E leading Fourier basis vectors f_i with λ_i ≠ 0 and construct a Fourier spectrum matrix S ∈ R^{N×E} whose columns are the leading Fourier basis vectors.
5: Cluster the rows of S into k clusters using k-means clustering.
6: Put node i in partition j if the ith row is assigned to the jth cluster.
7: Output: k partitions of the hypergraph data set.

To test the performance of HGSP spectral clustering, we compare the achieved results with those from the hypergraph similarity method (HSC) in [40], using the zoo data set [34]. To measure the performance, we compute the intracluster variance and the average Silhouette of the nodes [41]. Since we expect the data points in the same cluster to be closer to each other, the performance is considered better if the intracluster variance is smaller. The Silhouette value, on the other hand, measures how similar an object is to its own cluster versus other clusters: a higher Silhouette value means that the clustering configuration is more appropriate.

The comparative results are shown in Fig. 16. From the test results, we can see that our HGSP method generates a lower variance and a higher Silhouette value. More intuitively, we plot the clusters of animals in Fig. 17. Cluster 2 covers small animals like bugs and snakes. Cluster 3 covers carnivores, whereas Cluster 7 groups herbivores. Cluster 4 covers birds, and Cluster 6 covers fish. Cluster 5 contains rodents such as mice. One interesting category is Cluster 1: although dolphins, sea lions, and seals live in the sea, they are mammals and are clustered separately from Cluster 6. From these results, we see that the HGSP spectral clustering method achieves better performance, and our definition of the hypergraph spectrum may be more appropriate for spectral clustering in practice.
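A compact sketch of Algorithm 1 is given below. The Fourier basis and coefficients are assumed to be precomputed (e.g., from the orthogonal-CP decomposition of the adjacency tensor), and k-means is delegated to scikit-learn; the stand-in inputs are our own:

```python
import numpy as np
from sklearn.cluster import KMeans

# A sketch of Algorithm 1: given the hypergraph Fourier basis f_i (columns
# of F) and coefficients λ_i, cluster the rows of the leading-basis matrix S.
def hgsp_spectral_clustering(F, lam, k, E):
    """F: N x N matrix whose columns are the Fourier basis f_i."""
    leading = np.argsort(-np.abs(lam))            # order basis by |λ_i|
    leading = [i for i in leading if lam[i] != 0][:E]
    S = F[:, leading]                             # N x E spectrum matrix (step 4)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(S)
    return labels                                 # node i -> partition labels[i]

# Illustrative usage with a random orthonormal stand-in basis:
rng = np.random.default_rng(5)
F, _ = np.linalg.qr(rng.standard_normal((10, 10)))
lam = rng.random(10)
print(hgsp_spectral_clustering(F, lam, k=3, E=4))
```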
C. Classification

Classification problems are important in data analysis. Traditionally, these problems are studied with learning methods [35]. Here, we propose an HGSP-based method to solve the {±1} classification problem, where a hypergraph filter serves as the classifier.

The basic idea adopted for the classification filter design is label propagation (LP), whose main steps are to first construct a transmission matrix and then propagate the labels based on that matrix [36]. The labels converge after a sufficient number of shifting steps. Let W be the propagation matrix. Then, the labels can be determined from the distribution s' = W^k s. We see that s' takes the form of a filtered graph signal. Recall from Section V-B that the supporting matrix P captures the properties of hypergraph shifting and total variation. Here, we propose an HGSP classifier based on the supporting matrix P defined in (61), generating the matrix

    H = (I + α_1 P)(I + α_2 P) ··· (I + α_k P).                  (70)

Our HGSP classifier then simply relies on sign[Hs]. The main steps of the propagated LP-HGSP classification method are described in Algorithm 2, and a code sketch follows the algorithm.

Algorithm 2 LP-HGSP Classification
1: Input: Data set s consisting of labeled training data and unlabeled test data.
2: Establish a hypergraph by similarities and set the unlabeled data to {0} in the signal s.
3: Train the coefficients α of the LP-HGSP filter in Eq. (70) by minimizing the errors of the signs of the training data in s' = Hs.
4: Implement the LP-HGSP filter. If s'_i > 0, the ith data point is labeled as 1; otherwise, it is labeled as −1.
5: Output: Labels of the test signals.
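The filter in (70) and the labeling rule of Algorithm 2 translate directly into code. In the sketch below, the supporting matrix and the coefficients are stand-ins, and the training of the α's (step 3 of Algorithm 2) is omitted:

```python
import numpy as np

# A sketch of the LP-HGSP classifier around (70): build
# H = (I + α_1 P)(I + α_2 P)···(I + α_k P) from the supporting matrix P
# and label by sign[Hs]; unlabeled entries of s are 0, as in Algorithm 2.
def lp_hgsp_classify(P, s, alphas):
    N = P.shape[0]
    H = np.eye(N)
    for a in alphas:                  # accumulate the product in (70)
        H = H @ (np.eye(N) + a * P)
    return np.sign(H @ s)             # ±1 labels (0 stays undecided)

# Illustrative usage with a symmetric stand-in supporting matrix:
rng = np.random.default_rng(6)
B = rng.standard_normal((8, 8)); P = (B + B.T) / 2
s = np.array([1, -1, 1, 0, 0, -1, 0, 0], dtype=float)  # 0 = unlabeled
print(lp_hgsp_classify(P, s, alphas=[0.4] * 15))       # k = 15, as in Sec. VI-C
```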

Fig. 16. Performance of hypergraph spectral clustering. (a) Variance in the same cluster. (b) Average Silhouette in the hypergraph.

Fig. 17. Cluster of animals.

To test the performance of the hypergraph-based classifier, we implement it over the zoo data set. We determine whether the animals have hair based on the other features, formulated as a {±1} classification problem. We randomly pick different percentages of training data and leave the remaining data as the test set among the total of 101 data points. We smooth the curve with 1000 combinations of randomly picked training sets. We compare the HGSP-based method against the SVM method with the RBF kernel and the LP-GSP method [2]. In the experiment, we model the data set as a hypergraph or a graph based on the distance between data points. The threshold determining the existence of edges is designed to ensure the absence of isolated nodes in the graph. For the LP method, we set k = 15. The result is shown in Fig. 18(a). From the result, we see that the LP-HGSP method is moderately better than LP-GSP. The graph-based methods, i.e., LP-GSP and LP-HGSP, both perform better than SVM. The performance of SVM appears less satisfactory, likely because the data set is rather small; the model-based graph and hypergraph methods are more robust when applied to such small data sets. To illustrate this effect more clearly, we tested the SVM and hypergraph performance with new configurations by increasing the data set size while fixing the ratio of training data, as shown in Fig. 18(b). In this experiment, we first randomly pick subsets of different sizes from the original zoo data set as new data sets. Then, for each data set size, 40% of the data points are randomly picked as training data, and the remaining points are used as test data. We average the results over 10 000 experiment runs to smooth the curves. We can see from Fig. 18(b) that the performance of SVM improves significantly as the data set grows larger. This comparison indicates that SVM may require more data to achieve better performance, consistent with the comparative results of Fig. 18(a). Generally, the HGSP-based method exhibits better overall performance and shows significant advantages on small data sets. Although the GSP and HGSP classifiers are both model based, the hypergraph-based ones usually perform better than the graph-based ones, since hypergraphs provide a better description of the structured data in most applications.

D. Denoising

Signals collected in the real world often contain noise. Signal denoising is thus an important application of signal processing. Here, we design a hypergraph filter to implement signal denoising.

Fig. 18. Performance of classifiers. (a) Over data sets with a fixed data size and different ratios of training data. (b) Over data sets with different data sizes and a fixed ratio of training data.

TABLE II MSE OF FILTERED SIGNAL

As mentioned in Section III, the smoothness of a hypergraph signal, which describes its variation over the vertices, can be measured by the total variation. Assume that the original signal is smooth. We then formulate signal denoising as an optimization problem. Suppose that y = s + n is a noisy signal with noise n and that s' = h(F, y) is the data denoised by the HGSP filter h(·). The denoising problem can be formulated as the optimization problem

    min_h ||s' − y||_2^2 + γ ||s' − s'_<1>||_2^2                 (71)

where the second term is the weighted quadratic total variation of the filtered signal s' based on the supporting matrix. The denoising problem in (71) aims to smooth the signal based on the original noisy data y: the first term keeps the denoised signal close to the noisy observation, whereas the second term smooths the recovered signal. Clearly, the optimized solution of the filter design is

    s' = h(F, y) = [I + γ(I − P_s)^T (I − P_s)]^{−1} y           (72)

where P_s = Σ_{i=1}^{N} (λ_i/λ_max) f_i f_i^T describes a hypergraph Fourier decomposition. From (72), we see that the solution takes the form s' = Hy, with the hypergraph filter h(·) given by

    H = [I + γ(I − P_s)^T (I − P_s)]^{−1}.                       (73)

The HGSP-based filter follows an idea similar to the GSP-based denoising filter [33]. However, the different definitions of the total variation and of signal shifting result in different designs for the HGSP and GSP filters. To test the performance, we compare our method with the basic Wiener filter, the median filter, and the GSP-based filter [33] using the image data sets of Fig. 14, applying different types of noise. To quantify the filter performance, we use the mean square error (MSE) between each true signal and the corresponding filtered signal. The results are given in Table II. From these results, we can see that, for each type of noise and with the optimized γ for all the methods, our HGSP-based filter outperforms the other filters.
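Since (72) is a closed form, the denoiser reduces to a single linear solve. A sketch with a stand-in spectrum (solving the system rather than forming the inverse, which is numerically preferable; all inputs are illustrative):

```python
import numpy as np

# A sketch of the closed-form denoiser (72): given the noisy y and the
# supporting matrix P_s = Σ (λ_i/λ_max) f_i f_i^T, solve
# (I + γ (I − P_s)^T (I − P_s)) s' = y.
def hgsp_denoise(y, Ps, gamma):
    N = Ps.shape[0]
    I = np.eye(N)
    A = I + gamma * (I - Ps).T @ (I - Ps)   # system matrix from (72)
    return np.linalg.solve(A, y)            # s' = H y without an explicit inverse

# Illustrative usage with a stand-in spectrum:
rng = np.random.default_rng(7)
N = 10
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.sort(rng.random(N))[::-1]
Ps = V @ np.diag(lam / lam[0]) @ V.T        # λ_i/λ_max weighting
y = V[:, 0] + 0.1 * rng.standard_normal(N)  # smooth signal plus noise
print(hgsp_denoise(y, Ps, gamma=5.0))
```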

E. Other Potential Applications

In addition to the application algorithms discussed above, there could be many other potential applications of HGSP. In this section, we suggest several data sets and systems to which HGSP is applicable.
1) IoT: With the development of IoT techniques, system structures become increasingly complex, which makes traditional graph-based tools inefficient at handling the resulting high-dimensional interactions. On the other hand, the hypergraph-based HGSP is powerful in dealing with high-dimensional analysis in IoT systems: for example, in data intelligence over sensor networks, where hypergraph-based analysis has already attracted significant attention [92], HGSP could be used to handle tasks like clustering, classification, and sampling.
2) Social Networks: Another promising application is the analysis of social network data sets. As discussed earlier, a hyperedge is an efficient representation of multilateral relationships in social networks [15], [80]; HGSP can then be effective in analyzing multilateral node interactions.
3) Natural Language Processing: Furthermore, natural language processing is an area that can benefit from HGSP. By modeling sentences and language with hypergraphs [81], [82], HGSP can serve as a tool for language classification and clustering tasks.
Overall, owing to its systematic and structural approach, HGSP is expected to become an important tool in handling high-dimensional signal processing tasks that are traditionally addressed by DSP- or GSP-based methods.

VII. CONCLUSION

In this article, we proposed a novel tensor-based framework of HGSP that generalizes traditional GSP to high-dimensional hypergraphs. This article provided the important definitions in HGSP, including hypergraph signals, hypergraph shifting, HGSP filters, frequency, and bandlimited signals. We presented basic HGSP concepts, such as the sampling theory and filter design. We showed that the hypergraph can serve as an efficient model for many complex data sets. We also illustrated multiple practical applications of HGSP in signal processing and data analysis, where we provided numerical results to validate the advantages and the practicality of the proposed HGSP framework. All these features make HGSP a powerful tool for future IoT applications.

Future Directions: With the development of tensor algebra and hypergraph spectra, more opportunities are emerging to explore HGSP and its applications. One interesting topic is how to construct the hypergraph efficiently, where distance-based and model-based methods have achieved significant successes in specific areas, such as image processing [93] and natural language processing [94]. Another promising direction is to apply HGSP to analyzing and optimizing multilayer networks. As we discussed in the introduction, the hypergraph is an alternative model for representing a multilayer network [13], and HGSP becomes a useful tool when dealing with multilayer structures. Other future directions include the development of fast operations, such as a fast HGFT, and applications over high-dimensional data sets [95].

REFERENCES

[1] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[2] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
[3] M. Newman, D. J. Watts, and S. H. Strogatz, "Random graph models of social networks," Proc. Nat. Acad. Sci. USA, vol. 99, no. 1, pp. 2566–2572, Feb. 2002.
[4] S. Barbarossa and M. Tsitsvero, "An introduction to hypergraph signal processing," in Proc. Acoust. Speech Signal Process. (ICASSP), Shanghai, China, Mar. 2016, pp. 6425–6429.
[5] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, Apr. 2014.
[6] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
[7] R. Wagner, V. Delouille, and R. G. Baraniuk, "Distributed wavelet de-noising for sensor networks," in Proc. 45th IEEE Conf. Decis. Control, San Diego, CA, USA, Dec. 2006, pp. 373–379.
[8] S. K. Narang and A. Ortega, "Local two-channel critically sampled filter-banks on graphs," in Proc. 17th IEEE Int. Conf. Image Process. (ICIP), Hong Kong, Sep. 2010, pp. 333–336.
[9] X. Zhu and M. Rabbat, "Approximating signals supported on graphs," in Proc. ICASSP, Kyoto, Japan, Mar. 2012, pp. 3921–3924.
[10] S. Zhang, H. Zhang, H. Li, and S. Cui, "Tensor-based spectral analysis of cascading failures over multilayer complex systems," in Proc. 56th Annu. Allerton Conf. Commun. Control Comput. (Allerton), Oct. 2018, pp. 997–1004.
[11] Z. Huang, C. Wang, M. Stojmenovic, and A. Nayak, "Characterization of cascading failures in interdependent cyberphysical systems," IEEE Trans. Comput., vol. 64, no. 8, pp. 2158–2168, Aug. 2015.
[12] T. Michoel and B. Nachtergaele, "Alignment and integration of complex networks by hypergraph-based spectral clustering," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 86, no. 5, Nov. 2012, Art. no. 056111.
[13] B. Oselio, A. Kulesza, and A. Hero, "Information extraction from large multi-layer social networks," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Brisbane, QLD, Australia, Apr. 2015, pp. 5451–5455.
[14] M. De Domenico et al., "Mathematical formulation of multilayer networks," Phys. Rev. X, vol. 3, no. 4, Dec. 2013, Art. no. 041022.
[15] G. Ghoshal, V. Zlatić, G. Caldarelli, and M. E. J. Newman, "Random hypergraphs and their applications," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 79, no. 6, Jun. 2009, Art. no. 066118.
[16] L. J. Grady and J. Polimeni, Discrete Calculus: Applied Analysis on Graphs for Computational Science. London, U.K.: Springer, 2010.
[17] C. G. Drake et al., "Analysis of the New Zealand black contribution to lupus-like renal disease: Multiple genes that operate in a threshold manner," J. Immunol., vol. 154, no. 5, pp. 2441–2447, Mar. 1995.
[18] V. I. Voloshin, Introduction to Graph and Hypergraph Theory. New York, NY, USA: Nova Sci., 2009.
[19] C. Berge and E. Minieka, Graphs and Hypergraphs. Amsterdam, The Netherlands: North-Holland, 1973.
[20] A. Banerjee, C. Arnab, and M. Bibhash, "Spectra of general hypergraphs," Linear Algebra Appl., vol. 518, pp. 14–30, Dec. 2016.
[21] S. Kok and P. M. Domingos, "Learning Markov logic network structure via hypergraph lifting," in Proc. 26th Annu. Int. Conf. Mach. Learn., Montreal, QC, Canada, Jun. 2009, pp. 505–512.
[22] N. Yadati, M. Nimishakavi, P. Yadav, A. Louis, and P. Talukdar, "HyperGCN: Hypergraph convolutional networks for semi-supervised classification," arXiv:1809.02589 [cs], Sep. 2018.
[23] A. Mithani, G. M. Preston, and J. Hein, "Rahnuma: Hypergraph-based tool for metabolic pathway prediction and network comparison," Bioinformatics, vol. 25, no. 14, pp. 1831–1832, Jul. 2009.
[24] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Rev., vol. 51, no. 3, pp. 455–500, Aug. 2009.
[25] L. Qi, "Eigenvalues of a real supersymmetric tensor," J. Symbolic Comput., vol. 40, no. 6, pp. 1302–1324, Jun. 2005.
[26] A. Afshar et al., "CP-ORTHO: An orthogonal tensor factorization framework for spatio-temporal data," in Proc. 25th ACM SIGSPATIAL Int. Conf. Adv. Geograph. Inf. Syst., Redondo Beach, CA, USA, Jan. 2017, p. 67.
[27] S. Chen, A. Sandryhaila, and J. Kovačević, "Sampling theory for graph signals," in Proc. Acoust. Speech Signal Process. (ICASSP), Brisbane, QLD, Australia, Apr. 2015, pp. 3392–3396.
[28] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
[29] B. W. Bader and T. G. Kolda. (Feb. 2015). MATLAB Tensor Toolbox Version 2.6. [Online]. Available: http://www.sandia.gov/~tgkolda/TensorToolbox/
[30] A. Bretto and L. Gillibert, "Hypergraph-based image representation," in Proc. Int. Workshop Graph Based Represent. Pattern Recognit., Berlin, Germany, Apr. 2005, pp. 1–11.

[31] A. Morar, F. Moldoveanu, and E. Gröller, "Image segmentation based on active contours without edges," in Proc. IEEE 8th Int. Conf. Intell. Comput. Commun. Process., Cluj-Napoca, Romania, Sep. 2012, pp. 213–220.
[32] K. Fukunaga, S. Yamada, H. S. Stone, and T. Kasai, "A representation of hypergraphs in the Euclidean space," IEEE Trans. Comput., vol. C-33, no. 4, pp. 364–367, Apr. 1984.
[33] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovačević, "Signal denoising on graphs via graph filtering," in Proc. Signal Inf. Process. (GlobalSIP), Atlanta, GA, USA, Dec. 2014, pp. 872–876.
[34] D. Dua and E. K. Taniskidou. (2017). UCI Machine Learning Repository. [Online]. Available: http://archive.ics.uci.edu/ml
[35] D. L. Medin and M. Schaffer, "Context theory of classification learning," Psychol. Rev., vol. 85, no. 3, pp. 207–238, 1978.
[36] X. Zhu and Z. Ghahramani, "Learning from labeled and unlabeled data with label propagation," School Comput. Sci., Carnegie Mellon Univ., Rep. CMU-CALD-02-107, 2002.
[37] D. Zhou, J. Huang, and B. Schölkopf, "Learning with hypergraphs: Clustering, classification, and embedding," in Proc. Adv. Neural Inf. Process. Syst., Sep. 2007, pp. 1601–1608.
[38] E. O. Brigham, The Fast Fourier Transform and Its Applications. Englewood Cliffs, NJ, USA: Prentice Hall, 1988.
[39] X. Li, W. Hu, C. Shen, A. Dick, and Z. Zhang, "Context-aware hypergraph construction for robust spectral clustering," IEEE Trans. Knowl. Data Eng., vol. 26, no. 10, pp. 2588–2597, Oct. 2014.
[40] K. Ahn, K. Lee, and C. Suh, "Hypergraph spectral clustering in the weighted stochastic block model," IEEE J. Sel. Topics Signal Process., vol. 12, no. 5, pp. 959–974, Oct. 2018.
[41] P. J. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis," J. Comput. Appl. Math., vol. 20, pp. 53–65, Nov. 1987.
[42] G. Li, L. Qi, and G. Yu, "The Z-eigenvalues of a symmetric tensor and its application to spectral hypergraph theory," Numer. Linear Algebra Appl., vol. 20, no. 6, pp. 1001–1029, Mar. 2013.
[43] R. B. Bapat, Graphs and Matrices. London, U.K.: Springer, 2010.
[44] K. C. Chang, K. Pearson, and T. Zhang, "On eigenvalue problems of real symmetric tensors," J. Math. Anal. Appl., vol. 305, no. 1, pp. 416–422, Oct. 2008.
[45] P. Comon and B. Mourrain, "Decomposition of quantics in sums of powers of linear forms," Signal Process., vol. 53, nos. 2–3, pp. 93–107, Sep. 1996.
[46] P. Comon, G. Golub, L. H. Lim, and B. Mourrain, "Symmetric tensors and symmetric tensor rank," SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 1254–1279, Sep. 2008.
[47] H. A. L. Kiers, "Towards a standardized notation and terminology in multiway analysis," J. Chemometr. Soc., vol. 14, no. 3, pp. 105–122, Jan. 2000.
[48] R. A. Horn, "The Hadamard product," in Proc. Symp. Appl. Math., vol. 40, Apr. 1990, pp. 1–7.
[49] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos, "Tag recommendations based on tensor dimensionality reduction," in Proc. ACM Conf. Recommender Syst., Lausanne, Switzerland, Oct. 2018, pp. 43–50.
[50] S. Liu and G. Trenkler, "Hadamard, Khatri-Rao, Kronecker and other matrix products," Int. J. Inf. Syst. Sci., vol. 4, no. 1, pp. 160–177, 2008.
[51] K. J. Pearson and T. Zhang, "On spectral hypergraph theory of the adjacency tensor," Graphs Comb., vol. 30, no. 5, pp. 1233–1248, Sep. 2014.
[52] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Graph Fourier transform," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 26–31.
[53] M. A. Henning and Y. Anders, "2-colorings in k-regular k-uniform hypergraphs," Eur. J. Comb., vol. 34, no. 7, pp. 1192–1202, Oct. 2013.
[54] E. Robeva, "Orthogonal decomposition of symmetric tensors," SIAM J. Matrix Anal. Appl., vol. 37, no. 1, pp. 86–102, Jan. 2016.
[55] A. Cichocki et al., "Tensor decompositions for signal processing applications: From two-way to multiway component analysis," IEEE Signal Process. Mag., vol. 32, no. 2, pp. 145–163, Mar. 2015.
[56] D. Silva and A. Pereira, Tensor Techniques for Signal Processing: Algorithms for Canonical Polyadic Decomposition. Grenoble, France: Univ. Grenoble Alpes, 2016.
[57] V. Nguyen, K. Abed-Meraim, and N. Linh-Trung, "Fast tensor decompositions for big data processing," in Proc. Int. Conf. Adv. Technol. Commun., Hanoi, Vietnam, Oct. 2016, pp. 215–221.
[58] T. Kolda, "Symmetric orthogonal tensor decomposition is trivial," arXiv:1503.01375 [math.NA], Mar. 2015.
[59] S. Mallat, A Wavelet Tour of Signal Processing, 3rd ed. New York, NY, USA: Academic, 2008.
[60] V. Rödl and J. Skokan, "Regularity lemma for k-uniform hypergraphs," Random Struct. Algorithms, vol. 25, no. 1, pp. 1–42, Jun. 2004.
[61] Z. K. Zhang and C. Liu, "A hypergraph model of social tagging networks," J. Stat. Mech. Theory Exp., vol. 2010, no. 10, Oct. 2010, Art. no. P10005.
[62] K. J. Pearson and T. Zhang, "Eigenvalues on the adjacency tensor of products of hypergraphs," Int. J. Contemp. Math. Sci., vol. 8, no. 4, pp. 151–158, Jan. 2013.
[63] J. M. Mendel, "Use of higher-order statistics in signal processing and system theory: An update," Adv. Algorithms Architect. Signal Process., vol. 975, pp. 126–145, Feb. 1988.
[64] J. M. Mendel, "Tutorial on higher-order statistics (spectra) in signal processing and system theory: Theoretical results and some applications," Proc. IEEE, vol. 79, no. 3, pp. 278–305, Mar. 1991.
[65] A. K. Nandi, Blind Estimation Using Higher-Order Statistics. New York, NY, USA: Springer, 2013.
[66] M. S. Choudhury, S. L. Shah, and N. F. Thornhill, "Diagnosis of poor control-loop performance using higher-order statistics," Automatica, vol. 40, no. 10, pp. 1719–1728, Oct. 2004.
[67] O. A. Dobre, Y. Bar-Ness, and W. Su, "Higher-order cyclic cumulants for high order modulation classification," in Proc. IEEE Mil. Commun. Conf., Boston, MA, USA, Oct. 2003, pp. 112–117.
[68] M. B. Priestley, Spectral Analysis and Time Series. London, U.K.: Academic, 1981.
[69] A. Swami and J. M. Mendel, "Time and lag recursive computation of cumulants from a state-space model," IEEE Trans. Autom. Control, vol. 35, no. 1, pp. 4–17, Jan. 1990.
[70] M. Hein, S. Setzer, L. Jost, and S. S. Rangapuram, "The total variation on hypergraphs - Learning on hypergraphs revisited," in Proc. Adv. Neural Inf. Process. Syst., Dec. 2013, pp. 2427–2435.
[71] G. Chen, J. Zhang, F. Wang, C. Zhang, and Y. Gao, "Efficient multi-label classification with hypergraph regularization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, USA, Jun. 2009, pp. 1658–1665.
[72] V. I. Paulsen and R. R. Smith, "Multilinear maps and tensor norms on operator systems," J. Funct. Anal., vol. 73, no. 2, pp. 258–276, Aug. 1987.
[73] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovačević, "Adaptive graph filtering: Multiresolution classification on graphs," in Proc. IEEE Glob. Conf. Signal Inf. Process., Austin, TX, USA, Feb. 2014, pp. 427–430.
[74] S. Chen, F. Cerda, P. Rizzo, J. Bielak, J. H. Garrett, and J. Kovačević, "Semi-supervised multiresolution classification using adaptive graph filtering with application to indirect bridge structural health monitoring," IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2879–2893, Mar. 2014.
[75] S. McCanne and M. J. Demmer, "Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation," U.S. Patent 6 667 700, Dec. 2003.
[76] K. Sayood, Introduction to Data Compression. Burlington, MA, USA: Morgan Kaufmann, 2017.
[77] D. Salomon, Data Compression: The Complete Reference. London, U.K.: Springer, 2004.
[78] M. Yan et al., "Hypergraph-based data link layer scheduling for reliable packet delivery in wireless sensing and control networks with end-to-end delay constraints," Inf. Sci., vol. 278, pp. 34–55, Sep. 2014.
[79] C. Adjih, S. Y. Cho, and P. Jacquet, "Near optimal broadcast with network coding in large sensor networks," arXiv:0708.0975 [cs.NI], Aug. 2007.
[80] V. Zlatić, G. Ghoshal, and G. Caldarelli, "Hypergraph topological quantities for tagged social networks," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 80, no. 3, Sep. 2009, Art. no. 036118.
[81] I. Konstas and M. Lapata, "Unsupervised concept-to-text generation with hypergraphs," in Proc. Conf. North Amer. Assoc. Comput. Linguist. Human Lang. Technol., Montreal, QC, Canada, Jun. 2012, pp. 752–761.
[82] H. Boley, "Directed recursive labelnode hypergraphs: A new representation-language," Artif. Intell., vol. 9, no. 1, pp. 49–85, Aug. 1977.
[83] F. M. Harper and J. A. Konstan, "The MovieLens datasets: History and context," ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, p. 19, Jan. 2016, doi: 10.1145/2827872.
[84] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Graph filters," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 6163–6166.
[85] L. Qi, "Eigenvalues and invariants of tensors," J. Math. Anal. Appl., vol. 325, no. 2, pp. 1363–1377, Jan. 2007.

[86] L.-H. Lim, "Singular values and eigenvalues of tensors: A variational approach," in Proc. 1st IEEE Int. Workshop Comput. Adv. Multi Sensor Adapt. Process., Puerto Vallarta, Mexico, Dec. 2005, pp. 129–132.
[87] M. Ng, L. Qi, and G. Zhou, "Finding the largest eigenvalue of a nonnegative tensor," SIAM J. Matrix Anal. Appl., vol. 31, no. 3, pp. 1090–1099, Feb. 2009.
[88] D. Cartwright and B. Sturmfels, "The number of eigenvalues of a tensor," Linear Algebra Appl., vol. 438, no. 2, pp. 942–952, Jan. 2013.
[89] M. N. Marshall, "Sampling for qualitative research," Family Practice, vol. 13, no. 6, pp. 522–526, Dec. 1996.
[90] G. E. Batley and D. Gardner, "Sampling and storage of natural waters for trace metal analysis," Water Res., vol. 11, no. 9, pp. 745–756, 1977.
[91] U. V. Luxburg, "A tutorial on spectral clustering," Stat. Comput., vol. 17, no. 4, pp. 395–416, Dec. 2007.
[92] J. Jung, S. Chun, and K.-H. Lee, "Hypergraph-based overlay network model for the Internet of Things," in Proc. IEEE 2nd World Forum Internet Things (WF-IoT), Milan, Italy, Dec. 2015, pp. 104–109.
[93] J. Yu, D. Tao, and M. Wang, "Adaptive hypergraph learning and its application in image classification," IEEE Trans. Image Process., vol. 21, no. 7, pp. 3262–3272, Jul. 2012.
[94] S. Rital, H. Cherifi, and S. Miguet, "Weighted adaptive neighborhood hypergraph partitioning for image segmentation," in Proc. Int. Conf. Pattern Recognit. Image Anal., Berlin, Germany, Aug. 2005, pp. 522–531.
[95] R. B. Rusu and S. Cousins, "3D is here: Point cloud library (PCL)," in Proc. IEEE Int. Conf. Robot. Autom., Shanghai, China, May 2011, pp. 1–4.

Zhi Ding (S'88–M'90–SM'95–F'03) received the Ph.D. degree in electrical engineering from Cornell University, Ithaca, NY, USA, in 1990. He is a Professor of electrical and computer engineering with the University of California at Davis, Davis, CA, USA. From 1990 to 2000, he was a Faculty Member with Auburn University, Auburn, AL, USA, and later with the University of Iowa, Iowa City, IA, USA. He has held visiting positions with the Australian National University, Canberra, ACT, Australia; the Hong Kong University of Science and Technology, Hong Kong; the NASA Lewis Research Center, Cleveland, OH, USA; and the USAF Wright Laboratory, New Haven, CT, USA. He has active collaborations with researchers from universities in Australia, Canada, China, Finland, Hong Kong, Japan, South Korea, Singapore, and Taiwan. He has coauthored the book Modern Digital and Analog Communication Systems (Oxford University Press, 2019, 5th edition).
Prof. Ding was a recipient of the IEEE Communication Society's WTC Award in 2012. He has been serving on the technical programs of several workshops and conferences. He was an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 1994 to 1997 and from 2001 to 2004, and of the IEEE SIGNAL PROCESSING LETTERS from 2002 to 2005. He was a member of the Technical Committee on Statistical Signal and Array Processing and of the Technical Committee on Signal Processing for Communications from 1994 to 2003. He was the General Chair of the 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing and the Technical Program Chair of the 2006 IEEE Globecom. He was also an IEEE Distinguished Lecturer (Circuits and Systems Society, 2004 to 2006; Communications Society, 2008 to 2009). He served as a member of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS Steering Committee from 2007 to 2009 and as its Chair from 2009 to 2010.

Songyang Zhang received the Bachelor of Science degree from the Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, University of California at Davis, Davis, CA, USA. His current research interests include hypergraph signal processing, behavior analysis over high-dimensional graphs, and learning over graphs.

Shuguang Cui (S'99–M'05–SM'12–F'14) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2005. He has been an Assistant, an Associate, a Full, and the Chair Professor with the Department of Electrical and Computer Engineering at the University of Arizona, Tucson, AZ, USA; Texas A&M University, TX, USA; the University of California at Davis, Davis, CA, USA; and the Chinese University of Hong Kong, Shenzhen, China. He has also been the Vice Director of the Shenzhen Research Institute of Big Data, Shenzhen, China. His current research interests include data-driven large-scale system control and resource management, large data set analysis, Internet of Things system design, energy harvesting-based communication system design, and cognitive network optimization.
Prof. Cui was a recipient of the IEEE Signal Processing Society 2012 Best Paper Award, the Amazon AWS Machine Learning Award in 2018, and the Highly Cited Researcher Award by Thomson Reuters. He was listed among the World's Most Influential Scientific Minds by ScienceWatch in 2014. He has served as the general co-chair and the TPC co-chair for many IEEE conferences. He has also been serving as the Area Editor for the IEEE Signal Processing Magazine and as an Associate Editor for the IEEE TRANSACTIONS ON BIG DATA, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Series on Green Communications and Networking, and the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. He was an Elected Member of the IEEE Signal Processing Society SPCOM Technical Committee from 2009 to 2014 and the Elected Chair of the IEEE ComSoc Wireless Technical Committee from 2017 to 2018. He is a member of the Steering Committee for the IEEE TRANSACTIONS ON BIG DATA and the Chair of the Steering Committee for the IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING. He was also a member of the IEEE ComSoc Emerging Technology Committee. He was elected as an IEEE ComSoc Distinguished Lecturer in 2014 and an IEEE VT Society Distinguished Lecturer in 2019.