Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

Dynamic Hypergraph Structure Learning

Zizhao Zhang, Haojie Lin, Yue Gao∗
BNRist, KLISS, School of Software, Tsinghua University, China
[email protected], [email protected], [email protected]

Abstract

In recent years, hypergraph modeling has shown its superiority on correlation formulation among samples and has wide applications in classification, retrieval, and other tasks. In all these works, the performance of hypergraph learning highly depends on the generated hypergraph structure. A good hypergraph structure can represent the data correlation better, and vice versa. Although hypergraph learning has attracted much attention recently, most existing works still rely on a static hypergraph structure, and little effort concentrates on optimizing the hypergraph structure during the learning process. To tackle this problem, we propose the first dynamic hypergraph structure learning method in this paper. In this method, given the originally generated hypergraph structure, the objective of our work is to simultaneously optimize the label projection matrix (the common task in hypergraph learning) and the hypergraph structure itself. On one hand, the label projection matrix is related to the hypergraph structure in our formulation, similar to traditional methods. On the other hand, different from the existing hypergraph generation methods based on the feature space only, the hypergraph structure optimization in our formulation utilizes the data correlation from both the label space and the feature space. Then, we alternately learn the optimal label projection matrix and the hypergraph structure, leading to a dynamic hypergraph structure during the learning process. We have applied the proposed method in the tasks of 3D shape recognition and gesture recognition. Experimental results on four public datasets show better performance compared with the state-of-the-art methods. We note that the proposed method can be further applied in other tasks.

∗ indicates corresponding author

1 Introduction

A hypergraph is composed of a vertex set and a hyperedge set. In a hypergraph, each hyperedge can connect any number of vertices, and the hyperedge can be easily expanded, which leads to a flexible edge for the hypergraph. Therefore, the hypergraph model is able to formulate the high-order correlation of data compared with the simple graph and other linear comparison methods.

Due to this advantage, the hypergraph learning method [Zhou and Schölkopf, 2007] has attracted much attention in recent years and has been applied in many tasks [Zhang et al., 2017; Zhu et al., 2017b], such as image retrieval [Huang et al., 2010; Zhu et al., 2017a], analysis [Zhao et al., 2018], object classification [Gao et al., 2012], image segmentation [Huang et al., 2009], hyperspectral image analysis [Luo et al., 2018], person re-identification [An et al., 2017], multimedia recommendation [Zhu et al., 2015] and visual tracking [Du et al., 2017].

In these applications, the relationship of the data for processing is usually formulated in a hypergraph structure, where each vertex denotes the data and the hyperedges represent some kind of connections. Then, hypergraph learning is conducted as a label propagation process on the hypergraph to obtain the label projection matrix [Liu et al., 2017a], or as a clustering process [Li and Milenkovic, 2017], in different tasks.

In these methods, the quality of the hypergraph structure plays an important role in data modelling. A well-constructed hypergraph structure can represent the data correlation accurately, leading to better performance, while an inappropriate hypergraph structure could make the situation worse. To generate a better hypergraph structure, many approaches have been proposed in recent years, such as the k-nn method [Huang et al., 2009], the clustering-based method [Gao et al., 2012], the sparse representation method [Liu et al., 2017b], etc.

In existing methods, after the hypergraph has been constructed, it never changes during the learning process, leading to a static hypergraph structure learning mechanism. However, it is difficult to guarantee that the generated hypergraph structure is optimal and suitable for all applications.

In recent years, there have been some attempts to modify the hypergraph components, such as learning the hyperedge weights [Gao et al., 2012] and the sub-hypergraph weights [Gao et al., 2013]. However, these methods only make slight adjustments to the original hypergraph structure, and cannot repair the connections which are inappropriate or even contain errors. Confronting this challenge, it is urgent to investigate hypergraph structure optimization during the learning process, leading to a dynamic hypergraph structure learning scheme.

In this paper, we propose a dynamic hypergraph structure learning method, which can jointly learn the hypergraph structure and the label projection matrix simultaneously. In a hypergraph, the structure is represented by the incidence matrix, which records the connections and the connection strength between vertices and hyperedges. The objective here is to optimize the incidence matrix during the learning process, leading to a better hypergraph structure accordingly. Given the data, an initial hypergraph structure can be generated by selecting a suitable hypergraph construction method for the application. In our formulation, we propose to optimize the hypergraph structure from two spaces, i.e., the data label space and the data feature space. With the initial hypergraph, a transductive learning process can be conducted to propagate the vertex labels on the hypergraph. In each round, after the optimization of the label projection matrix, we further update the incidence matrix considering the data correlation in both the label space and the feature space. This process repeats until both the hypergraph structure and the label projection matrix are stable. During this process, the hypergraph structure can be dynamically updated, leading to a dynamic hypergraph strategy. In this way, we can achieve both the optimal label projection matrix (the objective for the application) and the updated hypergraph structure (the data correlation modelling) under a dual optimization framework.

We have applied our dynamic hypergraph structure learning method to the tasks of 3D shape recognition and gesture recognition. For object classification, experiments have been conducted on two public benchmarks, i.e., the National Taiwan University (NTU) 3D shape dataset [Chen et al., 2003] and the Engineering Shape Benchmark (ESB) [Jayanti et al., 2006]. For gesture recognition, experiments have been conducted on the public MSR Gesture 3D dataset (MSRGesture3D) and a motion gesture dataset collected by Huazhong University of Science and Technology (Gesture3DMotion). Experimental results demonstrate better performance of the proposed method compared with traditional hypergraph learning methods and other state-of-the-art methods, indicating the better data correlation modeling ability.

The main contributions of our work are two-fold:

1. We propose the first dynamic hypergraph structure learning method. To the best of our knowledge, this is the first attempt to jointly conduct hypergraph structure updating and hypergraph learning, which is accomplished by a dual optimization process. For hypergraph structure optimization, we propose to update the hypergraph structure from both the label space and the feature space, different from traditional methods which are mainly based on the feature space.

2. We have applied the proposed method on 3D shape recognition and gesture recognition and conducted extensive evaluations to demonstrate the effectiveness of the proposed method. We have observed that the traditional hyperedge weight updating method has limited ability to improve hypergraph learning performance, while the dynamic updating of the hypergraph structure shows consistent performance improvement.

The rest of this paper is organized as follows. Section 2 introduces the related work on hypergraph learning. Section 3 presents the proposed dynamic hypergraph structure learning method. The applications and experimental results are provided in Section 4. We conclude this paper in Section 5.

2 Related Work

In this section, we briefly review existing works on hypergraph learning and its applications, such as video object segmentation, scene recognition, 3D object retrieval and classification.

In hypergraph learning, hypergraph construction is the first step for data modeling. Existing works can be divided into feature-based methods and representation-based methods. In feature-based methods, the hyperedges are built by exploring the nearest neighbors in the feature space, based on k-nn or a search radius. The k-nn scheme [Huang et al., 2009] selects a set number of nearest neighbors for each vertex to generate a hyperedge. The search radius scheme, in contrast, sets a predefined search radius, and all vertices within the radius are connected by one hyperedge. However, it is challenging to determine an optimal number of nearest neighbors or an optimal search radius, which may be sensitive to noise and limit the performance of data modeling. Different from the feature-based methods, the representation-based method [Liu et al., 2017b] aims to exploit the relation among vertices through feature reconstruction. Given a centroid vertex, sparse representation was conducted to represent the centroid vertex by its neighbors, and then only the vertices with non-zero coefficients to the centroid vertex were used to construct the hyperedge. To deal with multi-modal data, the multi-hypergraph structure [Gao et al., 2012; Zhu et al., 2015] has also been introduced, corresponding to different modalities or features.

In [Zhou and Schölkopf, 2007], hypergraph learning was introduced as a propagation process through the hypergraph structure. The transductive inference on the hypergraph aims to minimize the label differences between vertices which are connected by more and stronger hyperedges. To further improve the hypergraph structure, recent attention has been spent on learning the weights of hyperedges, where the hyperedges could have different importance in data modeling. To learn optimal hyperedge weights, an ℓ2 regularizer on the weights was introduced in [Gao et al., 2013]. Hwang et al. [Hwang et al., 2008] further considered the correlation among hyperedges by assuming that highly correlated hyperedges should share similar weights. Regarding multi-hypergraph, a sub-hypergraph weight learning method was introduced in [Gao et al., 2012] to assign weights for different sub-hypergraphs. We note that although these works can update the hyperedge weights or the sub-hypergraph weights, they cannot repair the connections which are inappropriate or even contain errors, because the hypergraph structure, i.e., the incidence matrix, is static in all these works.

Hypergraph learning has been widely applied in many computer vision tasks. In [Huang et al., 2010], the hypergraph structure was employed to formulate the relationship among images based on visual features for image retrieval. The hypergraph structure was used for video object segmentation [Huang et al., 2009], where each vertex denoted an over-segmented image patch and the spatial-temporal neighborhood correlation was used for hypergraph construction. Hypergraph learning has also been used for image classification and ranking [Gao et al., 2013; Zhu et al., 2015], and 3D object classification [Gao et al., 2012].

We can notice that existing hypergraph learning methods are based on a fixed (static) hypergraph structure. These works either focus on the learning of the label projection matrix or the learning of hyperedge/hypergraph weights. Given an inaccurate initial hypergraph, it is impossible for existing methods to obtain an optimal data modelling. Therefore, static hypergraph learning still has limitations on complex data correlation modelling.

3 Dynamic Hypergraph Structure Learning

In this section, we introduce the proposed dynamic hypergraph structure learning algorithm in detail. Figure 1 illustrates the general framework of our proposed method. Given the data with corresponding features, our objective is to learn an optimal hypergraph structure, which could better explore the high-order relationship among the data, and also the label projection matrix, which is used for discriminating the testing samples into different categories. Firstly, a hypergraph can be generated to formulate the data correlation. We note that any hypergraph construction method can be used in our framework, and we take the k-nn scheme as an example in this paper. Given the initial hypergraph, we then jointly learn the label projection matrix and the hypergraph structure under a dual optimization framework. In each round, a transductive learning process can be conducted to propagate the vertex labels on the hypergraph. After the label projection matrix has been obtained, we further learn the incidence matrix considering the data correlation in both the label space and the feature space. This process repeats until the objective function no longer descends. During this process, the hypergraph structure can be dynamically updated. We then have both the optimal label projection matrix and the updated hypergraph structure. The former can be used for data classification, retrieval and other applications.

3.1 Hypergraph Construction

For any application using hypergraph learning methods, the first step is to construct the corresponding hypergraph structure. Given the data, including n labeled training samples Q = {S_1, S_2, ..., S_n} and m unlabeled testing samples D = {S_{n+1}, S_{n+2}, ..., S_{n+m}}, let x_i denote the feature vector of the i-th sample and X = {x_1, x_2, ..., x_{n+m}}. A hypergraph G = (V, E, W) can be constructed to formulate the relationship among these samples, where V is the set of vertices, E is the set of hyperedges, and W is a diagonal matrix in which each diagonal element denotes a hyperedge weight. In V, each vertex denotes one sample in {Q, D}, and there are n + m vertices in total. In this paper, we take the k-nn hyperedge generation method as the example. We note that other hypergraph generation methods can also be used in our framework. In E, the hyperedges are generated following a k-nn approach. Each time, one vertex is selected as the centroid, and a corresponding hyperedge e connects the centroid and its k nearest neighbor vertices. The incidence matrix H of the hypergraph G = (V, E, W) represents the connections between the vertices and the hyperedges, which can be generated as

$$
\mathbf{H}(v, e) = \begin{cases} \exp\left(-\dfrac{d(v, v_c)^2}{\hat{d}^2}\right) & \text{if } v \in e \\ 0 & \text{if } v \notin e \end{cases}, \quad (1)
$$

where d(v, v_c) is the distance between a vertex v and the centroid vertex v_c in hyperedge e, and d̂ is the average distance between subjects. Accordingly, two diagonal matrices D_v and D_e can be generated, with every entry along the diagonal corresponding to the vertex degree and the hyperedge degree, respectively. They can be written in matrix form as D_v = diag(HW1) and D_e = diag(1^T H), respectively. All the hyperedges are given equal weights here.

3.2 The Formulation of Dynamic Hypergraph Structure Learning

In a general setting of hypergraph learning, the label projection matrix is the learning target, given the initial labels and the hypergraph structure.

Here we let Y = {y_1, y_2, ..., y_{n+m}}, where y_i denotes the initial label vector of the i-th sample (vertex). For labeled samples, the j-th element is 1 if the i-th sample belongs to the j-th class, and it is 0 otherwise. For unlabeled samples, all elements are set to 0.5, indicating that we have no prior information about the labels of these samples. Let f_i denote the to-be-learned label projection vector of the i-th sample and F = {f_1, f_2, ..., f_{n+m}} be the label projection matrix.

The joint learning of the label projection matrix F and the hypergraph structure, i.e., the incidence matrix H, should satisfy the following three conditions.

First, the label projection matrix F should be smooth on the hypergraph structure H. The more and stronger the connections between two vertices, the higher the probability that they share similar label projection vectors. This is a commonly used hypergraph regularizer, and it is related to both F and H. It can be written as

$$
\begin{aligned}
\Psi(\mathbf{F}, \mathbf{H}) &= \frac{1}{2}\sum_{c=1}^{n_c}\sum_{e \in E}\sum_{u,v \in V} \frac{\mathbf{W}(e)\mathbf{H}(u,e)\mathbf{H}(v,e)}{\delta(e)} \left(\frac{\mathbf{F}(u,c)}{\sqrt{d(u)}} - \frac{\mathbf{F}(v,c)}{\sqrt{d(v)}}\right)^2 \\
&= \sum_{c=1}^{n_c}\sum_{e \in E}\sum_{u,v \in V} \frac{\mathbf{W}(e)\mathbf{H}(u,e)\mathbf{H}(v,e)}{\delta(e)} \left(\frac{\mathbf{F}(u,c)^2}{d(u)} - \frac{\mathbf{F}(u,c)\mathbf{F}(v,c)}{\sqrt{d(u)d(v)}}\right) \\
&= \sum_{c=1}^{n_c}\left(\sum_{u \in V}\sum_{e \in E}\sum_{v \in V} \mathbf{F}(u,c)^2 \frac{\mathbf{W}(e)\mathbf{H}(u,e)}{d(u)} \frac{\mathbf{H}(v,e)}{\delta(e)} - \sum_{e \in E}\sum_{u,v \in V} \frac{\mathbf{F}(u,c)\mathbf{H}(u,e)\mathbf{W}(e)\mathbf{H}(v,e)\mathbf{F}(v,c)}{\sqrt{d(u)d(v)}\,\delta(e)}\right) \\
&= \mathrm{tr}\left(\left(\mathbf{I} - \mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}\right)\mathbf{F}\mathbf{F}^{\mathrm{T}}\right), \quad (2)
\end{aligned}
$$
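The k-nn construction of Section 3.1 can be sketched in a few lines. The fragment below is an illustrative NumPy sketch, not the authors' implementation: the function name `knn_hypergraph` is ours, brute-force pairwise distances are used for clarity, and d̂ is taken as the mean pairwise distance. It builds the incidence matrix of Eq. (1), with one hyperedge per centroid vertex.

```python
import numpy as np

def knn_hypergraph(X, k):
    """Build the n x n incidence matrix H of Eq. (1).

    One hyperedge per centroid vertex: the centroid plus its k nearest
    neighbours; entry weights follow exp(-d(v, v_c)^2 / d_hat^2).
    """
    n = X.shape[0]
    # Brute-force pairwise Euclidean distances (fine for small n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_hat = dist.mean()                     # average distance between samples
    H = np.zeros((n, n))                    # rows: vertices, cols: hyperedges
    for c in range(n):
        nbrs = np.argsort(dist[c])[:k + 1]  # the centroid itself is nearest
        H[nbrs, c] = np.exp(-dist[nbrs, c] ** 2 / d_hat ** 2)
    return H
```

With equal hyperedge weights (W = I), the degree matrices then follow directly as D_v = diag(HW1) and D_e = diag(1ᵀH), as stated above.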


Figure 1: The framework of our proposed dynamic hypergraph structure learning method.

where n_c is the number of categories.

Second, the hypergraph structure H should also be smooth on the data from both the label space and the feature space. In the label space, the regularizer on H is the same as Eq. (2). In the feature space, the hypergraph structure H should also preserve the smoothness of the input features X. Similar to Eq. (2), the constraint of H on X can be written as

$$
\Omega(\mathbf{H}) = \mathrm{tr}\left(\left(\mathbf{I} - \mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}\right)\mathbf{X}\mathbf{X}^{\mathrm{T}}\right). \quad (3)
$$

The third condition is the empirical loss, which comes from F and Y and can be written as

$$
R_{emp}(\mathbf{F}) = \|\mathbf{F} - \mathbf{Y}\|_F^2. \quad (4)
$$

By summarizing the above optimization requirements, the objective function for dynamic hypergraph structure learning can be written as the following dual optimization problem:

$$
\begin{aligned}
\arg\min_{\mathbf{F},\, \mathbf{0} \preceq \mathbf{H} \preceq \mathbf{1}} Q(\mathbf{F}, \mathbf{H}) &= \Psi(\mathbf{F}, \mathbf{H}) + \beta\,\Omega(\mathbf{H}) + \lambda\, R_{emp}(\mathbf{F}) \\
&= \mathrm{tr}\left(\left(\mathbf{I} - \mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}\right)\left(\mathbf{F}\mathbf{F}^{\mathrm{T}} + \beta\,\mathbf{X}\mathbf{X}^{\mathrm{T}}\right)\right) + \lambda\|\mathbf{F} - \mathbf{Y}\|_F^2,
\end{aligned} \quad (5)
$$

where β and λ are parameters to balance the different components in the objective function.

3.3 Solution

Here, we optimize the objective function in Eq. (5) to learn the label projection matrix F and the hypergraph structure H simultaneously. We employ an alternating optimization scheme that updates each variable while fixing the other, until the objective function no longer descends.

Optimization on F

We first fix the incidence matrix H, and the sub-problem of optimizing F can be written as

$$
\arg\min_{\mathbf{F}} Q(\mathbf{F}) = \Psi(\mathbf{F}) + \lambda R_{emp}(\mathbf{F}) = \mathrm{tr}\left(\mathbf{\Delta}\mathbf{F}\mathbf{F}^{\mathrm{T}}\right) + \lambda\|\mathbf{F} - \mathbf{Y}\|_F^2, \quad (6)
$$

where $\mathbf{\Delta} = \mathbf{I} - \mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}$ can be regarded as the normalized hypergraph Laplacian in matrix form. According to [Zhou and Schölkopf, 2007], F can be obtained directly as

$$
\mathbf{F} = \left(\mathbf{I} + \frac{1}{\lambda}\mathbf{\Delta}\right)^{-1}\mathbf{Y}. \quad (7)
$$

Optimization on H

Then, we fix F and optimize H, and the learning objective function can be written as

$$
\arg\min_{\mathbf{0} \preceq \mathbf{H} \preceq \mathbf{1}} Q(\mathbf{H}) = \Psi(\mathbf{H}) + \beta\,\Omega(\mathbf{H}) = \mathrm{tr}\left(\left(\mathbf{I} - \mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}\right)\mathbf{K}\right), \quad (8)
$$

where K = FF^T + βXX^T. Note that Eq. (8) is a bound-constrained optimization problem, and it can be solved by using the projected gradient method. The derivative with respect to H can be written as

$$
\nabla Q(\mathbf{H}) = \mathbf{J}\left(\mathbf{I} \otimes \mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}\mathbf{K}\mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\right)\mathbf{W}\mathbf{D}_e^{-2} + \mathbf{D}_v^{-\frac{3}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{D}_v^{-\frac{1}{2}}\mathbf{K}\mathbf{J}\mathbf{W} - 2\mathbf{D}_v^{-\frac{1}{2}}\mathbf{K}\mathbf{D}_v^{-\frac{1}{2}}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}, \quad (9)
$$

where J = 11^T. The detailed derivation is included in Appendix A for reference. Then, H can be updated by iterating

$$
\mathbf{H}_{k+1} = P\left[\mathbf{H}_k - \alpha\nabla Q(\mathbf{H}_k)\right], \quad (10)
$$

where α is the optimal step size at each iteration and P is the projection onto the feasible set {H | 0 ⪯ H ⪯ 1}. Here we set P as

$$
P[h_{ij}] = \begin{cases} h_{ij} & \text{if } 0 \le h_{ij} \le 1 \\ 0 & \text{if } h_{ij} < 0 \\ 1 & \text{if } h_{ij} > 1 \end{cases}. \quad (11)
$$

We note that other selections of P can also be used in our framework.

With the updated H, we can further optimize the label projection matrix F. In this way, we can alternately optimize F and H until the objective function no longer descends. The overall workflow of the proposed dynamic hypergraph structure learning method is shown in Algorithm 1.
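The two alternating sub-steps can be sketched directly from Eqs. (6), (7) and (11). The fragment below is a minimal NumPy illustration (function names are ours, not the authors' code): it forms the normalized hypergraph Laplacian Δ, solves the closed-form F-step, and applies the projection P.

```python
import numpy as np

def normalized_laplacian(H, w):
    # Δ = I − D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}, as in Eq. (6)
    dv_inv_sqrt = 1.0 / np.sqrt(H @ w)   # diagonal of D_v^{-1/2}
    de_inv = 1.0 / H.sum(axis=0)         # diagonal of D_e^{-1}
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w * de_inv) \
            @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(H.shape[0]) - theta

def update_F(H, w, Y, lam):
    # Closed-form F-step, Eq. (7): F = (I + Δ/λ)^{-1} Y
    delta = normalized_laplacian(H, w)
    return np.linalg.solve(np.eye(H.shape[0]) + delta / lam, Y)

def project(H):
    # Projection P of Eq. (11): clip every incidence entry to [0, 1]
    return np.clip(H, 0.0, 1.0)
```

The closed form can be checked against the stationarity condition of Eq. (6): since Δ is symmetric, the gradient 2ΔF + 2λ(F − Y) vanishes exactly at F = (I + Δ/λ)⁻¹Y.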


Algorithm 1 Dynamic Hypergraph Structure Learning
Input: training data set Q, testing data set D, maximal iteration k, and parameters β and λ
Output: label projection matrix F and hypergraph structure H
1: Construct the initial hypergraph G = (V, E, W).
2: Initialize the learning rate α.
3: for i = 0 → k − 1 do
4:   Fix H, and update F by Eq. (6).
5:   Fix F, and update H by Eq. (8).
6: end for
7: return F, H

4 Applications and Experiments

In this section, we apply our proposed dynamic hypergraph structure learning method to two tasks: 3D object classification and gesture recognition. We conduct experiments and comparisons on two 3D object datasets and two gesture datasets to evaluate the performance, respectively.

4.1 3D Shape Recognition

In 3D shape recognition, given a set of 3D shapes as the training and the testing data, the shape features are first extracted. A hypergraph structure can then be constructed, and the dual optimization procedure can be conducted for shape recognition. Each shape is categorized based on the label projection matrix.

To evaluate the performance of the proposed method on 3D shape recognition, we have conducted experiments on the National Taiwan University 3D shape dataset (NTU) [Chen et al., 2003] and the Engineering Shape Benchmark (ESB) [Jayanti et al., 2006]. In the NTU dataset, 2020 3D shapes from 67 categories are selected, such as bomb, bottle, car, chair, cup, door, map, plane, sword, table, tank, and truck. In the ESB dataset, 866 shapes from 43 categories are included, such as back doors, bracket-like parts, clips, contact switches, curved housings, thin plates, and bearing blocks.

In our experiments, a set of data is selected for training and all other data are used for testing. The training data varies from 5% to 30% of all the data for each category (note that for each category, at least one sample is selected for training). This training/testing data selection process repeats 10 times, and the average recognition performance is reported. In these experiments, we empirically set the parameters β and λ as 10 and 1 on both datasets, respectively.

In the experiments, the recent 3D shape recognition method, i.e., multi-view convolutional neural networks (MVCNN) [Su et al., 2015], is used as the object feature extraction method and also for comparison. In MVCNN, 12 different views of each 3D shape are used for feature extraction, and it has shown state-of-the-art performance on 3D shape recognition. In this task, we also compare our proposed dynamic hypergraph structure learning method (DHSL) with graph-based learning (GL) [Zhou et al., 2003], traditional hypergraph learning (HL) [Zhou and Schölkopf, 2007], and hypergraph learning with hyperedge weight learning (HL-W) [Gao et al., 2013]. In GL, the normalized graph Laplacian is explored, and the radius parameter in regularization is tuned to the optimal value. In HL, the hypergraph is constructed in the same way as in the proposed method, and traditional hypergraph learning is conducted for shape recognition. In HL-W, the hyperedge weights are also learned during the learning process. Note that GL, HL, HL-W and DHSL also use the MVCNN features as the shape features for hypergraph construction.

Figure 2: Experimental comparison of different methods on view-based object classification. (a) The NTU Dataset; (b) The ESB Dataset.

Experimental results on the two datasets are demonstrated in Figure 2. From these results, we can have the following observations:

1. Compared with the state-of-the-art method MVCNN, DHSL can achieve better performance on both datasets. For example, when 5%, 10%, 20% and 30% of the data are used for training, DHSL achieves gains of 10.69%, 10.5%, 6.45%, and 4.73% compared with MVCNN on the NTU dataset.

2. Compared with traditional learning methods, DHSL achieves better performance on both datasets. For example, compared with GL, HL and HL-W, DHSL achieves gains of 7.71%, 4.12% and 2.1% with 10% training data on the NTU dataset.

4.2 Gesture Recognition

In the task of gesture recognition, a gesture is represented by a depth sequence captured by depth sensors. Gesture recognition refers to classifying the depth sequences into existing categories. Given a set of depth sequences, we first extract gesture features, and then construct a hypergraph to model the high-order relationship among the data. After that, our method can conduct learning for gesture recognition.

To validate the proposed DHSL method on gesture recognition, we have conducted experiments on the MSR Gesture 3D dataset (MSRGesture3D) [Wang et al., 2012] and a hand gesture dataset collected by Huazhong University of Science and Technology (Gesture3DMotion). In the MSRGesture3D dataset, 10 subjects perform 12 kinds of gestures defined by American Sign Language (ASL), including bathroom, blue, finish, green, hungry, milk, past, pig, store, where, j and z. This dataset consists of 333 depth sequences. In the Gesture3DMotion dataset, 4 subjects perform 12 kinds of rotation hand gestures and contour-area-changing hand gestures, including closed rotation, win rotation, one circle, opened rotation, eight rotation, six rotation, closed opened, win translation, one wave, three rotation, one rotation and little rotation. This dataset is composed of 384 depth sequences captured by Kinect-2.

In our experiments, a set of data is selected for training and all other data are used for testing. The number of training samples varies from 1 to 8 for each category. This training/testing data selection process repeats 10 times, and the average classification performance is reported. In these experiments, we also empirically set the parameters β and λ as 10 and 1 on both datasets, respectively.

In the experiments, the recent gesture recognition method HON4D [Oreifej and Liu, 2013] is used as the gesture feature extraction method and also for comparison. HON4D is able to capture motion and geometry cues jointly in the 4D space (depth, time, and spatial coordinates). Similar to the 3D shape recognition experiments, we also compare DHSL with GL, HL and HL-W using the HON4D features.

Figure 3: Experimental comparison of different methods on gesture recognition. (a) The MSRGesture3D Dataset; (b) The Gesture3DMotion Dataset.

Experimental results on the two datasets are demonstrated in Figure 3. From these results, we can have the following observations:

1. The proposed DHSL outperforms the state-of-the-art method HON4D on both datasets consistently. For example, when 1 and 2 training samples are employed for each gesture, the gains are 26.74% and 13.51% on the MSRGesture3D dataset.

2. DHSL performs better than HL in all cases. For example, on the MSRGesture3D dataset, DHSL obtains gains of 15.13%, 5.16% and 5.16% with 3 training samples per gesture compared with GL, HL and HL-W, respectively.

4.3 Discussions

In both applications, we can notice that all hypergraph learning methods achieve better performance compared with the original state-of-the-art methods, i.e., MVCNN and HON4D, which demonstrates the effectiveness of hypergraph modelling.

Compared with HL, HL-W achieves slight improvements on the NTU dataset, while showing almost the same performance on the other three datasets (the lines in Fig. 2 and Fig. 3 are overlapped). This phenomenon indicates that the learning of hyperedge weights is not effective in improving the hypergraph structure. Different from HL-W, the proposed dynamic hypergraph structure learning method can outperform static hypergraph learning methods on two tasks and four datasets consistently.

The better performance of DHSL can be attributed to two reasons. First, the proposed method introduces a dynamic hypergraph structure to formulate data correlations. Such a dynamic hypergraph structure can better fit the data, compared with the traditional static hypergraph structure. Second, the hypergraph structure is optimized to be smooth in both the label space and the feature space in DHSL. In this way, the connections among the vertices with the same label and similar features can be more and stronger than those with different labels or dissimilar features. This dynamic hypergraph strategy and the corresponding optimization method can lead to better performance compared with traditional hypergraph learning methods.

4.4 On Parameters

In our framework, there are two important parameters that control the relative impact of the different components in the objective function, i.e., β and λ, where β is for the smoothness constraint in the feature space and λ is for the empirical loss.

We first keep λ as 1, vary β from 0 to 100, and evaluate the influence of different β selections on the experimental performance. Experimental results on the four datasets are demonstrated in Fig. 4. From these results, we can observe that when β is too small, such as β = 0, the performance is worse than that with a higher β value, since the smoothness constraint in the feature space then has little influence on the learning process. When β varies in a large range, such as [5, 100], the performance is stable. This result indicates that 1) the use of the smoothness constraint in the feature space is useful to guide the learning of the hypergraph structure, and 2) the proposed method is not sensitive to the selection of β.

We then keep β as 10 and vary λ from 0.01 to 1000 to evaluate the influence of different λ selections on the experimental performance. As shown in Fig. 4, when λ is too small, such as λ = 0.01, the performance degenerates significantly. When λ is above 1, the performance becomes stable with respect to the variance of λ. These results indicate that the proposed method is not sensitive to the selection of λ.
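The evaluation protocol used throughout Section 4 (per-class random splits with at least one training sample per class, accuracy averaged over repeated runs) can be sketched as follows. This is an illustrative fragment under our own naming, not the authors' code; `classify` stands in for any of the compared methods and is assumed to map train/test index sets to predicted test labels.

```python
import numpy as np

def repeated_split_accuracy(labels, ratio, runs, classify, rng=None):
    """Average accuracy over `runs` random per-class train/test splits.

    For each class, round(ratio * class size) samples (at least one) go to
    training; everything else is testing, as in the paper's protocol.
    """
    rng = np.random.default_rng(rng)
    accs = []
    for _ in range(runs):
        train = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            m = max(1, int(round(ratio * len(idx))))
            train.extend(rng.choice(idx, m, replace=False))
        train = np.array(train)
        test = np.setdiff1d(np.arange(len(labels)), train)
        pred = classify(train, test)        # hypothetical classifier hook
        accs.append(np.mean(pred == labels[test]))
    return float(np.mean(accs))
```

For the gesture experiments, where a fixed count (1 to 8 samples per class) is used instead of a ratio, the `m = ...` line would simply be replaced by that count.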

Figure 4: The performance evaluation by varying the parameters β and λ on the four datasets, respectively.

5 Conclusion

In this paper, we propose the first dynamic hypergraph structure learning method. The proposed method is able to jointly optimize the hypergraph structure and learn the label projection matrix. For hypergraph structure updating, the data relationships in both the label space and the feature space are taken into consideration, and the hypergraph structure is dynamically optimized, leading to better data correlation modelling. We have applied the proposed method to 3D shape recognition and gesture recognition. Experimental results show that the proposed method achieves better performance compared with both the traditional hypergraph learning methods and the state-of-the-art methods. More specifically, the proposed method obtains consistent gains compared with the traditional hypergraph learning method, indicating its effectiveness on data modelling.

Acknowledgments

This work was supported by National Key R&D Program of China (Grant No. 2017YFC0113000), National Natural Science Funds of China (U1701262, 61671267), and Tsinghua University Initiative Scientific Research Program (20171080282).

A Derivations

The derivation of Eq. (9) is as follows. We have
\[
\nabla Q(H)
= \frac{d\,\mathrm{tr}\big(\big(I - D_v^{-\frac{1}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}}\big)K\big)}{dH}
= -\frac{d\,\mathrm{tr}\big(D_v^{-\frac{1}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}} K\big)}{dH}.
\]
Let
\[
D_v^{-\frac{1}{2}} =
\begin{pmatrix}
d_1^v & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & d_{n+m}^v
\end{pmatrix},
\qquad
D_e^{-1} =
\begin{pmatrix}
d_1^e & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & d_{n+m}^e
\end{pmatrix}.
\]
We obtain
\[
D_v^{-\frac{1}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}} =
\begin{pmatrix}
d_1^v h_{11} & d_1^v h_{12} & \cdots & d_1^v h_{1l} \\
\vdots & \vdots & \ddots & \vdots \\
d_l^v h_{l1} & d_l^v h_{l2} & \cdots & d_l^v h_{ll}
\end{pmatrix}
\begin{pmatrix}
w_1 d_1^e & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & w_l d_l^e
\end{pmatrix}
\begin{pmatrix}
d_1^v h_{11} & d_2^v h_{21} & \cdots & d_l^v h_{l1} \\
\vdots & \vdots & \ddots & \vdots \\
d_1^v h_{1l} & d_2^v h_{2l} & \cdots & d_l^v h_{ll}
\end{pmatrix}
\]
\[
=
\begin{pmatrix}
\sum_{j=1}^{l} w_j d_j^e (d_1^v)^2 h_{1j}^2 & \cdots & \sum_{j=1}^{l} w_j d_j^e d_1^v d_l^v h_{1j} h_{lj} \\
\vdots & \ddots & \vdots \\
\sum_{j=1}^{l} w_j d_j^e d_l^v d_1^v h_{lj} h_{1j} & \cdots & \sum_{j=1}^{l} w_j d_j^e (d_l^v)^2 h_{lj}^2
\end{pmatrix}.
\]
Therefore,
\[
\mathrm{tr}\big(D_v^{-\frac{1}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}} K\big)
= \sum_{p=1}^{l}\sum_{i=1}^{l}\sum_{j=1}^{l} k_{pi}\, w_j d_j^e d_i^v d_p^v h_{ij} h_{pj}.
\]
Let us also consider
\[
\frac{\partial d_i^v}{\partial h_{st}}
= \frac{\partial \big(\sum_{j=1}^{l} w_j h_{ij}\big)^{-\frac{1}{2}}}{\partial h_{st}}
= -\frac{1}{2}\Big(\sum_{j=1}^{l} w_j h_{ij}\Big)^{-\frac{3}{2}} \sum_{j=1}^{l} w_j \frac{\partial h_{ij}}{\partial h_{st}}
= -\frac{1}{2}(d_i^v)^3 \delta_{is} w_t,
\]
and
\[
\frac{\partial d_j^e}{\partial h_{st}}
= \frac{\partial \big(\sum_{i=1}^{l} h_{ij}\big)^{-1}}{\partial h_{st}}
= -\Big(\sum_{i=1}^{l} h_{ij}\Big)^{-2} \sum_{i=1}^{l} \frac{\partial h_{ij}}{\partial h_{st}}
= -(d_j^e)^2 \delta_{jt}.
\]
Then
\[
\frac{\partial\,\mathrm{tr}\big(D_v^{-\frac{1}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}} K\big)}{\partial h_{st}}
= \sum_{p=1}^{l}\sum_{i=1}^{l}\sum_{j=1}^{l}
\frac{\partial}{\partial h_{st}}\big(k_{pi}\, w_j d_j^e d_i^v d_p^v h_{ij} h_{pj}\big)
\]
\[
= \sum_{p,i,j=1}^{l} k_{pi} w_j \Big(
\frac{\partial d_j^e}{\partial h_{st}} d_i^v d_p^v h_{ij} h_{pj}
+ d_j^e \frac{\partial d_i^v}{\partial h_{st}} d_p^v h_{ij} h_{pj}
+ d_j^e d_i^v \frac{\partial d_p^v}{\partial h_{st}} h_{ij} h_{pj}
+ d_j^e d_i^v d_p^v \frac{\partial h_{ij}}{\partial h_{st}} h_{pj}
+ d_j^e d_i^v d_p^v h_{ij} \frac{\partial h_{pj}}{\partial h_{st}}
\Big)
\]
\[
= -\sum_{p=1}^{l}\sum_{j=1}^{l} \frac{k_{ps}+k_{sp}}{2}\, w_j w_t (d_s^v)^3 d_p^v d_j^e h_{sj} h_{pj}
- \sum_{p,i=1}^{l} k_{pi}\, w_t (d_t^e)^2 d_i^v d_p^v h_{it} h_{pt}
+ \sum_{p=1}^{l} (k_{ps}+k_{sp})\, w_t d_t^e d_s^v d_p^v h_{pt}.
\]
After integrating and simplification, we obtain
\[
\nabla Q(H)
= J\big(I \otimes H^T D_v^{-\frac{1}{2}} K D_v^{-\frac{1}{2}} H\big) W D_e^{-2}
+ D_v^{-\frac{3}{2}} H W D_e^{-1} H^T D_v^{-\frac{1}{2}} K J W
- 2 D_v^{-\frac{1}{2}} K D_v^{-\frac{1}{2}} H W D_e^{-1}.
\]
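The per-entry derivative derived above can be sanity-checked numerically. The NumPy sketch below is an illustration on a small random instance (not the paper's code): it assembles the analytic ∂Q/∂h_st from the three summands obtained after substituting the derivatives of d^v and d^e, and compares it against a central finite difference of Q(H), with d_i^v and d_j^e denoting the diagonal entries of D_v^{-1/2} and D_e^{-1} as in the derivation.

```python
import numpy as np

# Numerical check of the gradient derivation for
#   Q(H) = tr((I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}) K).
# Small random instance; all sizes and values are illustrative only.
rng = np.random.default_rng(0)
l = 5
H = rng.uniform(0.1, 1.0, (l, l))   # incidence matrix, kept positive
w = rng.uniform(0.5, 1.5, l)        # hyperedge weights, W = diag(w)
K = rng.uniform(0.0, 1.0, (l, l))
K = K + K.T                         # symmetric kernel matrix

def Q(H):
    dv = (H @ w) ** -0.5            # entries of Dv^{-1/2}
    de = H.sum(axis=0) ** -1.0      # entries of De^{-1}
    Theta = np.diag(dv) @ H @ np.diag(w) @ np.diag(de) @ H.T @ np.diag(dv)
    return np.trace((np.eye(l) - Theta) @ K)

# Analytic gradient: grad Q[s, t] = -(term1 + term2 + term3), where the three
# terms are the summands obtained after substituting d(d^v)/dh and d(d^e)/dh.
dv = (H @ w) ** -0.5
de = H.sum(axis=0) ** -1.0
G_ana = np.zeros((l, l))
for s in range(l):
    for t in range(l):
        Ks = K[:, s] + K[s, :]      # k_ps + k_sp
        term1 = -0.5 * w[t] * dv[s] ** 3 * sum(
            Ks[p] * w[j] * de[j] * dv[p] * H[s, j] * H[p, j]
            for p in range(l) for j in range(l))
        term2 = -w[t] * de[t] ** 2 * sum(
            K[p, i] * dv[i] * dv[p] * H[i, t] * H[p, t]
            for p in range(l) for i in range(l))
        term3 = w[t] * de[t] * dv[s] * sum(
            Ks[p] * dv[p] * H[p, t] for p in range(l))
        G_ana[s, t] = -(term1 + term2 + term3)

# Central finite differences on Q for comparison.
eps = 1e-6
G_num = np.zeros((l, l))
for s in range(l):
    for t in range(l):
        Hp, Hm = H.copy(), H.copy()
        Hp[s, t] += eps
        Hm[s, t] -= eps
        G_num[s, t] = (Q(Hp) - Q(Hm)) / (2 * eps)

assert np.allclose(G_ana, G_num, atol=1e-5)
```

The agreement between the analytic and finite-difference gradients confirms the three summands term by term before they are assembled into the closed matrix form of ∇Q(H).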


References

[An et al., 2017] Le An, Xiaojing Chen, Songfan Yang, and Xuelong Li. Person re-identification by multi-hypergraph fusion. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2763–2774, 2017.

[Chen et al., 2003] Ding-Yun Chen, Xiao-Pei Tian, Yu-Te Shen, and Ming Ouhyoung. On visual similarity based 3D model retrieval. Computer Graphics Forum, 22(3):223–232, 2003.

[Du et al., 2017] Dawei Du, Honggang Qi, Longyin Wen, Qi Tian, Qingming Huang, and Siwei Lyu. Geometric hypergraph learning for visual tracking. IEEE Transactions on Cybernetics, 47(12):4182–4195, 2017.

[Gao et al., 2012] Yue Gao, Meng Wang, Dacheng Tao, Rongrong Ji, and Qionghai Dai. 3-D object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing, 21(9):4290–4303, 2012.

[Gao et al., 2013] Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, and Xindong Wu. Visual-textual joint relevance learning for tag-based social image search. IEEE Transactions on Image Processing, 22(1):363–376, 2013.

[Huang et al., 2009] Yuchi Huang, Qingshan Liu, and Dimitris Metaxas. Video object segmentation by hypergraph cut. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1738–1745, 2009.

[Huang et al., 2010] Yuchi Huang, Qingshan Liu, Shaoting Zhang, and Dimitris N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 3376–3383, 2010.

[Hwang et al., 2008] TaeHyun Hwang, Ze Tian, Rui Kuang, and Jean-Pierre Kocher. Learning on weighted hypergraphs to integrate protein interactions and gene expressions for cancer outcome prediction. In Proceedings of IEEE International Conference on Data Mining, pages 293–302, 2008.

[Jayanti et al., 2006] Subramaniam Jayanti, Yagnanarayanan Kalyanaraman, Natraj Iyer, and Karthik Ramani. Developing an engineering shape benchmark for CAD models. Computer-Aided Design, 38(9):939–953, 2006.

[Li and Milenkovic, 2017] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. In Proceedings of Advances in Neural Information Processing Systems, pages 2305–2315. Curran Associates, Inc., 2017.

[Liu et al., 2017a] Mingxia Liu, Jun Zhang, Pew-Thian Yap, and Dinggang Shen. View-aligned hypergraph learning for Alzheimer's disease diagnosis with incomplete multi-modality data. Medical Image Analysis, 36:123–134, 2017.

[Liu et al., 2017b] Qingshan Liu, Yubao Sun, Cantian Wang, Tongliang Liu, and Dacheng Tao. Elastic net hypergraph learning for image clustering and semi-supervised classification. IEEE Transactions on Image Processing, 26(1):452–463, 2017.

[Luo et al., 2018] Fulin Luo, Bo Du, Liangpei Zhang, Lefei Zhang, and Dacheng Tao. Feature learning using spatial-spectral hypergraph discriminant analysis for hyperspectral image. IEEE Transactions on Cybernetics, 2018.

[Oreifej and Liu, 2013] Omar Oreifej and Zicheng Liu. HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 716–723, 2013.

[Su et al., 2015] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of IEEE International Conference on Computer Vision, 2015.

[Wang et al., 2012] Jiang Wang, Zicheng Liu, Jan Chorowski, Zhuoyuan Chen, and Ying Wu. Robust 3D action recognition with random occupancy patterns. In Proceedings of European Conference on Computer Vision, pages 872–885, 2012.

[Zhang et al., 2017] Zhihong Zhang, Lu Bai, Yuanheng Liang, and Edwin Hancock. Joint hypergraph learning and sparse regression for feature selection. Pattern Recognition, 63:291–309, 2017.

[Zhao et al., 2018] Wei Zhao, Shulong Tan, Ziyu Guan, Boxuan Zhang, Maoguo Gong, Zhengwen Cao, and Quan Wang. Learning to map social network users by unified manifold alignment on hypergraph. IEEE Transactions on Neural Networks and Learning Systems, 2018.

[Zhou and Schölkopf, 2007] Denny Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of Advances in Neural Information Processing Systems, pages 1601–1608. MIT Press, 2007.

[Zhou et al., 2003] Denny Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In Proceedings of Advances in Neural Information Processing Systems, pages 321–328, 2003.

[Zhu et al., 2015] Lei Zhu, Jialie Shen, Hai Jin, Ran Zheng, and Liang Xie. Content-based visual landmark search via multimodal hypergraph learning. IEEE Transactions on Cybernetics, 45(12):2756–2769, 2015.

[Zhu et al., 2017a] Lei Zhu, Jialie Shen, Liang Xie, and Zhiyong Cheng. Unsupervised topic hypergraph hashing for efficient mobile image retrieval. IEEE Transactions on Cybernetics, 47(11):3941–3954, 2017.

[Zhu et al., 2017b] Xiaofeng Zhu, Yonghua Zhu, Shichao Zhang, Rongyao Hu, and Wei He. Adaptive hypergraph learning for unsupervised feature selection. In Proceedings of International Joint Conference on Artificial Intelligence, pages 3581–3587, 2017.
