CogDL: Toolkit for Deep Learning on Graphs

Yukuo Cen, Zhenyu Hou, Yan Wang, Qibin Chen, Yizhen Luo, Xingcheng Yao, Aohan Zeng, Shiguang Guo, Yang Yang, Peng Zhang, Guohao Dai, Yu Wang, Chang Zhou, Hongxia Yang, Jie Tang, IEEE Fellow

Abstract—Deep learning on graphs has attracted tremendous attention from the graph learning community in recent years. It has been widely used in several real-world applications such as social network analysis and recommender systems. In this paper, we introduce CogDL, an extensive toolkit for deep learning on graphs that allows researchers and developers to easily conduct experiments and build applications. It provides standard training and evaluation for the most important tasks in the graph domain, including node classification, graph classification, etc. For each task, it provides implementations of state-of-the-art models. The models in our toolkit are divided into two major parts: graph embedding methods and graph neural networks. Most of the graph embedding methods learn node-level or graph-level representations in an unsupervised way and preserve graph properties such as structural information, while graph neural networks capture node features and work in semi-supervised or self-supervised settings. All models implemented in our toolkit can easily reproduce the leaderboard results. Most models in CogDL are developed on top of PyTorch, and users can leverage the advantages of PyTorch to implement their own models. Furthermore, we demonstrate the effectiveness of CogDL for real-world applications in AMiner, a large academic mining system.

Index Terms—Graph representation learning, Graph convolutional networks, Graph neural networks, Deep learning.

Author affiliations: Yukuo Cen, Zhenyu Hou, Yan Wang, Qibin Chen, Yizhen Luo, Xingcheng Yao, Aohan Zeng, and Shiguang Guo are with the Department of Computer Science and Technology, Tsinghua University, China (e-mail: {cyk20, houzy21}@mails.tsinghua.edu.cn). Yang Yang is with the Department of Computer Science and Technology, Zhejiang University, China. Guohao Dai and Yu Wang are with the Department of Electronic Engineering, Tsinghua University, China. Chang Zhou and Hongxia Yang are with the Alibaba Group, China. Jie Tang (corresponding author, e-mail: [email protected]) is with the Department of Computer Science and Technology, Tsinghua University, and the Tsinghua National Laboratory for Information Science and Technology (TNList), China.

1 INTRODUCTION

Graph-structured data have been widely utilized in many real-world scenarios. For example, each user on Facebook can be seen as a vertex, and their relations such as friendship or followership can be seen as edges in the graph. We might be interested in predicting the interests of users, or whether a pair of nodes in a network should have an edge connecting them. However, traditional machine learning algorithms cannot be directly applied to such graph-structured data. Inspired by recent trends of representation learning in computer vision and natural language processing, graph representation learning [4, 5, 6] has been proposed as an efficient technique to address this issue. Representation learning on graphs aims to learn low-dimensional continuous vectors for vertices/graphs while preserving intrinsic graph properties.

One type of network embedding method is the Skip-gram [22] based model, such as DeepWalk [6], LINE [5], node2vec [4], and PTE [23]. DeepWalk [6] transforms a graph structure into a uniformly sampled collection of truncated random walks and optimizes with the Skip-gram model. LINE [5] proposes loss functions to preserve both first- and second-order proximities and concatenates the two learned embeddings. node2vec [4] conducts biased random walks to smoothly interpolate between breadth-first sampling (BFS) and depth-first sampling (DFS). Another type of network embedding method is matrix factorization (MF) based, such as GraRep [9], HOPE [8], NetMF [1], and ProNE [2], which construct a proximity matrix and use MF techniques such as singular value decomposition (SVD) [24] to obtain graph representations.

Recently, graph neural networks (GNNs) have been proposed and have achieved impressive performance in semi-supervised representation learning. Graph Convolutional Networks (GCNs) [18] utilize a convolutional architecture via a localized first-order approximation of spectral graph convolutions. GraphSAGE [25] is a general inductive framework that leverages node features to generate node embeddings for previously unseen nodes. Graph Attention Networks (GATs) [16] leverage the multi-head self-attention mechanism and enable (implicitly) specifying different weights to different nodes in a neighborhood.

There are several toolkits supporting graph representation learning algorithms, such as PyTorch Geometric (PyG) [26] and Deep Graph Library (DGL) [27]. PyG is a library for deep learning on irregularly structured input data such as graphs, point clouds, and manifolds, built upon PyTorch [28]. DGL provides several APIs allowing arbitrary message-passing computation over large-scale and dynamic graphs with efficient memory usage and high training speed. However, these popular graph representation learning libraries may not completely integrate various representation learning methods (e.g., Skip-gram or matrix factorization based network embedding methods). More importantly, these libraries only focus on specific downstream tasks in the graph domain (e.g., semi-supervised node classification) and do not provide sufficient reproducible evaluations of model performance.

In this paper, we introduce CogDL (available at https://github.com/thudm/cogdl), an extensive graph representation learning toolkit that allows researchers and developers to easily train and compare baseline or customized models for node classification, graph classification, and other important tasks in the graph domain.

TABLE 1: Micro-F1 score (%) reproduced by CogDL for unsupervised multi-label node classification, including matrix factorization and Skip-gram methods. 50% of nodes are labeled for training in PPI, Blogcatalog, and Wikipedia, and 5% in DBLP and Flickr. These datasets correspond to different downstream scenarios: PPI stands for protein-protein interactions; Wikipedia is a co-occurrence network of words; Blogcatalog and Flickr are social networks; DBLP is a citation network.

Rank | Method                 | PPI (50%)    | Wikipedia (50%) | Blogcatalog (50%) | DBLP (5%)    | Flickr (5%)  | Reproducible
1    | NetMF [1]              | 23.73 ± 0.22 | 57.42 ± 0.56    | 42.47 ± 0.35      | 56.72 ± 0.14 | 36.27 ± 0.17 | Yes
2    | ProNE [2]              | 24.60 ± 0.39 | 56.06 ± 0.48    | 41.16 ± 0.26      | 56.85 ± 0.28 | 36.56 ± 0.11 | Yes
3    | NetSMF [3]             | 23.88 ± 0.35 | 53.81 ± 0.58    | 40.62 ± 0.35      | 59.76 ± 0.41 | 35.49 ± 0.07 | Yes
4    | Node2vec [4]           | 20.67 ± 0.54 | 54.59 ± 0.51    | 40.16 ± 0.29      | 57.36 ± 0.39 | 36.13 ± 0.13 | Yes
5    | LINE [5]               | 21.82 ± 0.56 | 52.46 ± 0.26    | 38.06 ± 0.39      | 49.78 ± 0.37 | 31.61 ± 0.09 | Yes
6    | DeepWalk [6]           | 20.74 ± 0.40 | 49.53 ± 0.54    | 40.48 ± 0.47      | 57.54 ± 0.32 | 36.09 ± 0.10 | Yes
7    | SpectralClustering [7] | 22.48 ± 0.30 | 49.35 ± 0.34    | 41.41 ± 0.34      | 43.68 ± 0.58 | 33.09 ± 0.07 | Yes
8    | HOPE [8]               | 21.43 ± 0.32 | 54.04 ± 0.47    | 33.99 ± 0.35      | 56.15 ± 0.22 | 28.97 ± 0.19 | Yes
9    | GraRep [9]             | 20.60 ± 0.34 | 54.37 ± 0.40    | 33.48 ± 0.30      | 52.76 ± 0.42 | 31.83 ± 0.12 | Yes

We summarize the contributions of CogDL as follows:
• High Efficiency: CogDL utilizes well-optimized sparse kernel operators to speed up the training of GNN models.
• Easy-to-Use: CogDL provides easy-to-use APIs for running experiments with the given models and datasets using hyper-parameter search.
• Extensibility: The design of CogDL makes it easy to apply GNN models to new scenarios based on our framework.
• Reproducibility: CogDL provides reproducible leaderboards for state-of-the-art models on most of the important tasks in the graph domain.

2 OVERVIEW

Although there is an enormous body of research work on graph machine learning, we find some existing issues such as inconsistent evaluation settings. Different papers may use their own evaluation settings for the same graph dataset, making results hard to compare. For example, for Cora [29], a widely used dataset, some papers use the "standard" splits following Planetoid [30], while others adopt random splits. Reported results of the same model on the same dataset may thus differ across papers, making it challenging to compare performance reported across various studies [31, 32, 33, 34]. Therefore, we propose CogDL as an open standard toolkit for graph benchmarks. The key point of CogDL is to build reproducible benchmarks for representation learning on graphs. We formalize the standard training and evaluation modules for the most important tasks in the graph domain. The overall framework is described in Figure 1. Our framework is built on PyTorch [28], which is the most popular deep learning library. PyTorch provides an imperative and Pythonic programming style that supports code as a model and makes debugging easy; our toolkit can therefore leverage the advantages of PyTorch. CogDL provides implementations of several kinds of models based on Python and PyTorch, including network embedding methods such as DeepWalk, NetMF, and ProNE, and GNNs such as GCN and GAT. It also supports several genres of datasets for node classification and graph classification. All the models and datasets can be utilized for experiments under different task settings in CogDL. Each task provides a standard training and evaluation process for comparison.

TABLE 2: Accuracy (%) reproduced by CogDL for semi-supervised and self-supervised node classification on citation datasets. ↓ and ↑ mean our results are lower or higher than the results in the original papers. Repro* is short for Reproducible.

Rank | Method          | Cora   | Citeseer | Pubmed | Repro*
1    | GRAND [10]      | 84.8   | 75.1     | 82.4   | Yes
2    | GCNII [11]      | 85.1   | 71.3     | 80.2   | Yes
3    | MVGRL [12]      | 83.6 ↓ | 73.0     | 80.1   | Partial
4    | APPNP [13]      | 84.3 ↑ | 72.0     | 80.0   | Yes
5    | Graph-Unet [14] | 83.3 ↓ | 71.2 ↓   | 79.0   | Partial
6    | GDC [15]        | 82.5   | 72.1     | 79.8   | Yes
7    | GAT [16]        | 82.9   | 71.0     | 78.9   | Yes
8    | DropEdge [17]   | 82.1   | 72.1     | 79.7   | Yes
9    | GCN [18]        | 81.5   | 71.4 ↑   | 79.5   | Yes
10   | DGI [19]        | 82.0   | 71.2     | 76.5   | Yes
11   | JK-net [20]     | 81.8   | 69.5     | 77.7   | Yes
12   | Chebyshev [21]  | 79.0   | 69.8     | 68.6   | Yes

To demonstrate the design and usage of CogDL, we will answer the following three questions, which correspond to the first three contributions summarized in Section 1:
• Question 1: How to efficiently train GNN models via sparse kernel acceleration on GPUs?
• Question 2: Which GNN model performs best on a specific dataset for a certain task?
• Question 3: How to easily extend an existing GNN model to apply it to a new scenario?

3 COGDL FRAMEWORK

In this section, we present the efficiency optimization as well as the design and usage of the CogDL framework by answering the aforementioned questions.

Graph Notations. Denote a network as G = (V, E), where V is a set of n nodes and E ⊆ V × V is a set of edges between nodes. Each node v may be accompanied with its feature x_v. We use A to denote the adjacency matrix (binary or weighted), and D to denote the diagonal degree matrix, with D_ii = Σ_j A_ij. Each edge e_ij = (v_i, v_j), associated with a weight A_ij ≥ 0, indicates the strength of the relationship between v_i and v_j. In practice, the network could be either directed or undirected. If G is directed, we have A_ij ≠ A_ji and e_ij ≠ e_ji; if G is undirected, we have e_ij = e_ji and A_ij = A_ji.

RQ1: Optimize the sparse kernel operators. We introduce an important operator used in CogDL and its corresponding optimization. The Sparse Matrix-Matrix multiplication (SpMM) operator is widely used in most GNNs, because many GNNs apply an aggregation operation for a given node over the nodes of its incoming edges:

    h_u^(l+1) = AGGREGATE({h_v^(l), ∀v ∈ N(u)}),    (1)

where N(u) is the set of neighbors of node u and h_u^(l) is the representation vector of node u at layer l. When the AGGREGATE is a summation, such an operation can be described as an SpMM operator H^(l+1) ← AH^(l), where the sparse matrix A represents the adjacency matrix of the input graph G, and each row of H^(l) is the representation vector of a node (e.g., h_u^(l) and h_v^(l)).
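To make this concrete, the following minimal sketch expresses the sum aggregation of Eq. (1) as a single sparse-dense product using plain PyTorch sparse tensors; it only illustrates the operator, not CogDL's optimized GE-SpMM kernel, and the toy graph is made up.

import torch

# Toy directed graph with 4 nodes; edges run src -> dst.
src = torch.tensor([0, 1, 2, 3, 1])
dst = torch.tensor([1, 2, 3, 0, 0])

# Build the sparse adjacency A with A[i, j] = 1 iff there is an edge j -> i,
# so row i of A selects the incoming neighbors of node i.
indices = torch.stack([dst, src])
values = torch.ones(src.numel())
A = torch.sparse_coo_tensor(indices, values, size=(4, 4)).coalesce()

H = torch.randn(4, 8)            # node representations H^(l) with d = 8
H_next = torch.sparse.mm(A, H)   # SpMM: H^(l+1) <- A H^(l), i.e. sum aggregation
print(H_next.shape)              # torch.Size([4, 8])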

When the AGGREGATE takes other operations, the SpMM operator can still describe the dataflow of representation vectors between nodes, while the element-wise operation varies. The AGGREGATE can be described using a message-passing model or a Gather-Apply-Scatter (GAS) model in a general-purpose graph processing system. Many frameworks for GNNs adopt these models for their SpMM implementation (e.g., the spmm used in PyG, based on the message-passing model). However, such models fail to utilize the parallelism lying in the elements of a representation vector, leading to poor performance when executing the SpMM operator on GPUs. Many previous works have focused on improving the performance of the SpMM operator on GPUs, such as ASpT [35] and GraphBlast [36]. GE-SpMM [37] is an optimized SpMM operator designed for GPUs. By avoiding redundant loading from GPU global memory and uncoalesced data access patterns, GE-SpMM improves the performance of both the SpMM operator on GPUs and GNNs built on SpMM. The original GE-SpMM only supports sparse matrices with "0-1" values, and we integrate GE-SpMM into our CogDL toolkit by extending its implementation (e.g., the data format).

When considering a GAT model, we also need a sampled dense-dense matrix multiplication (SDDMM) operator T = A ⊙ (PQ^T), where T and A are sparse matrices, P and Q are two dense matrices, and ⊙ is element-wise matrix multiplication. The SDDMM operator is used for back-propagating the gradients to the sparse adjacency matrix, since the adjacency matrix of the GAT model is computed by the attention mechanism. In addition, CogDL further speeds up multi-head graph attention by optimizing edge-based softmax and multi-head SpMM operators.

We compare the training and inference time of GCN and GAT models on several datasets with other popular GNN frameworks: CogDL with GE-SpMM, CogDL with torch.spmm, PyTorch Geometric (PyG) v1.6.3, and Deep Graph Library (DGL) v0.6.0 with the PyTorch backend. We conduct experiments using Python 3.7.5 and PyTorch v1.7.0 on servers with an Nvidia GeForce RTX 2080 Ti (11GB GPU memory) and an RTX 3090 (24GB GPU memory). From Table 3, CogDL with GE-SpMM achieves a 2.50×–5.26× speedup in training and 3.42×–5.59× in inference compared to torch.spmm. CogDL with GE-SpMM is also faster than the existing GNN frameworks for GCN. The results in Table 3 further show that the efficiency of CogDL is comparable to state-of-the-art GNN frameworks and even has a significant advantage in model inference.

Fig. 1: Overview of CogDL. CogDL mainly comprises three modules: models, datasets, and downstream tasks (node classification, heterogeneous node classification, graph classification, link prediction, and more). At the bottom, CogDL efficiently implements basic operators related to graph computation, such as SpMM, SDDMM, and edge_softmax. Datasets store graph data with Graph and specify the evaluation metric with Evaluator; they cover different types of real-world graphs, including social networks, molecular graphs, and academic graphs. CogDL implements 23 graph embedding methods and 38 GNN-related models with specialized trainers. Based on these models and datasets, various downstream tasks are supported and are applicable to many real scenarios.

Fig. 2: Architecture of CogDL. (The figure shows raw data and Graph flowing into build_dataset with get_loss_fn and get_evaluator; build_model with get_trainer, model.loss, and model.predict; and build_task/init with task.train and task.evaluate.)

TABLE 3: Epoch running time in seconds (full-graph training). The GCN model includes 2 GCN layers and the hidden size is set to 128. The GAT model includes 2 GAT layers with hidden size 128 and 4 attention heads. ‡ means out of memory and * means the hidden size is set to 64.

Model | GPU          | Dataset | torch (train) | PyG (train) | DGL (train) | CogDL (train) | torch (infer) | PyG (infer) | DGL (infer) | CogDL (infer)
GCN   | 2080Ti (11G) | Flickr  | 0.025 | 0.0084 | 0.012 | 0.085 | 0.012 | 0.0034 | 0.007  | 0.0035
GCN   | 2080Ti (11G) | Reddit  | 0.445 | 0.122  | 0.102 | 0.081 | 0.218 | 0.045  | 0.049  | 0.039
GCN   | 2080Ti (11G) | Yelp    | 0.412 | 0.151  | 0.151 | 0.110 | 0.191 | 0.053  | 0.063  | 0.040
GCN   | 3090 (24G)   | Flickr  | 0.017 | 0.006  | 0.008 | 0.007 | 0.008 | 0.002  | 0.004  | 0.002
GCN   | 3090 (24G)   | Reddit  | 0.263 | 0.062  | 0.060 | 0.050 | 0.127 | 0.022  | 0.0314 | 0.022
GCN   | 3090 (24G)   | Yelp    | 0.230 | 0.081  | 0.080 | 0.062 | 0.106 | 0.029  | 0.036  | 0.023
GAT   | 2080Ti (11G) | PubMed  | 0.017 | 0.016  | 0.011 | 0.012 | 0.004 | 0.006  | 0.004  | 0.003
GAT   | 2080Ti (11G) | Flickr  | 0.082 | 0.090  | 0.047 | 0.056 | 0.023 | 0.030  | 0.019  | 0.014
GAT   | 2080Ti (11G) | *Reddit | ‡     | ‡      | 0.406 | 0.537 | ‡     | ‡      | 0.163  | 0.086
GAT   | 3090 (24G)   | PubMed  | 0.043 | 0.011  | 0.011 | 0.016 | 0.004 | 0.004  | 0.003  | 0.002
GAT   | 3090 (24G)   | Flickr  | 0.097 | 0.059  | 0.033 | 0.044 | 0.021 | 0.024  | 0.013  | 0.009
GAT   | 3090 (24G)   | Reddit  | 0.671 | ‡      | 0.301 | 0.373 | 0.112 | ‡      | 0.113  | 0.088
GAT   | 3090 (24G)   | Yelp    | 0.614 | ‡      | 0.294 | 0.404 | 0.118 | ‡      | 0.105  | 0.113

RQ2: Easy-to-use design and usage. We introduce the design of CogDL and show how to use CogDL to find the best model. Based on the backend and the bottom operators, CogDL comprises three modules: Datasets, Models, and Tasks. The architecture of CogDL is illustrated in Figure 2. The three components work together to serve downstream tasks and are defined independently.

Graph. The Graph is the basic data structure in CogDL; it stores graph data and supports abundant graph operations. Graph supports both convenient graph modification, such as removing/adding edges, and efficient computation, such as SpMM. We also provide common graph manipulations used in graph learning, illustrated in the sketch after this list:
• add_remaining_self_loops(): add self-loops to the graph.
• degrees(): get node degrees.
• normalize(): normalize the adjacency matrix, including row-/symmetric-/column-normalization.
• sample_adj(): sample a fixed number of neighbors for the given nodes.
• subgraph(): generate a subgraph containing only the given nodes and the edges between them.
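As a rough usage sketch of these operations (the import path and some signatures, e.g. the arguments of normalize() and sample_adj(), are assumptions based on the names above rather than the documented API):

import torch
from cogdl.data import Graph  # import path assumed

x = torch.randn(4, 16)                               # node features
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])            # (2, num_edges) edge list
g = Graph(x=x, edge_index=edge_index)

g.add_remaining_self_loops()               # add self-loops before propagation
print(g.degrees())                         # per-node degrees
g.normalize("sym")                         # symmetric normalization (argument assumed)
nbrs = g.sample_adj([0, 1], size=2)        # sample 2 neighbors per node (signature assumed)
sub = g.subgraph([0, 1, 2])                # induced subgraph on a node subset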
Task. The task module puts the model and dataset together. The model and dataset are specified in args and built in the task through APIs in the package. train is the only exposed API of a task and integrates training and evaluation. Flexibly, optimizing loss functions can also be implemented in train for task-specific targets. task.train() runs the training and evaluation of the models and datasets specified in args and returns the performance. After training, the parameters of the model and the node/graph representations are saved for further usage.

Dataset. The dataset component reads in data and processes it to produce tensors of appropriate types. Each dataset specifies its loss function and evaluator through the functions get_loss_fn and get_evaluator for later usage in training and evaluation. In this way, the dataset is general and easily used in CogDL. Our package provides many real-world datasets and interfaces for users to define customized datasets with register_dataset. The library provides two ways to fit a customized dataset. One way is to convert raw data files into the required format in CogDL and specify the argument data_path; CogDL will then read and process the data with pre-defined functions. In addition, the library allows developers to define a customized dataset class, as long as the customized dataset is "registered" to CogDL.

@register_dataset("cora")
class CoraDataset(Dataset):
    def process(self):
        """Preprocess data to cogdl.Graph"""

    def get_evaluator(self):
        """Return evaluator defined in CogDL"""
        return accuracy

    def get_loss_fn(self):
        """Return loss function defined in CogDL"""
        return cross_entropy_loss

Model. CogDL includes diverse graph neural network layers as research baselines and for other applications. A GNN layer is implemented based on Graph and the sparse operators. A model in the package comprises the model builder, forward propagation, and loss computation. The APIs described below are implemented in each model to provide a unified paradigm for usage. add_args and build_model_from_args are used to build up a model with model-specific hyper-parameters. loss takes in the data, calls the core forward function, and returns the loss in one propagation. In addition, the library supports specifying a customized trainer for a model through get_trainer; details of the trainer are given below. Similar to Dataset, the model should be registered to make it known to the framework.

from cogdl.layers import GCNLayer

@register_model("gcn")
class GCN(BaseModel):
    def add_args(parser):
        parser.add_argument("--wd", type=float)

    def __init__(self):
        self.layer = GCNLayer(nfeats, nclasses)

    def loss(self, data):
        pred = self.forward(data, data.x)
        return loss_fn(pred, data.y)

Trainer. A trainer is a supplementary component for Task. The training or evaluation of some models is special and incompatible with the general paradigm in CogDL. In such cases, a custom-built trainer can be constructed in CogDL and specified in the model with get_trainer, as shown in the model snippet. trainer.fit, which is similar to the function task.train and covers training and evaluation, takes the model and dataset as input and returns the performance. A trainer is used only when it is specified in a model, and then the trainer takes over the whole process. The code sketch below, together with the model snippet, shows how a trainer is specified and used.
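The original trainer listing is not reproduced here; the following is a minimal sketch of what such a custom trainer could look like under the fit(model, dataset) contract described above. The class name, the builder method, and the training loop are illustrative, not CogDL's actual implementation.

import torch

class MyTrainer(object):
    """Illustrative custom trainer: trains a model on a dataset and
    returns the evaluation score, mirroring the trainer.fit contract."""

    def __init__(self, args):
        self.max_epoch = args.max_epoch

    @classmethod
    def build_trainer_from_args(cls, args):   # builder name assumed
        return cls(args)

    def fit(self, model, dataset):
        data = dataset[0]
        optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
        for _ in range(self.max_epoch):
            optimizer.zero_grad()
            model.loss(data).backward()
            optimizer.step()
        # Evaluate with the metric supplied by the dataset and return it.
        return dataset.get_evaluator()(model.predict(data), data.y)

# In the model snippet above, get_trainer would return MyTrainer so that
# this class takes over training and evaluation.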

Now that we have prepared the necessary components to train and evaluate a GNN model, how do we make a run, and how do we find the best GNN for a specific dataset in response to Question 2? First, we provide an easier entry point for experiments through the experiment API. We can pass a specific task, dataset, and model with hyper-parameters to the experiment API, which calls the low-level APIs (e.g., build_task and task.train). Producing the results with one line of settings provides a convenient and efficient way to search for the best models. Furthermore, we integrate a popular library, optuna [38], into CogDL to enable hyper-parameter search. Optuna is a fast AutoML library based on Bayesian optimization. CogDL implements hyper-parameter search and even model search for graph learning. As shown below, by passing a func_search function that defines the search space of the hyper-parameters, CogDL will start the run and output the best result and model.

from cogdl import experiment

# basic usage
experiment(task="node_classification", dataset="cora", model="gcn",
           hidden_size=32, max_epoch=200)

# basic usage for hyper-parameter search
def func_search(trial):
    return {
        "lr": trial.suggest_categorical("lr", [0.01, 0.1]),
        "hidden_size": trial.suggest_categorical("hs", [32, 64, 128]),
        "dropout": trial.suggest_uniform("dp", 0.1, 0.8),
    }

experiment(task="node_classification", dataset="cora", model="gcn",
           func_search=func_search)

We put all the hyper-parameters used for reproducibility in the configuration file (https://github.com/THUDM/cogdl/blob/master/cogdl/configs.py). Most of these parameters are obtained from the hyper-parameter search in CogDL. Users can simply set use_best_config=True in the experiment API to train the model using the best parameters in the configuration file.
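For example (following the experiment calls shown above):

from cogdl import experiment

# Train GCN on Cora with the stored best hyper-parameters instead of
# passing hidden_size, lr, etc. explicitly.
experiment(task="node_classification", dataset="cora", model="gcn",
           use_best_config=True)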

RQ3: Define new modules. The design of CogDL makes it easy to incrementally add new or customized modules. In this part, we show how to extend an existing graph representation algorithm in CogDL to a new scenario. If existing GNN algorithms can be quickly tested in new scenarios, both academia and industry benefit from developing and leveraging existing methods. We provide simple interfaces to embed a customized model/dataset/task into the current framework while reusing the other components implemented in CogDL.

The following code snippets show how to define a new task, community detection, with an existing dataset, and how to define a customized dataset that is used with an existing model and task in CogDL. The new task and dataset are collected by our framework when @register is called, and the experiment API supports the mixed use of existing and customized modules. In this way, one can easily apply any module in CogDL to a new scenario.

# from cogdl import build_model, build_dataset
@register_task("community_detection")
class CommunityDetection(BaseTask):
    def __init__(self, args):
        dataset = build_dataset(args)[0]
        self.data = dataset.data
        self.evaluator = dataset.get_evaluator()
        self.model = build_model(args)

    def train(self):
        for i in range(self.max_epoch):
            self.model.loss(self.data).backward()
            self.optimizer.step()
        return self.evaluator(self.model.predict(self.data), self.data.y)

# Run your task with model and dataset in CogDL
experiment(task="community_detection", dataset="dblp", model="gcn")

@register_dataset("new_data")
class MyDataset(NodeDataset):
    def __init__(self):
        self.path = "mydata.pt"
        super(MyDataset, self).__init__(self.path)

    def process(self):
        x, edge_index, y = load_raw_data()
        data = Graph(x=x, edge_index=edge_index, y=y)
        torch.save(data, self.path)
        return data

# Run your dataset with model and task in CogDL
experiment(task="node_classification", dataset="new_data", model="gcn")

4 GRAPH BENCHMARKS

In this section, we use CogDL to provide several downstream tasks, including node classification and graph classification, to evaluate the implemented methods. We also build a reliable leaderboard for each task, which maintains benchmarks and state-of-the-art results for that task. Table 4 summarizes the implemented models for node classification and graph classification in CogDL.

4.1 Unsupervised Node Classification

The unsupervised node classification task aims to learn a mapping function f : V → R^d that projects each node to a d-dimensional space (d ≪ |V|) in an unsupervised manner. Structural properties of the network should be captured by the mapping function.

TABLE 4: Summarization of implemented models for node classification and graph classification tasks.

Task                               | Setting         | Characteristics              | Models
Node Classification (Unsupervised) | MF based        |                              | SpectralClustering [7]
                                   | MF based        | With high-order neighborhood | NetMF [1], ProNE [2], NetSMF [3], HOPE [8], GraRep [9]
                                   | Skip-gram       |                              | LINE [5]
                                   | Skip-gram       | With high-order neighborhood | DeepWalk [6], Node2vec [4]
Node Classification (with GNN)      | Semi-supervised |                              | GCN [18], GAT [16], JK-Net [20], ChebyNet [21], GCNII [11]
                                   | Semi-supervised | With random propagation      | DropEdge [17], Graph-Unet [14], GraphSAINT [39], GRAND [10]
                                   | Semi-supervised | With diffusion               | GDC [15], APPNP [13], GRAND [10], PPRGo [40]
                                   | Self-supervised | Contrastive methods          | MVGRL [12], DGI [19]
Graph Classification                | Supervised      | CNN method                   | PATCHY_SAN [41]
                                   | Supervised      | Global pooling               | GIN [42], SortPool [43], DGCNN [44]
                                   | Supervised      | Hierarchical pooling         | DiffPool [45], SAGPool [46]
                                   | Unsupervised    | Kernel methods               | DGK [47], graph2vec [48]
                                   | Unsupervised    | GNN method                   | Infograph [49]

TABLE 5: Datasets for node classification ("m" stands for multi-label classification).

Setting                        | Dataset     | #Nodes  | #Edges     | #Features | #Classes | Train / Val / Test
Semi-supervised (Transductive) | Cora        | 2,708   | 5,429      | 1,433     | 7        | 140 / 500 / 1,000
                               | Citeseer    | 3,327   | 4,732      | 3,703     | 6        | 120 / 500 / 1,000
                               | Pubmed      | 19,717  | 44,338     | 500       | 3        | 60 / 500 / 1,000
Supervised (Inductive)         | PPI         | 56,944  | 818,736    | 50        | 121 (m)  | 0.79 / 0.11 / 0.10
                               | Flickr      | 89,350  | 899,756    | 500       | 7        | 0.50 / 0.25 / 0.25
                               | Reddit      | 232,965 | 11,606,919 | 602       | 41       | 0.66 / 0.10 / 0.24
                               | Yelp        | 716,847 | 6,977,410  | 300       | 100 (m)  | 0.75 / 0.10 / 0.15
Unsupervised                   | PPI         | 3,890   | 76,584     | -         | 50 (m)   | 0.50 / - / 0.50
                               | Wikipedia   | 4,777   | 184,812    | -         | 40 (m)   | 0.50 / - / 0.50
                               | Blogcatalog | 10,312  | 333,983    | -         | 39 (m)   | 0.50 / - / 0.50
                               | DBLP        | 51,264  | 127,968    | -         | 60 (m)   | 0.05 / - / 0.95
                               | Flickr      | 80,513  | 5,899,882  | -         | 195 (m)  | 0.05 / - / 0.95

Datasets. We collect the most popular datasets used in the unsupervised node classification task. Table 5 shows the statistics of these datasets.
• BlogCatalog [50] is a social blogger network, where nodes and edges stand for bloggers and their social relationships, respectively. Bloggers' interests are used as labels.
• Wikipedia (http://www.mattmahoney.net/dc/text.html) is a co-occurrence network of words in the first million bytes of the Wikipedia dump. The labels are the Part-of-Speech (POS) tags inferred by the Stanford POS-Tagger [51].
• PPI [52] is a subgraph of the PPI network for Homo Sapiens. Node labels are extracted from hallmark gene sets and represent biological states.
• DBLP [53] is an academic citation network where authors are treated as nodes and their dominant conferences as labels.
• Flickr [54] is the contact network between users on Flickr. The labels represent the interest groups of the users.

Models. We implement and compare the following methods for the unsupervised node classification task. These methods can be divided into two categories: Skip-gram based models and matrix factorization based models.
• SpectralClustering [7] generates node representations from the d smallest eigenvectors of the normalized graph Laplacian.
• DeepWalk [6] transforms a graph structure into linear sequences by truncated random walks and processes the sequences using Skip-gram with hierarchical softmax.
• LINE [5] defines loss functions to preserve first-order or second-order proximity separately and concatenates the two representations.
• node2vec [4] designs a biased random walk procedure with breadth-first sampling (BFS) and depth-first sampling (DFS) to make a trade-off between homophily similarity and structural equivalence similarity.
• GraRep [9] decomposes the k-step probability transition matrix to train the node embeddings and then concatenates all k-step representations.
• HOPE [8] approximates high-order proximity based on factorizing the Katz matrix.
• NetMF [1] shows that Skip-gram models with negative sampling, like DeepWalk and LINE, can be unified into a matrix factorization framework with closed forms.
• ProNE [2] first transforms graph representation learning into the decomposition of a sparse matrix and further improves performance through spectral propagation.

• NetSMF [3] addresses the efficiency and scalability challenges faced by the NetMF model by sparsifying the (dense) NetMF matrix.

Skip-gram network embedding treats the vertex paths traversed by random walks over the network as sentences and leverages Skip-gram to learn latent vertex representations. Matrix factorization based methods first compute a proximity matrix and perform matrix factorization to obtain the embedding. In fact, NetMF [1] has shown that the aforementioned Skip-gram models with negative sampling can be unified into the matrix factorization framework with closed forms.

Results and Analysis. We build a leaderboard for the unsupervised multi-label node classification setting. We run all algorithms on several real-world datasets and report the sorted experimental Micro-F1 results (%) using logistic regression with L2 normalization, as sketched below.
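For reference, a minimal sketch of this evaluation protocol with scikit-learn, reading "L2 normalization" as L2-normalizing the embeddings before fitting a one-vs-rest logistic regression; the actual CogDL evaluation code may differ in details such as label thresholding.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import normalize

def micro_f1(emb, labels, train_idx, test_idx):
    """emb: (num_nodes, dim) embeddings; labels: binary indicator matrix."""
    emb = normalize(emb, norm="l2")
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(emb[train_idx], labels[train_idx])
    pred = clf.predict(emb[test_idx])
    return f1_score(labels[test_idx], pred, average="micro")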

Table 1 shows the results, and we find some interesting observations.
• Matrix factorization vs. Skip-gram. The leaderboard demonstrates that matrix factorization (MF) methods like NetMF and ProNE are very powerful and full of vitality, as they outperform Skip-gram based methods on almost all datasets. ProNE and NetSMF are also highly efficient and scalable, able to embed super-large graphs in feasible time on a single machine, and there are many ways to further optimize these matrix-related operations. The main advantage of Skip-gram methods is that they have good parallelism and can be applied online, while MF needs to recompute the embedding when the network changes.
• Exploring neighborhoods. Exploring a node's network neighborhood is important in network embedding. DeepWalk and node2vec consider vertex paths traversed by random walks to reach high-order neighbors. NetMF and NetSMF factorize a diffusion matrix Σ_{i=0}^{K} α_i A^i rather than the adjacency matrix A. ProNE and LINE are essentially first-order methods, but ProNE further propagates the embeddings to enlarge the receptive field. Incorporating global information can improve performance but may hurt efficiency. The propagation in ProNE, which is similar to graph convolution, shows that incorporating global information as a post-processing operation is effective. In our experiments, stacking such propagation on existing methods indeed improves their performance on downstream tasks.

4.2 Node Classification with GNNs

This task is node classification with GNNs in semi-supervised and self-supervised settings. Different from the previous part, nodes in these graphs, like Cora and Reddit, have node features and are fed into GNNs, with predictions or representations as output. Cross-entropy loss and contrastive loss are used for the semi-supervised and self-supervised settings, respectively. For evaluation, we use prediction accuracy for multi-class datasets and Micro-F1 for multi-label datasets.

Datasets. The datasets consist of two parts, covering both transductive and inductive settings.
• Transductive datasets include three citation networks: Citeseer, Cora, and Pubmed [29]. These datasets contain sparse bag-of-words feature vectors for each document and a list of citation links between documents. We treat the citation links as (undirected) edges and construct a binary, symmetric adjacency matrix A. Each document has a class label. For training, we only use 20 labels per class, but all feature vectors.
• Inductive datasets include social networks (Reddit, Yelp, and Flickr) and a bioinformatics network (PPI) from GraphSAINT [39]. Reddit contains posts belonging to different communities with user comments. Flickr categorizes types of images based on the descriptions and common properties of online images, and Yelp categorizes types of businesses based on customers, reviewers, and friendship. PPI aims to classify protein functions across various biological protein-protein interaction graphs.

Models. GCNs [18] extend the convolution operation to graph-structured data by applying the layer-wise propagation rule:

    H^(l+1) = σ(Â H^(l) W^(l)),    (2)

where Â = D̃^(-1/2) Ã D̃^(-1/2) is the normalized adjacency matrix, Ã = A + I_n is the adjacency matrix with augmented self-connections, I_n is the identity matrix, D̃ is the diagonal degree matrix with D̃_ii = Σ_j Ã_ij, and W^(l) is a layer-specific learnable weight matrix. The function σ(·) denotes a nonlinear activation. H^(l) ∈ R^{n×d_l} is the matrix of d_l-dimensional hidden node representations in the l-th layer, with H^(0) = X, where X is the initial node feature matrix.
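A dense, self-contained sketch of this propagation rule (Eq. 2) in PyTorch; CogDL's GCNLayer uses the sparse operators described in Section 3 rather than dense matrices.

import torch
import torch.nn as nn

def normalized_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2), computed densely for clarity."""
    A_tilde = A + torch.eye(A.size(0))
    d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_tilde * d_inv_sqrt.unsqueeze(0)

class DenseGCNLayer(nn.Module):
    """One layer of Eq. (2): H^(l+1) = sigma(A_hat H^(l) W^(l))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, A_hat, H):
        return torch.relu(A_hat @ self.linear(H))

# Stacking two such layers (with a softmax head) gives the usual 2-layer GCN.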
We implement all of the following models:
• Chebyshev [21] presents a formulation of CNNs in the context of spectral graph theory, providing the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs.
• GCN [18] proposes a well-behaved layer-wise propagation rule for neural network models that operate directly on graphs, motivated by a first-order approximation of spectral graph convolutions.
• GAT [16] presents graph attention networks (GATs), convolution-style neural networks that operate on graph-structured data and leverage masked self-attentional layers.
• GraphSAGE [25] introduces an approach that allows embeddings to be efficiently generated for unseen nodes by aggregating feature information from a node's local neighborhood.
• APPNP [13] derives a propagation scheme from personalized PageRank by adding an initial residual connection to balance locality and leverage information from a large neighborhood.
• DGI [19] introduces an approach to maximize the mutual information between local representations and corresponding summaries of graphs, learning node representations in an unsupervised manner.
• GCNII [11] extends GCN to a deep model by using identity mapping and initial residual connections to resolve over-smoothing.
• MVGRL [12] proposes to use graph diffusion for data augmentation and contrasts structural views of graphs for self-supervised learning. MVGRL also maximizes local-global mutual information.

• GRAND [10] proposes to combine random propagation and consistency regularization to optimize the prediction consistency of unlabeled data across different data augmentations.
• DropEdge [17] randomly removes a certain number of edges from the input graph at each training epoch, acting as a data augmenter and also a message-passing reducer to alleviate over-fitting and over-smoothing issues.
• Graph-Unet [14] uses novel graph pooling (gPool) and unpooling (gUnpool) operations, where gPool adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector, and gUnpool restores the graph.
• GDC [15] leverages generalized graph diffusion, such as the heat kernel and personalized PageRank, to alleviate the problem of noisy and often arbitrarily defined edges in real graphs.
• PPRGo [40] utilizes an efficient approximation of information diffusion in GNNs based on personalized PageRank, resulting in significant speed gains.
• GraphSAINT [39] constructs minibatches by sampling the training graph and trains a full GCN on the sampled subgraphs.

TABLE 6: Results of node classification on inductive datasets, including full-batch and sampling-based methods. "-" means that the performance of the model on these datasets is not reported in the paper. Flickr and Reddit use the accuracy metric, whereas PPI uses the Micro-F1 metric. Repro* is short for Reproducible.

Method       | Flickr     | PPI        | Reddit     | Repro*
GCNII        | 52.6 ± 0.1 | 96.5 ± 0.2 | 96.4 ± 0.0 | Partial
GraphSAINT   | 52.0 ± 0.1 | 99.3 ± 0.1 | 96.1 ± 0.0 | Yes
Cluster-SAGE | 50.9 ± 0.1 | 97.9 ± 0.1 | 95.8 ± 0.1 | Yes
GAT          | 51.9 ± 0.3 | 97.3 ± 0.2 | 95.9 ± 0.1 | Yes
GCN          | 52.4 ± 0.1 | 75.7 ± 0.1 | 95.1 ± 0.0 | -
APPNP        | 52.3 ± 0.1 | 62.3 ± 0.1 | 96.2 ± 0.1 | -
PPRGo        | 51.1 ± 0.2 | 48.6 ± 0.1 | 94.5 ± 0.1 | -
SGC          | 50.2 ± 0.1 | 47.2 ± 0.1 | 94.5 ± 0.1 | -

Results and Analysis. We implement all the aforementioned GNN models and build a leaderboard for the node classification task. Table 2 and Table 6 summarize the evaluation results of all compared models on the transductive and inductive datasets, respectively, under the node classification setting. We have the following observations:
• High-order neighbors. Plenty of studies on GNNs have focused on designing a better aggregation paradigm to incorporate neighborhood information at different distances. In citation datasets (Cora, CiteSeer, and PubMed), which are of relatively small scale, incorporating high-order information plays an important role in improving model performance. Most high-order models, such as GRAND, APPNP, and GDC, use a graph diffusion matrix Ā = Σ_{i=0}^{K} α_i Â^i to collect information from distant neighbors. Methods based on diffusion are inspired by spectral graph theory and are not troubled by the over-smoothing problem; diffusion is light-weight and can also be applied to large-scale or unsupervised training. On the other hand, GCNII extends GCN to a deep model and uses residual connections with identity mapping to resolve over-smoothing in GNNs. As shown in Table 2, these methods achieve remarkable results and all outperform GNNs (like GCN and GAT) that only use immediate neighborhood information. This indicates that, in these graphs, incorporating high-order information might be of great importance.
• Dropout vs. DropEdge vs. DropNode. Random propagation, such as Dropout, DropEdge, and DropNode, is critical in semi-supervised graph learning. These random methods help avoid overfitting and improve performance. In our experiments, we found that using only dropout on the same model architecture can achieve results comparable to the original models that use other random propagation techniques. Theoretically, as shown in [10], random propagation in fact enforces the consistency of the classification confidence between each node and its multi-hop neighborhoods. All these random propagation methods achieve higher gains when combined with the consistency loss proposed in [10] to better leverage unlabelled data in semi-supervised settings.
• Self-supervised learning on graphs. Contrastive methods have been applied to graph learning and have achieved remarkable results. In general, both mutual information maximization and InfoNCE have been attempted in graph representation learning. DGI and MVGRL maximize local and global mutual information. MVGRL performs better by replacing the adjacency matrix with a graph diffusion matrix but is less scalable. On the contrary, GRACE, which optimizes InfoNCE, does not perform well on citation datasets with the public split. However, in our experiments, by replacing Dropout and DropEdge in GRACE with DropNode, and applying graph diffusion and mini-batch training, optimizing InfoNCE can also reach nearly 0.82 on PubMed. This indicates that graph diffusion and mini-batch training, instead of full-batch training, might benefit graph self-supervised learning.
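For concreteness, the InfoNCE objective mentioned above can be written as the following generic two-view loss; this is a standard formulation, not the exact GRACE or CogDL implementation.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """z1, z2: (N, d) node embeddings from two augmented views.
    The positive pair for node i is (z1[i], z2[i]); all other rows of z2
    act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                  # (N, N) scaled cosine similarities
    labels = torch.arange(z1.size(0))        # positives lie on the diagonal
    return F.cross_entropy(sim, labels)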
4.3 Graph Classification

Graph classification assigns a label to each graph and aims to map graphs into vector spaces. Graph kernels are historically dominant; they employ a kernel function to measure the similarity between pairs of graphs and map graphs into vector spaces with deterministic mapping functions, but they suffer from computational bottlenecks. Recently, graph neural networks have attracted much attention and indeed show promising results on this task. In the context of graph classification, GNNs often employ a readout operation to obtain a compact graph-level representation:

    h_G = READOUT({h_v^(K) | v ∈ V}).    (3)

GNNs directly apply classification based on the readout representation and are thus more efficient.

Datasets. We collect 8 popular benchmarks often used in graph classification tasks. Table 7 shows the statistics.
• Bioinformatics datasets. The PROTEINS dataset contains graphs of protein structures, where edges represent the interaction between sub-structures of proteins. Each graph in MUTAG, PTC, and NCI1 is a chemical compound, with nodes and edges representing atoms and chemical bonds, respectively. These datasets usually come with node features.

TABLE 7: Dataset statistics for graph classification.

Type            | Dataset  | #Graphs | #Classes | #Features | Avg. #Nodes | Avg. #Edges
Bioinformatics  | MUTAG    | 188     | 2        | 7         | 17.9        | 19.8
                | PTC      | 344     | 2        | 18        | 14.3        | 14.7
                | PROTEINS | 1,113   | 2        | 3         | 39.1        | 72.8
                | NCI1     | 4,110   | 2        | 37        | 29.8        | 32.3
Social Networks | IMDB-B   | 1,000   | 2        | -         | 19.8        | 96.5
                | IMDB-M   | 1,500   | 3        | -         | 13.0        | 65.9
                | REDDIT-B | 2,000   | 2        | -         | 429.6       | 497.8
                | COLLAB   | 5,000   | 3        | -         | 74.5        | 2,457.8

• Social networks. IMDB-BINARY and IMDB-MULTI are movie collaboration datasets. Nodes correspond to actors/actresses, and an edge means they appear in the same movie. Graphs in REDDIT-BINARY represent online discussions on Reddit, where nodes correspond to users and an edge is drawn between two nodes if one responded to another's comment. Nodes in these datasets directly use the node degree as features.

Models. We implement the following methods for graph classification:
• GIN [42] presents the graph isomorphism network, which adjusts the weight of the central node by learning and aims to make GNNs as powerful as the Weisfeiler-Lehman graph isomorphism test.
• DiffPool [45] proposes a differentiable pooling that generates hierarchical representations of graphs. It learns a cluster assignment matrix and can be implemented on top of any GNN.
• SAGPool [46] proposes a hierarchical graph pooling method based on self-attention that considers both node features and graph topology.
• SortPool [43] rearranges nodes by sorting them according to their structural roles within the graph and then performs pooling on these nodes. Node features derived from graph convolutions are used as continuous WL colors for sorting nodes.
• PATCHY_SAN [41] orders the neighbors of each node according to their graph labelings and selects the top q neighbors. The graph labelings are derived from degree, centrality, and other node scores.
• DGCNN [44] builds a subgraph for each node with KNN based on node features and then applies graph convolution to the reconstructed graph.
• Infograph [49] applies contrastive learning to graph learning by maximizing the mutual information between the graph-level representation and node-level representations in an unsupervised manner.
• graph2vec [48] follows Skip-gram's training process and considers the set of all rooted subgraphs around each node as its vocabulary.
• Deep Graph Kernels (DGK) [47] learns latent representations for subgraph structures based on graph kernels with the Skip-gram method.

As for evaluation, for supervised methods we adopt 10-fold cross-validation with a 90%/10% split and repeat it 10 times; for unsupervised methods, we perform 10-fold cross-validation with LIB-SVM. We then report the classification accuracy. Table 8 reports the results of the aforementioned models on this task, covering both unsupervised and supervised graph classification. We run all algorithms on the 8 datasets and report the sorted experimental results.

Results and Analysis. The development of GNNs for graph classification is mainly along two lines. One line (like GIN) aims to design more powerful convolution operations to improve expressiveness. The other line develops effective pooling methods to generate the graph representation.
• Neural network vs. kernel methods. Neural network based methods show promising results on the bioinformatics datasets (MUTAG, PTC, PROTEINS, and NCI1), where nodes come with given features. But on the social networks (IMDB-B, IMDB-M, COLLAB, REDDIT-B), which lack node features, methods based on graph kernels achieve very good performance and even surpass neural networks. Graph kernels are more capable of capturing structural information to discriminate non-isomorphic graphs, while GNNs are better encoders when features are available. Most GNNs directly perform classification based on the extracted graph representations and are therefore much more efficient than graph kernel methods.
• Comparison between pooling methods. Graph pooling aims to scale down the size of representations and generate the graph representation from node features. Global pooling, which is used in GIN and SortPool, collects node features and applies a readout function. Hierarchical pooling, such as DiffPool and SAGPool, is proposed to capture structural information at different graph levels, including nodes and subgraphs. The experimental results indicate that, although hierarchical pooling seems more complex and would intuitively capture more information, it does not show significant advantages over global pooling.

4.4 Other Graph Tasks

CogDL also provides other important tasks in the graph domain, including heterogeneous node classification, link prediction, multiplex link prediction, and knowledge graph completion. The latest results of all these tasks are updated in real time at https://github.com/THUDM/cogdl/blob/master/cogdl/tasks/README.md.

Heterogeneous Node Classification. This task is built for heterogeneous GNNs on heterogeneous graphs. For heterogeneous node classification, we use Macro-F1 to evaluate the performance of all heterogeneous models under the setting of Graph Transformer Networks (GTN) [55]. The heterogeneous datasets include DBLP, ACM, and IMDB. We implement the following models: GTN [55], HAN [56], PTE [23], metapath2vec [57], and hin2vec [58].

TABLE 8: Accuracy results (%) of both unsupervised and supervised graph classification. ↓ and ↑ mean our results are lower or higher than the results in the original papers. ‡ means the experiment did not finish within 24 hours for one seed.

Algorithm       | MUTAG   | PTC     | NCI1    | PROTEINS | IMDB-B  | IMDB-M  | COLLAB  | REDDIT-B | Reproducible
GIN [42]        | 92.06   | 67.82   | 81.66   | 75.19    | 76.10   | 51.80   | 79.52   | 83.10 ↓  | Yes
InfoGraph [49]  | 88.95   | 60.74   | 76.64   | 73.93    | 74.50   | 51.33   | 79.40   | 76.55    | Yes
graph2vec [48]  | 83.68   | 54.76 ↓ | 71.85   | 73.30    | 73.90   | 52.27   | 85.58 ↑ | 91.77    | Yes
SortPool [43]   | 87.25   | 62.04   | 73.99 ↑ | 74.48    | 75.40   | 50.47   | 80.07 ↑ | 78.15    | Yes
DiffPool [45]   | 85.18   | 58.00   | 69.09   | 75.30    | 72.50   | 50.50   | 79.27   | 81.20    | Yes
PATCHY_SAN [41] | 86.12   | 61.60   | 69.82   | 75.38    | 76.00 ↑ | 46.40   | 74.34   | 60.61    | Yes
DGCNN [44]      | 83.33   | 56.72   | 65.96   | 66.75    | 71.60   | 49.20   | 77.45   | 86.20    | Yes
SAGPool [46]    | 71.73 ↓ | 59.92   | 72.87   | 74.03    | 74.80   | 51.33   | ‡       | 89.21    | Yes
DGK [47]        | 85.58   | 57.28   | ‡       | 72.59    | 55.00 ↓ | 40.40 ↓ | ‡       | ‡        | Partial

Link Prediction. Liben-Nowell and Kleinberg [59] studied the underlying social-network evolution and proposed a basic computational problem named link prediction: given a snapshot of a social network at time t, it seeks to accurately predict the edges that will be added to the network during the interval from time t to a given future time. In CogDL, we remove 15 percent of the edges of each dataset and adopt ROC AUC [60] as the evaluation metric, which is the area under the receiver operating characteristic curve.

Multiplex Link Prediction. GATNE [61] formalizes the problem of embedding learning for attributed multiplex heterogeneous networks and proposes a unified framework to solve it in both transductive and inductive settings. We follow the setting of GATNE [61] to build the multiplex heterogeneous link prediction task. In CogDL, for those methods that only handle homogeneous networks, we feed them separate graphs with different edge types and obtain different node representations for each separate graph. We also adopt ROC AUC as the evaluation metric.

Knowledge Graph Completion. This task aims to predict missing links in a knowledge graph through knowledge graph embedding. In CogDL, we implement two families of knowledge graph embedding algorithms: triplet-based knowledge graph embedding and knowledge-based GNNs. The former includes TransE [62], DistMult [63], ComplEx [64], and RotatE [65]; the latter includes RGCN [66] and CompGCN [67]. We evaluate all implemented algorithms on standard benchmarks, including FB15k-237, WN18, and WN18RR, with the Mean Reciprocal Rank (MRR) metric.

5 APPLICATIONS

In this section, we demonstrate the easy-to-use pipeline API and the effectiveness of our toolkit for real-world applications, including paper tagging and recommendation, in AMiner (https://www.aminer.cn/), a large online academic search and mining system.

Paper Tagging. Each publication in AMiner has several tags, extracted by the AMiner team from the raw texts (e.g., title and abstract) of each publication. However, publications with citation links often have similar tags, and we can utilize the citation network to improve the quality of the tags. Formally, the publication tagging problem can be considered as a multi-label node classification task, where each label represents a tag. Thus, we can utilize powerful graph representation learning methods in CogDL to handle this problem. There are 4,833,171 papers in the field of computer science in the AMiner database. We conduct experiments on these papers to show how graph representation learning can help the tagging problem. We can build an embedding generator via generator = pipeline("generate-emb", model="prone") and obtain the network embedding with ProNE by feeding the citation network to the generator. We choose logistic regression as the multi-label classifier. The node embeddings and the raw text features can be combined to predict the tags of a given paper. The result shows that the fused features increase recall by 12.8%, which indicates that structural information plays a vital role in tagging papers.

Recommendation. The AMiner Subscribe page gives paper recommendations to AMiner users based on their historical behaviors. CogDL supports the AMiner Subscribe application via GNN models, since GNNs have shown great success in recommender systems [68]. We implement several state-of-the-art GNN models for recommendation, such as LightGCN [69]. We can build a recommendation server via recsys = pipeline("recommendation", model="lightgcn", data=data): we feed a user-item interaction graph into this API and obtain a trained GNN model for serving. We can then query the recsys with users, and the system outputs the recommended items. A sketch of both pipelines is shown below.
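A rough sketch of the two pipeline calls described above; the import path, the data placeholders, and the way the generator and recsys objects are invoked are assumptions for illustration, not the documented interface.

from cogdl import pipeline  # import path assumed

# Paper tagging: embed a (placeholder) citation network with ProNE and
# combine the embeddings with text features in a multi-label classifier.
citation_edges = [(0, 1), (1, 2), (2, 0)]          # placeholder edge list
generator = pipeline("generate-emb", model="prone")
embeddings = generator(citation_edges)             # calling convention assumed

# Recommendation: train LightGCN on (placeholder) user-item interactions
# and query recommended items for a list of users.
interactions = [(0, 10), (0, 11), (1, 10)]         # placeholder (user, item) pairs
recsys = pipeline("recommendation", model="lightgcn", data=interactions)
recommended = recsys([0, 1])                       # query interface as described above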

In addition, CogDL has gained users from academia, such as Carnegie Mellon University and Shanghai Jiao Tong University, for study and research in graph computing, graph analysis, and related areas. Our package has also been forked by developers in industry, such as Microsoft and Tencent, to support their work.

6 CONCLUSIONS

In this paper, we introduce CogDL, an extensive toolkit for graph representation learning that provides easy-to-use APIs and efficient sparse kernel operators for researchers and developers to conduct experiments and build real-world applications. It provides standard training, evaluation, and reproducible leaderboards for the most important tasks in the graph domain, including node classification, link prediction, graph classification, and other graph tasks.

ACKNOWLEDGMENTS

The work is supported by the National Key R&D Program of China (2018YFB1402600), NSFC for Distinguished Young Scholar (61825602), and NSFC (61836013).

REFERENCES

[1] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, "Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec," in WSDM'18, 2018, pp. 459–467.
[2] J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding, "Prone: Fast and scalable network representation learning," in IJCAI'19, 2019, pp. 4278–4284.
[3] J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, and J. Tang, "Netsmf: Large-scale network embedding as sparse matrix factorization," in WWW'19, 2019, pp. 1509–1520.
[4] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in KDD'16. ACM, 2016, pp. 855–864.
[5] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "Line: Large-scale information network embedding," in WWW'15. ACM, 2015, pp. 1067–1077.
[6] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in KDD'14. ACM, 2014, pp. 701–710.
[7] L. Tang and H. Liu, "Leveraging social media networks for classification," Data Mining and Knowledge Discovery, vol. 23, no. 3, pp. 447–478, 2011.
[8] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, "Asymmetric transitivity preserving graph embedding," in KDD'16, 2016, pp. 1105–1114.
[9] S. Cao, W. Lu, and Q. Xu, "Grarep: Learning graph representations with global structural information," in CIKM'15. ACM, 2015, pp. 891–900.
[10] W. Feng, J. Zhang, Y. Dong, Y. Han, H. Luan, Q. Xu, Q. Yang, E. Kharlamov, and J. Tang, "Graph random neural networks for semi-supervised learning on graphs," in NeurIPS'20, 2020.
[11] M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li, "Simple and deep graph convolutional networks," in ICML'20, 2020.
[12] K. Hassani and A. H. Khasahmadi, "Contrastive multi-view representation learning on graphs," in ICML'20, 2020.
[13] J. Klicpera, A. Bojchevski, and S. Günnemann, "Predict then propagate: Graph neural networks meet personalized pagerank," in ICLR'19, 2019.
[14] H. Gao and S. Ji, "Graph u-nets," in ICML'19, 2019, pp. 2083–2092.
[15] J. Klicpera, S. Weißenberger, and S. Günnemann, "Diffusion improves graph learning," in NeurIPS'19, 2019.
[16] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in ICLR'18, 2018.
[17] Y. Rong, W. Huang, T. Xu, and J. Huang, "Dropedge: Towards deep graph convolutional networks on node classification," in ICLR'20, 2020.
[18] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in ICLR'17, 2017.
[19] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, "Deep graph infomax," in ICLR'19, 2019.
[20] K. Xu, C. Li, Y. Tian, T. Sonobe, K.-i. Kawarabayashi, and S. Jegelka, "Representation learning on graphs with jumping knowledge networks," in ICML'18, 2018, pp. 5453–5462.
[21] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in NeurIPS'16, 2016.
[22] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[23] J. Tang, M. Qu, and Q. Mei, "Pte: Predictive text embedding through large-scale heterogeneous text networks," in KDD'15, 2015, pp. 1165–1174.
[24] G. H. Golub and C. Reinsch, "Singular value decomposition and least squares solutions," in Linear Algebra. Springer, 1971, pp. 134–151.
[25] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in NeurIPS'17, 2017, pp. 1025–1035.
[26] M. Fey and J. E. Lenssen, "Fast graph representation learning with PyTorch Geometric," arXiv preprint arXiv:1903.02428, 2019.
[27] M. Wang, L. Yu, D. Zheng, Q. Gan, Y. Gai, Z. Ye, M. Li, J. Zhou, Q. Huang, C. Ma et al., "Deep graph library: Towards efficient and scalable deep learning on graphs," arXiv preprint arXiv:1909.01315, 2019.
[28] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," in NeurIPS'19, 2019, pp. 8024–8035.
[29] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, pp. 93–93, 2008.
[30] Z. Yang, W. Cohen, and R. Salakhudinov, "Revisiting semi-supervised learning with graph embeddings," in ICML'16. PMLR, 2016, pp. 40–48.
[31] O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann, "Pitfalls of graph neural network evaluation," arXiv preprint arXiv:1811.05868, 2018.
[32] F. Errica, M. Podda, D. Bacciu, and A. Micheli, "A fair comparison of graph neural networks for graph classification," in ICLR'20, 2020.
[33] V. P. Dwivedi, C. K. Joshi, T. Laurent, Y. Bengio, and X. Bresson, "Benchmarking graph neural networks," arXiv preprint arXiv:2003.00982, 2020.
[34] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, "Open graph benchmark: Datasets for machine learning on graphs," in NeurIPS'20, 2020.
[35] C. Hong, A. Sukumaran-Rajam, I. Nisa, K. Singh, and P. Sadayappan, "Adaptive sparse tiling for sparse matrix multiplication," in Symposium on Principles and Practice of Parallel Programming (PPoPP), 2019, pp. 300–314.
[36] C. Hong, A. Sukumaran-Rajam, B. Bandyopadhyay, J. Kim, S. E. Kurt, I. Nisa, S. Sabhlok, Ü. V. Çatalyürek, S. Parthasarathy, and P. Sadayappan, "Efficient sparse-matrix multi-vector product on gpus," in HPDC'18, 2018, pp. 66–79.
[37] G. Huang, G. Dai, Y. Wang, and H. Yang, "Ge-spmm: General-purpose sparse matrix-matrix multiplication on gpus for graph neural networks," in SC'20. IEEE Press, 2020.
[38] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in KDD'19, 2019, pp. 2623–2631.
[39] H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. Prasanna, "GraphSAINT: Graph sampling based inductive learning method," in ICLR'20, 2020.
[40] A. Bojchevski, J. Klicpera, B. Perozzi, A. Kapoor, M. Blais, B. Rozemberczki, M. Lukasik, and S. Günnemann, "Scaling graph neural networks with approximate pagerank," in KDD'20, 2020.
[41] M. Niepert, M. Ahmed, and K. Kutzkov, "Learning convolutional neural networks for graphs," in ICML'16, 2016.
[42] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, "How powerful are graph neural networks?" in ICLR'19, 2019.
[43] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, "An end-to-end deep learning architecture for graph classification," in AAAI'18, 2018.
[44] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, "Dynamic graph cnn for learning on point clouds," ACM Transactions on Graphics (TOG), vol. 38, no. 5, pp. 1–12, 2019.
[45] R. Ying, J. You, C. Morris, X. Ren, W. L. Hamilton, and J. Leskovec, "Hierarchical graph representation learning with differentiable pooling," in NeurIPS'18, 2018.
[46] J. Lee, I. Lee, and J. Kang, "Self-attention graph pooling," in ICML'19, 2019.
[47] P. Yanardag and S. Vishwanathan, "Deep graph kernels," in KDD'15, 2015.
[48] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal, "graph2vec: Learning distributed representations of graphs," arXiv preprint arXiv:1707.05005, 2017.
[49] F.-Y. Sun, J. Hoffmann, V. Verma, and J. Tang, "Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization," in ICLR'20, 2020.
[50] R. Zafarani and H. Liu, "Social computing data repository at ASU," 2009.
[51] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in NAACL'03, 2003, pp. 252–259.
[52] B.-J. Breitkreutz, C. Stark et al., "The biogrid interaction database," Nucleic Acids Research, vol. 36, no. suppl 1, pp. D637–D640, 2008.
[53] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "Arnetminer: Extraction and mining of academic social networks," in KDD'08. ACM, 2008, pp. 990–998.
[54] L. Tang and H. Liu, "Relational learning via latent social dimensions," in KDD'09, 2009.
[55] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim, "Graph transformer networks," in NeurIPS'19, 2019.
[56] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, "Heterogeneous graph attention network," in WWW'19, 2019, pp. 2022–2032.
[57] Y. Dong, N. V. Chawla, and A. Swami, "metapath2vec: Scalable representation learning for heterogeneous networks," in KDD'17, 2017.
[58] T.-y. Fu, W.-C. Lee, and Z. Lei, "Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning," in CIKM'17, 2017.
[59] D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks," Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[60] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (roc) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[61] Y. Cen, X. Zou, J. Zhang, H. Yang, J. Zhou, and J. Tang, "Representation learning for attributed multiplex heterogeneous network," in KDD'19, 2019, pp. 1358–1368.
[62] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, "Translating embeddings for modeling multi-relational data," in NeurIPS'13, 2013.
[63] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng, "Embedding entities and relations for learning and inference in knowledge bases," in ICLR'15, 2015.
[64] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard, "Complex embeddings for simple link prediction," in ICML'16, 2016.
[65] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang, "Rotate: Knowledge graph embedding by relational rotation in complex space," in ICLR'19, 2019.
[66] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," Lecture Notes in Computer Science, pp. 593–607, 2018.
[67] S. Vashishth, S. Sanyal, V. Nitin, and P. Talukdar, "Composition-based multi-relational graph convolutional networks," in ICLR'20, 2020.
[68] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in KDD'18, 2018, pp. 974–983.
[69] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, "Lightgcn: Simplifying and powering graph convolution network for recommendation," in SIGIR'20, 2020, pp. 639–648.