Rotation Equivariant Graph Convolutional Network for Spherical Image Classification
Total Page:16
File Type:pdf, Size:1020Kb
Rotation Equivariant Graph Convolutional Network for Spherical Image Classification Qin Yang1, Chenglin Li1, Wenrui Dai1, Junni Zou1, GuoJun Qi2, Hongkai Xiong1 1Shanghai Jiao Tong University, 2Futurewei Technologies fyangqin, lcl1985, daiwenrui, zoujunni, [email protected], [email protected] Abstract popular in virtual reality (VR) and augmented reality (AR) systems for applications ranging from robots [23, 27] to au- Convolutional neural networks (CNNs) designed for tonomous cars [15, 16], which results in an increasing de- low-dimensional regular grids will unfortunately lead to mand for the analysis of spherical images. Convolutional non-optimal solutions for analyzing spherical images, due neural networks (CNNs) have achieved significant improve- to their different geometrical properties from planar im- ment in analysis tasks related to planar images, e.g., image ages. In this paper, we generalize the grid-based CNNs to recognition [10], object detection [8], and image segmenta- a non-Euclidean space by taking into account the geome- tion [9]. However, it is still challenging to generalize CNNs try of spherical surfaces and propose a Spherical Graph to analyzing spherical images defined on the non-Euclidean Convolutional Network (SGCN) to encode rotation equiv- spheres, as distortions may be incurred when spherical im- ariant representations. Specifically, we propose a spherical ages are projected onto a flat Euclidean surface to accom- graph construction criterion showing that a graph needs modate the grid-based architectures in CNNs [3]. to be regular by evenly covering the spherical surfaces in CNNs commonly adapt to the non-Euclidean spherical order to design a rotation equivariant graph convolutional images in two different ways. The first approach projects layer. For the practical case where the perfectly regu- the spherical images into the planar format that can be lar graph does not exist, we design two quantitative mea- processed directly by CNNs. Various projection methods sures to evaluate the degree of irregularity for a spherical have been studied, including the equirectangular projection graph. The Geodesic ICOsahedral Pixelation (GICOPix) (ERP) and the cube map projection [24], which lead to the is adopted to construct spherical graphs with the minimum inevitable projection distortions. For ERP, filter kernels are degree of irregularity compared to the current popular pixe- further designed for CNNs to compensate for the projec- lation schemes. In addition, we design a hierarchical pool- tion distortion [5, 26, 32]. [26] proposed to learn different ing layer to keep the rotation-equivariance, followed by a kernels with variable size for each row in the projected im- transition layer to enforce the invariance to the rotations ages, however, the model size increases dramatically with for spherical image classification. We evaluate the pro- the growth of image resolution. In [5, 32], the sampling posed graph convolutional layers with different pixelations location of filter kernel is changed to adapt to the distor- schemes in terms of equivariance errors. We also assess the tion level. Without the guidance of rotation-equivariance, effectiveness of the proposed SGCN1 in fulfilling rotation- although model parameters could be reduced by sharing the invariance by the invariance error of the transition layers kernels across all pixels, the model performance declines and recognizing the spherical images and 3D objects. inevitably. The other approaches [3, 7] extend CNNs to non- Euclidean domains to avoid the projection distortions. Al- 1. Introduction though CNNs have strong capability to exploit the local translation equivariance and some works seek to capture Omnidirectional cameras generate spherical images with various transformation equivariant representations of regu- 360-degree view of the scenes that enable an immersive lar 2D images [20, 21, 30], they do not adapt to the 3D rota- experience for users by freely adjusting their viewing ori- tion of spherical images properly. Therefore, it is important entations. Recently, omnidirectional cameras have become to explore rotation-equivariance in spherical image analy- sis. [3] and [7] develop spherical CNNs by introducing the 1Code is aviable at https://github.com/QinYang12/SGCN. This work was supported in part by the NSFC under Grants 61931023, rotation-equivariant spherical cross-correlation in the spec- 61871267, 61972256, 61720106001, 61831018, and 91838303. tral domain. However, Fourier transform is required for the spherical correlation in each step, leading to high computa- also assess the invariance errors of the proposed transition tional cost and significant memory overhead. [17] proposes layers for the ability of capturing rotation-invariance and a graph convolutional neural network for cosmological data recognizing the spherical images. We further employ the that often come as spherical maps tailored by the Hierarchi- proposed SGCN in spherical image classification, demon- cal Equal Area isoLatitude Pixelation (HEALPix). How- strating that SGCN outperforms the state-of-the-art models ever, the irregular feature map in the HEALPix scheme sitll on the Spherical MNIST (S-MNIST), Spherical CIFAR-10 does not maintain the rotation-equivariance. (S-CIFAR-10) and achieve comparable performance to the As an almost uniform discretization of the sphere, icosa- 3D models on the ModelNet40 datasets in terms of rotation hedron [1] has been adopted to represent the spherical do- invariance classification accuracy. main [4, 11, 14, 29]. In [4, 29], the spherical signal is projected on the icosahedron mesh with 20 basic planar re- 2. Preliminaries gions, which is further analyzed with the gauge-equivariant We represent a spherical image as an undirected and con- and orientation-aware CNNs. The distortion is however still nected graph G = (V; E;A), where V is a set of jVj = N large, and discontinuities between the basic planar regions vertices, E is a set of edges, and A is a weighted adja- need to be handled by carefully designed schemes such as cency matrix with each element a = w(v ; v ) represent- the gauge transformation on the features [4] and padding ij i j ing the connection weight between two vertices v and v . [29]. Based on geodesic icosahedron with smaller distor- i j The weight a is zero if vertices v and v are not con- tion, [14] designs the convolution and pooling kernels of ij i j nected. The normalized graph Laplacian is then defined as CNNs and [11] presents parameterized differential opera- L = I − D−1=2AD−1=2, where D 2 RN×N is a diagonal tors on the unstructured grids. Although the convolution N degree matrix with D = P a , and I is the identity kernel is flipped by 180 degrees when applied to the next ii j=1 ij matrix. adjacent triangle [14], the convolution kernels in [11, 14] By recursively computing a Chebyshev polynomial to are still anisotropy and thus not rotation equivariant. approximate the convolution kernel [6], the spectral con- In this paper, we propose a Spherical Graph Convolu- volution with a spherical signal x can be written as tional Network (SGCN) to encode rotation-equivariance for spherical image analysis. Specially, we develop a graph K−1 X ~ convolutional layer through exploring the isometric trans- y = θkTk(L)x; (1) formation equivariance of the graph Chebyshev polynomial k=0 filters which is isotropy, a hierarchical pooling layer to ex- ~ ploit the multi-scale resolutions of the spherical images and where L = 2L/λmax − I, λmax is the largest eigenvalue keep the rotation-equivariance, and a transition layer to cal- of L, and θk denotes the Chebyshev polynomial coefficient culate the rotation-invariant statistics across multiple fea- which is a learnable parameter. Consequently, the Cheby- ~ N×N ture maps of the hierarchical pooling layer. shev polynomial Tk(L) 2 R can be recursively com- puted through T (L~) = 2LT~ (L~)−T (L~) with T = I To enforce rotation-equivariance in the proposed poly- k k−1 k−2 0 and T = L~. The spectral convolution with a K-th order nomial graph convolutional layer, we propose a spherical 1 polynomial is K-localized, i.e., the response of a vertex to graph construction criterion based on regularity, and show the polynomial filter only depends on all the vertex values that given the number of vertices, a regular graph (i.e., ver- and edge weights on a path of length k < K. tices distribute uniformly on the surface of the spherical im- It has been shown that a polynomial filter is equivariant age) is equivariant to more rotations than an irregular one. to graph isometric transformations [13]. In the following, For the practical case where the perfectly regular graph does we give the definition of the graph isometric transformation not exist, we design two quantitative measures to evaluate and the graph isometric transformation equivariance. the degree of irregularity for a spherical graph, and empir- Definition 1. Graph isometric transformation [13]. A ically show that a graph construction scheme with a lower graph isometric transformation g is a bijective mapping g : degree of irregularity will result in smaller equivariance er- V!V that preserves the distance between two adjacent rors of the graph convolutional layers. Further the Geodesic vertices on the graph. The corresponding transformation ICOsahedral Pixelation (GICOPix) scheme is adopted to operator L makes a permutation of signal x by preserving construct the spherical graph, which empirically demon- g their neighbourhoods. It can be formally depicted as strates to achieve a lower degree of irregularity with the least weight variance for edges and least degree variance 8vk 2 V; 9!vj 2 V :[Lgx](vk) = x(vj); (2) for vertices. To demonstrate the effectiveness of the proposed crite- where Lgx is the transformed signal of x, and 9! indicates rion, we evaluate the equivariance errors of the graph convo- that there exists and only exists a vertex vj corresponding lutional layers by different graph construction schemes. We to vk. Graph Construction HPool GConv HPool Tran FC ()=0.15 ()=0.69 GConv σ… σ ()=0.74 ()=3.31 … … … … … Spherical … Image ∗ …… =min((), ()) ∈ Output Figure 1.