Neural Group Testing to Accelerate Deep Learning

Weixin Liang and James Zou
Stanford University, Stanford, CA 94305
Email: {wxliang, jamesz}@stanford.edu

Abstract—Recent advances in deep learning have led to the use of large, deep neural networks with tens of millions of parameters. The sheer size of these networks imposes a challenging computational burden during inference. Existing work focuses primarily on accelerating each forward pass of a neural network. Inspired by the group testing strategy for efficient disease testing, we propose neural group testing, which accelerates inference by testing a group of samples in one forward pass. Groups of samples that test negative are ruled out. If a group tests positive, samples in that group are then retested adaptively. A key challenge of neural group testing is to modify a deep neural network so that it could test multiple samples in one forward pass. We propose three designs to achieve this without introducing any new parameters and evaluate their performance. We applied neural group testing to an image moderation task to detect rare but inappropriate images. We found that neural group testing can group up to 16 images in one forward pass and reduce the overall computation cost by over 73% while improving detection performance. Our code is available at https://github.com/Weixin-Liang/NeuralGroupTesting/

Fig. 1: Overview of group testing, a strategy that is used in efficient disease testing [12]. Group testing is a mathematical strategy for discovering positive items in a large population; the core idea is to test multiple samples at once to save time and money. (Left panel, Algorithm 1, Two-Round Testing: samples are mixed together in equal-sized groups and tested; if a group tests positive, every sample in it is retested individually. Right panel, Algorithm 2, Multi-Round Testing: extra rounds of group testing are added to Algorithm 1, reducing the total number of tests needed.)

I. INTRODUCTION

Recent advances in deep learning have been achieved by increasingly large and computationally-intensive deep neural networks. For example, a ResNeXt-101 network needs 16.51 billion multiply-accumulate operations to process a single image [1]. Due to this computational burden, deploying deep neural networks is expensive, and the cost scales linearly with the usage of the application. In addition, the inference cost is prohibitively expensive for privacy-sensitive applications that apply deep learning to encrypted data: the current state of the art uses homomorphic encryption, which makes each linear algebra step of the deep neural network very expensive to compute [2], so inferring one image costs hundreds of seconds of GPU time. Recent studies also raise concerns about the excessive energy consumption and CO2 emission caused by running deep neural networks [3], [4].

In this paper, we focus on the scenario where the class distribution is imbalanced. This scenario covers many real-world applications, including image moderation (detecting rare but inappropriate images) [5], malware detection [6], [7], and suicide prevention [8]. A key characteristic of these applications is that only a small portion of the samples are positive, and it is crucial to detect these positives.

Existing work on accelerating deep learning inference focuses primarily on accelerating each forward pass of the neural network. Specialized hardware designs reduce the time for each computation and memory access step of deep neural networks [9]. Parameter pruning removes redundant parameters of the neural network that are not sensitive to the performance [10]. Neural architecture search automates the design of deep neural networks to trade off accuracy against model complexity [11]. To sum up, most existing methods reduce the time for running each forward pass of the neural network. In this paper, we explore an orthogonal direction that accelerates the neural network by reducing the number of forward passes needed to test a given amount of data.
We propose neural group testing, a general deep learning acceleration framework that saves computation by testing multiple samples in one forward pass. Neural group testing combines deep learning with group testing, a strategy that is widely used for efficient disease testing [12]. As shown in Figure 1, the core idea of group testing is to pool samples from many people and test the mixed sample. Groups of samples that test negative are ruled out, which saves testing many people individually. If a group tests positive, samples in that group are then retested adaptively. Similar to group testing, our framework reduces the number of tests (forward passes) of a neural network by allowing the network to test multiple samples in one forward pass, instead of testing each sample individually.

A key challenge of neural group testing is to modify a deep neural network so that it could test multiple samples in one forward pass. We achieve this without introducing any new parameters to the neural network. Our inspiration comes from Mixup [13], a data augmentation method that averages pairs of images in the pixel space as augmented training data. We propose three designs to address the challenge (see Figure 2), starting from the simplest solution that merges samples by averaging their pixels and then tests on the merged sample. The proposed approach requires only several epochs of model fine-tuning. We applied neural group testing to an image moderation task to detect rare but inappropriate images. We found that neural group testing can group up to 16 images in one forward pass and reduce the overall computation cost by over 73% while improving detection performance. Because neural group testing can be performed without modifying the neural network architecture, it is complementary to and can be combined with all the popular approaches to speed up inference, such as sparsifying or quantizing networks or downsampling images. For example, we found that existing neural network pruning methods [14] can remove 80% of the parameters in the neural network without any performance drop in neural group testing.

While the main contribution of our work is the novel algorithm and empirical analysis, it also motivates new questions and theories of group testing. Our work opens a window for applying group testing to unstructured data types, like images, that have been under-explored in group testing previously. The proposed method can potentially be applied to other unstructured media, especially in deep learning settings.

Fig. 2: Three neural network designs of neural group testing to test multiple samples with one forward pass. (Baseline, Individual Testing: test each sample individually by running a full forward pass; the figure considers a network with five layers f^(1), ..., f^(5). Design 1, Pixel Merge: samples are merged together in the pixel space, similar to Mixup data augmentation. Design 2, Feature Merge: run each sample through the first few layers to get an intermediate feature, then aggregate the features (e.g., maxpool) into one feature. Design 3, Tree Merge: merge samples hierarchically and recursively at different levels of the network, leading to a tree-structured merging scheme; the figure shows a binary tree merging scheme.)

II. METHOD

Neural group testing has two key components: (1) Neural Group Testing Network Design, which modifies a neural network so that it could test multiple samples in one forward pass. The group testing network takes a group of samples as input, and learns to predict whether there exists any positive sample in the group. (2) Neural Group Testing Algorithm, which schedules the use of the neural group testing network.

Our neural group testing setting is closely related to sparsity-constrained group testing [15], where the group size M cannot be arbitrarily large. In general, groups of samples that test negative are ruled out. If a group tests positive, samples in that group are then retested adaptively.

A. Neural Group Testing Network Design

We consider binary classification tasks with an imbalanced class distribution. Suppose that we are given an individual testing network ϕ : X → Y that can only test each sample x_m ∈ X individually and output a binary label y_m ∈ Y (either positive or negative). Our goal is to modify ϕ into a group testing network Φ. The input of Φ is a set of samples X = {x_1, ..., x_M}, x_m ∈ X. In general, each group of M samples is sampled uniformly from the entire pool of N data points. The output of Φ should be positive as long as one or more of the set's samples are positive. Specifically, suppose that y_m is the ground-truth label for testing x_m individually; then the ground-truth label for testing X is y = max_{1≤m≤M} y_m. In addition, we assume a training set is available for fine-tuning the group testing network Φ.
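To make the grouping and labeling convention concrete, here is a minimal Python sketch (not from the paper's released code) of uniform group formation and the group-level label y = max_{1≤m≤M} y_m; the helper names `make_groups` and `group_label` are illustrative.

```python
import random

def make_groups(num_samples: int, group_size: int, seed: int = 0):
    """Uniformly partition sample indices 0..N-1 into groups of size M."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return [indices[i:i + group_size] for i in range(0, num_samples, group_size)]

def group_label(individual_labels):
    """Ground-truth label of a group: positive (1) iff any member is positive,
    i.e. y = max_{1 <= m <= M} y_m."""
    return max(individual_labels)

# Toy usage: 12 samples, one positive, groups of size 4.
labels = [0] * 12
labels[5] = 1
groups = make_groups(num_samples=12, group_size=4)
print([(g, group_label([labels[i] for i in g])) for g in groups])
```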
1) Design 1: Merging Samples in Pixels: We first present a simple design that does not require any modification to the individual testing network, i.e. Φ = ϕ, inspired by the Mixup data augmentation method for image classification tasks [13]. Given an original training set, Mixup generates a new training set by averaging M random images in the pixel space, as well as their labels. For example, if M = 2 and the two random images x_1, x_2 are a cat image and a dog image respectively, then the new image will be (x_1 + x_2)/2 and the corresponding label will be "0.5 cat and 0.5 dog". Mixup suggests that training on the new training set with mixed samples and mixed labels serves the purpose of data augmentation and achieves better test accuracy on normal images. Inspired by Mixup, Design 1 tests a group of M samples X = {x_1, ..., x_M}, x_m ∈ X, by simply averaging them in the pixel space into a mixed sample (1/M) Σ_{x_m ∈ X} x_m (see Figure 2, Design 1). We fine-tune Φ using the training set so that Φ learns to predict positive if there is any positive sample in a group. We note that, though simple, this design already saves a considerable amount of computation: even with the smallest group size M = 2, Design 1 could ideally reduce the computation cost by nearly half.
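As an illustration of Design 1, the following is a minimal PyTorch sketch of a pixel-merge group test, assuming a fine-tuned classifier `phi` that outputs one logit per image with positive logits meaning "positive"; the shapes, threshold, and stand-in network are illustrative rather than the paper's exact setup.

```python
import torch

@torch.no_grad()
def design1_group_test(phi: torch.nn.Module, group: torch.Tensor) -> bool:
    """Design 1: average a group of M images in pixel space and run one forward pass.

    group: tensor of shape (M, C, H, W) holding the images of one group.
    phi:   the (fine-tuned) individual testing network; assumed here to output a
           single logit per image, with logit > 0 meaning 'positive'.
    """
    mixed = group.mean(dim=0, keepdim=True)   # (1, C, H, W), equals (1/M) * sum_m x_m
    logit = phi(mixed).squeeze()
    return bool(logit > 0)

# Toy usage with a stand-in network.
phi = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))
group = torch.randn(4, 3, 32, 32)             # M = 4 images
print(design1_group_test(phi, group))
```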
We consider the general architecture of group testing The proposed method can be potentially applied to other network Φ that directly takes a set of M samples as the input. unstructured media especially in deep learning settings. We want Φ to be a well-defined function on sets. This requires Φ to be permutation invariant to the ordering of the set samples, II.METHOD i.e. for any permutation π: Neural group testing has two key components: (1) Neural Φ({x , . . . , x }) = Φ({x , . . . , x }) Group Testing Network Design, which modifies a neural 1 M π(1) π(M) network so that it could test multiple samples in one forward Regarding the permutation invariant property of a neural pass. The group testing network takes a group of samples as network, Zaheer et al. [16] proves the following theorem: input, and learns to predict whether there exists any positive samples in the group. (2) Neural Group Testing Algorithm, Theorem 1 [16] Assume the elements are countable, i.e. X which schedules the use of the neural group testing network. |X| < ℵ0. A function h : 2 → R operating on a set X is invariant to the permutation of elements in X if and only that the group size M is a power of two. We use hi,j to denote P  th if it can be decomposed in the form h(X) = ρ x∈X φ(x) , the feature of the j node at level-i, where i = 0,..., log2 M. for suitable transformations φ and ρ. h0,j = xj, j = 1, 2,...,M (i) (i) The structure of permutation invariant functions in Theorem hi+1,j = aggregate(φ (hi,2j−1), φ (hi,2j)) 1 hints a general strategy for designing a neural network that  Φ(X) = Φ({x1, . . . , xM }) = ρ h0,log M is a well-defined function on sets. The neural network has the 2 φ(0) = f (T0) ◦ ... ◦ f (2) ◦ f (1) following three steps: φ(i+1) = f (Ti+1) ◦ ... ◦ f (Ti+2) ◦ f (Ti+1) • Transform each sample xm into a feature vector φ(xm). (L) (Tlog M +2) (Tlog M +1) • Aggregate (e.g., sum, average) all feature vectors φ(xm). ρ = f ◦ ... ◦ f 2 ◦ f 2 • Use ρ to process the aggregation of the feature vectors. (i) where h0,log2 M denotes the root node and φ denotes the The key of the aforementioned three-step procedure is to neural layers for transforming each branch node’s feature at insert a feature aggregation operator (Step 2) in the middle level-i. We build φ(i) by re-using the neural layers in the of the network. Zaheer et al. [16] further prove that the individual testing network ϕ. We note that Design 3 is not aggregation operator could be any permutation-equivariant limited to binary trees, and applications can set different fan- operator, including max pooling, min pooling. Following the out values for different branch nodes in the tree. Finally, we structure of permutation invariant functions in Theorem 1, we note that Design 2 could be viewed as a special case of Design now modify the individual testing network ϕ : into a group 3 where the tree is an M-ary tree of height 1. In this case, testing network Φ (See Figure 2, Design 2). Without loss of there is only one level of feature aggregation. generality, we view ϕ as a composition of a series of nonlinear 4) Implementation and Training Procedure: For all afore- (i) operations, each represented by a neural layer f . mentioned designs, we initialize the parameters of the group testing network Φ using the corresponding parameters in the ϕ(x ) = f (L) ◦ ... ◦ f (2) ◦ f (1)(x ) m m individual testing network ϕ. 
3) Design 3: Merging Samples Hierarchically: Design 3 merges samples hierarchically and recursively at different levels of the network, leading to a tree-structured merging scheme (see Figure 2, Design 3). Consider a binary tree with M leaf nodes, each corresponding to one sample. We first merge pairs of samples, aggregating the M leaf nodes into M/2 level-1 branch nodes. We run several neural layers to transform each branch node's feature, and then further aggregate the M/2 level-1 branch nodes into M/4 level-2 branch nodes. We repeat the process until we reach the root node, and we run the remaining neural layers on the root node's feature to get the prediction. Formally, assume that the group size M is a power of two. We use h_{i,j} to denote the feature of the j-th node at level i, where i = 0, ..., log_2 M:

h_{0,j} = x_j,  j = 1, 2, ..., M
h_{i+1,j} = aggregate( φ^(i)(h_{i,2j−1}), φ^(i)(h_{i,2j}) )
Φ(X) = Φ({x_1, ..., x_M}) = ρ( h_{log_2 M, 1} )
φ^(0) = f^(T_0) ∘ ... ∘ f^(2) ∘ f^(1)
φ^(i+1) = f^(T_{i+1}) ∘ ... ∘ f^(T_i + 2) ∘ f^(T_i + 1)
ρ = f^(L) ∘ ... ∘ f^(T_{log_2 M} + 2) ∘ f^(T_{log_2 M} + 1)

where h_{log_2 M, 1} denotes the root node and φ^(i) denotes the neural layers that transform each branch node's feature at level i. We build each φ^(i) by re-using the neural layers in the individual testing network ϕ. We note that Design 3 is not limited to binary trees, and applications can set different fan-out values for different branch nodes in the tree. Finally, we note that Design 2 can be viewed as a special case of Design 3 where the tree is an M-ary tree of height 1; in this case, there is only one level of feature aggregation.
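A level-by-level sketch of the binary tree merge in Design 3, reusing the split-backbone idea from the previous sketch; the stage blocks, split points, and max-pool aggregation are again illustrative.

```python
import torch
import torch.nn as nn

def tree_merge_forward(stages: list, rho: nn.Module, group: torch.Tensor) -> torch.Tensor:
    """Design 3 with a binary tree: stages[i] plays the role of phi^(i) at level i.

    group:  (M, C, H, W) with M a power of two.
    stages: list of length log2(M); each entry is the block of layers applied to
            every node at that level before pairs of nodes are max-pooled together.
    rho:    the remaining layers, applied once to the root node's feature.
    """
    h = group
    for stage in stages:                        # level i -> level i+1
        h = stage(h)                            # transform every node at this level
        # pair up nodes (2j-1, 2j) and aggregate each pair with max-pooling
        h = h.view(h.shape[0] // 2, 2, *h.shape[1:]).max(dim=1).values
    return rho(h)                               # h now holds the single root node

# Toy usage: M = 4, so two merge levels.
stages = [nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
          nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())]
rho = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
print(tree_merge_forward(stages, rho, torch.randn(4, 3, 32, 32)).shape)  # (1, 1)
```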
4) Implementation and Training Procedure: For all the aforementioned designs, we initialize the parameters of the group testing network Φ using the corresponding parameters in the individual testing network ϕ. We fine-tune [17], [18] the group testing network Φ on an annotated training set as follows: in each training epoch, we randomly sample an equal number of positive groups (i.e., groups containing at least one positive sample) and negative groups of M samples from the entire pool of N data points as the training data. We then fine-tune the group testing network Φ using the sampled groups, as well as the label corresponding to each group (i.e., whether the group contains at least one positive sample).

B. Neural Group Testing Algorithm

In neural group testing, standard group testing algorithms can be plugged in to schedule the use of the group testing network. We briefly introduce three standard group testing algorithms, and refer interested readers to [19], [20] for theoretical analysis and more sophisticated versions.

1) Algorithm 1: Two-Round Testing: The two-round testing algorithm was first proposed during the Second World War to test soldiers for syphilis [21]. As shown in Figure 1, this algorithm first randomly splits the data into equal-sized groups and performs the tests group by group. Groups of samples that test negative are ruled out. If a group tests positive, every sample in it is retested individually. Similarly, in our neural group testing setting, we conduct two rounds of testing: the first round tests samples in groups of size M using the group testing network Φ. If a group tests positive, we use the individual testing network ϕ to test each sample in that group individually.

2) Algorithm 2: Multi-Round Testing: Algorithm 2 improves Algorithm 1 by adding further rounds of group tests before testing each sample separately [19] (Figure 1). If a group of M samples tests positive, we split the group into K sub-groups of size M/K and test the sub-groups recursively. Sub-groups of samples that test negative are ruled out. One way to implement Algorithm 2 in neural group testing is to train multiple versions of the group testing network Φ, each corresponding to a possible group size (M, M/K, ...). In practice, we find that the group testing network Φ trained with group size M also works well for smaller group sizes (e.g., M/K). Therefore, we use the same group testing network Φ for all rounds except the individual testing round. Using the same group testing network for different rounds provides an additional computational benefit for Designs 2 and 3: we can cache samples' features φ(x_m) to achieve further computation savings.

| Algorithm | Design | Group Size M | Recall | False Positive Rate | # of Tests (1st Round) | # of Tests (Total) | # of Individual Results Obtained per Test Used | Total Computation (TMAC) | Computation Relative to Individual Testing |
|---|---|---|---|---|---|---|---|---|---|
| Individual Testing | – | 1 | 100% | 0.18% | 48,800 | 48,800 | 1.00 | 805.2 | 100.00% |
| Algorithm 1 (Two-Round) | Design 1 (Pixel Merge) | 2 | 92% | 0.08% | 24,400 | 27,394 | 1.78 | 452.0 | 56.14% |
| Algorithm 1 (Two-Round) | Design 1 (Pixel Merge) | 4 | 64% | 0.07% | 12,200 | 18,864 | 2.59 | 311.3 | 38.66% |
| Algorithm 1 (Two-Round) | Design 2 (Feature Merge) | 2 | 100% | 0.02% | 24,400 | 24,634 | 1.98 | 495.8 | 61.57% |
| Algorithm 1 (Two-Round) | Design 2 (Feature Merge) | 4 | 100% | 0.03% | 12,200 | 12,980 | 3.76 | 347.9 | 43.20% |
| Algorithm 1 (Two-Round) | Design 2 (Feature Merge) | 8 | 100% | 0.04% | 6,100 | 8,356 | 5.84 | 293.8 | 36.49% |
| Algorithm 1 (Two-Round) | Design 2 (Feature Merge) | 16 | 100% | 0.07% | 3,050 | 9,338 | 5.23 | 321.1 | 39.88% |
| Algorithm 1 (Two-Round) | Design 3 (Tree Merge) | 4 | 100% | 0.03% | 12,200 | 12,908 | 3.78 | 292.9 | 36.38% |
| Algorithm 1 (Two-Round) | Design 3 (Tree Merge) | 8 | 100% | 0.05% | 6,100 | 9,308 | 5.24 | 255.8 | 31.76% |
| Algorithm 1 (Two-Round) | Design 3 (Tree Merge) | 16 | 100% | 0.08% | 3,050 | 15,626 | 3.12 | 371.1 | 46.09% |
| Algorithm 2 (Multi-Round) | Design 2 (Feature Merge) | 8 | 100% | 0.01% | 6,100 | 7,392 | 6.60 | 273.8 | 34.00% |
| Algorithm 2 (Multi-Round) | Design 2 (Feature Merge) | 16 | 100% | 0.02% | 3,050 | 5,130 | 9.51 | 246.0 | 30.55% |
| Algorithm 2 (Multi-Round) | Design 3 (Tree Merge) | 8 | 100% | 0.01% | 6,100 | 7,844 | 6.22 | 225.8 | 28.04% |
| Algorithm 2 (Multi-Round) | Design 3 (Tree Merge) | 16 | 100% | 0.02% | 3,050 | 6,726 | 7.26 | 212.8 | 26.43% |
| Algorithm 3 (One-Round) | Design 3 (Tree Merge) | 4 | 100% | 0.11% | 24,400 | 24,400 | 2.00 | 491.8 | 61.08% |

TABLE I: Computational savings and performance improvements achieved by neural group testing with various configurations. "# of Individual Results Obtained per Test Used" is the number of samples in the test set divided by the total number of tests (forward passes) used. Evaluated on the firearm image moderation task with prevalence 0.1%.
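To make the scheduling of Algorithms 1 and 2 described above concrete, here is a minimal sketch of the adaptive retesting loop; `group_test` and `individual_test` stand in for forward passes of Φ and ϕ, and the split factor K = 2 in the multi-round branch is an illustrative choice.

```python
def adaptive_group_test(samples, group_test, individual_test, group_size, multi_round=True):
    """Algorithm 1 (two-round) when multi_round=False, Algorithm 2 (multi-round) otherwise.

    group_test(batch)  -> bool: one forward pass of the group testing network Phi.
    individual_test(x) -> bool: one forward pass of the individual testing network phi.
    Returns the list of samples flagged positive.
    """
    positives = []
    pending = [samples[i:i + group_size] for i in range(0, len(samples), group_size)]
    while pending:
        group = pending.pop()
        if not group_test(group):
            continue                           # negative group: every member is ruled out
        if multi_round and len(group) > 2:
            half = len(group) // 2             # split into K = 2 sub-groups and retest them
            pending += [group[:half], group[half:]]
        else:
            positives += [x for x in group if individual_test(x)]
    return positives
```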

3) Algorithm 3: One-Round Testing: Algorithms 1 and 2 save computation, but they increase the tail latency, since multiple rounds of tests are needed to identify a positive sample. Algorithm 3 performs all the tests in one round, with many overlapping groups [22], [23], and then decodes the test results. Algorithm 3 is also known as non-adaptive group testing. In our neural group testing, we use a simple version of non-adaptive group testing: double pool testing [22]. We first split the data into groups of size M twice, so that each sample occurs in exactly two groups. We then decode the results by ruling out the items that occur at least once in a negative group.
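A minimal sketch of the double pool testing described above: each sample is pooled twice, and any sample that appears in at least one negative group is ruled out; the helper names and the random groupings are illustrative.

```python
import random

def double_pool_test(samples, group_test, group_size, seed=0):
    """Algorithm 3 (one round): pool every sample twice, then rule out any sample
    that appears in at least one negative group. The remaining samples are
    reported as (possibly false) positives without further testing rounds."""
    rng = random.Random(seed)
    n = len(samples)
    flagged = [True] * n                       # True = not yet ruled out

    for _ in range(2):                         # two independent groupings
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, group_size):
            member_ids = order[start:start + group_size]
            if not group_test([samples[i] for i in member_ids]):
                for i in member_ids:           # everyone in a negative group is ruled out
                    flagged[i] = False

    return [samples[i] for i in range(n) if flagged[i]]
```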
III. EXPERIMENTS

A. Application and Dataset

We consider image moderation [5], the task of identifying and removing rare but inappropriate images from a large number of user-uploaded images. AI-based image moderation has been widely deployed by Facebook and YouTube to scale with the large number of uploaded images [24]. For example, Facebook receives 350 million uploaded photos every day [25]. The global image moderation solution market reached a value of USD 4.9 billion in 2019 and is expected to reach a valuation of nearly USD 8.9 billion by 2025 [26]. In this paper, we focus on detecting images of firearms. Gun violence is a persistently sensitive topic: legitimate or not, pictures showing someone posing with a firearm can offend visitors, shock young minds, and displease advertisers [27].

We build a firearm moderation dataset based on ImageNet [28]. Using the ImageNet validation set, we construct a large test set with around 50k images, of which 0.1% are firearm images, following the reported prevalence of violent images on Facebook [29]. We exclude ambiguous ImageNet classes that might contain firearms, such as "military uniform" and "holster". We use the ImageNet training set for training the neural group testing network. The code for generating the dataset is available in our GitHub repo. Evaluating neural group testing on other unstructured data types such as text is an interesting direction for future work [30], [31].

B. Setup

We implement the individual testing network ϕ with an ImageNet pre-trained ResNeXt-101 [32] trained with Mixup data augmentation. We use max pooling to aggregate samples, and set T = 0.2L for Designs 2 and 3. We experiment with various group sizes M = 2, 4, 8, 16, as well as different design choices of the neural group testing network and the neural group testing algorithm. We report the number of multiply-accumulate operations (MAC) during inference, measured by ptflops [33], as well as the detection performance in terms of recall and false positive rate.
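For reference, per-pass multiply-accumulate counts can be measured with ptflops roughly as follows; the specific torchvision ResNeXt-101 variant shown (`resnext101_32x8d`) is an assumption on our part, and the resulting numbers depend on the input resolution.

```python
import torchvision
from ptflops import get_model_complexity_info

# Individual testing network: an ImageNet-style ResNeXt-101 (no pretrained weights
# are needed just to count operations).
model = torchvision.models.resnext101_32x8d()

macs, params = get_model_complexity_info(
    model, (3, 224, 224),            # one 224x224 RGB image
    as_strings=True,
    print_per_layer_stat=False,
)
print(f"MACs per forward pass: {macs}, parameters: {params}")
# Group testing amortizes the layers after the merge point over the M samples in a
# group, so the per-sample cost drops roughly to
#   sum_{i<=T} c_i + (1/M) * sum_{i>T} c_i   (Section II-A, Design 2).
```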

C. Results

1) Performance Analysis: Interestingly, group testing improves detection performance. Table I shows that all configurations have a lower false positive rate than individual testing. This is because neural group testing performs multiple tests before confirming that a sample is positive. In addition, all configurations except Design 1 achieve 100% recall, showing that merging samples' pixels is not ideal and that it is necessary to merge samples in the feature space.

2) Computational Efficiency Analysis: As shown in Table I, our best configuration (Algorithm 2 + Design 3 + group size 16) reduces the overall computation cost by over 73% (relative computation 26.43%). If we switch from Algorithm 2 to Algorithm 1, the relative computation increases substantially (26.43% vs. 46.09%) and the number of results obtained per test (3.12) is far smaller than the group size 16. The reason is that, although a large group size saves computation in the first round, it passes many samples to the following rounds, since each positive group contains more samples. Therefore, it is important to use multiple rounds of testing (Algorithm 2). In addition, switching from Design 3 to Design 2 also increases the relative computation significantly (26.43% vs. 30.55%), showing that Design 3 also contributes to the computation savings.

3) Various Prevalence Levels: Since Design 3 achieves the best computational efficiency, we experiment with different prevalence levels using Design 3. We vary the prevalence by randomly sampling and transforming the firearm images. Figure 3 shows that our method reduces the relative computation to less than 33% across all settings. As expected, neural group testing is most efficient when the prevalence is low, because group tests are more likely to be negative, which saves testing many samples individually.

Fig. 3: Relative computation and test efficiency (results obtained per test) under different prevalence levels; the x-axis of both panels is prevalence. Evaluated with Design 3 (Tree Merge) with various group sizes M and algorithms.

4) Combining with Existing Acceleration Methods: Because neural group testing can be performed without modifying the neural network architecture, it is complementary to and can be combined with all the popular approaches to speed up inference, such as sparsifying or quantizing networks or downsampling images. To show this, we apply neural network pruning to our best configurations (Algorithm 2 + Design 3 + group size M = 8, 16). Following [14], we prune the group testing network Φ by removing the weights with the smallest absolute values: we sort the parameters of the neural network by their absolute values and then sparsify the 80% of the parameters with the lowest absolute values. We evaluated the pruned version of the group testing network Φ in our neural group testing settings and found no drop in detection performance. This shows that neural group testing can be combined with other existing approaches to speed up inference.

IV. CONCLUSION

In this paper, we present neural group testing, a general deep learning acceleration framework that tests multiple samples in one forward pass. We found that multi-round tree merge is the best design for neural group testing. It can group up to 16 images in one forward pass and reduce the overall computation cost by over 73% while improving detection performance. Another benefit of this design is that it can easily be combined with other approaches to accelerate inference, for example by quantizing or pruning network parameters or downsampling input data. We applied existing neural network pruning methods and found that we can sparsify 80% of the parameters in the neural network without any performance drop in neural group testing. Evaluating further gains from combining other orthogonal approaches is an interesting direction for future work.

ACKNOWLEDGMENTS

We thank the ISIT 2021 chairs and reviewers for their helpful feedback. We also thank Mingtao Xia for helpful discussions on group testing. This research is supported by NSF CAREER 1942926 and grants from the Chan-Zuckerberg Initiative.

REFERENCES

[1] A. Canziani, A. Paszke, and E. Culurciello, "An analysis of deep neural network models for practical applications," CoRR, vol. abs/1605.07678, 2016.
[2] Q. Lou, B. Song, and L. Jiang, "Autoprivacy: Automated layer-wise parameter selection for secure neural network inference," CoRR, vol. abs/2006.04219, 2020.
[3] E. Strubell, A. Ganesh, and A. McCallum, "Energy and policy considerations for deep learning in NLP," in ACL (1). Association for Computational Linguistics, 2019, pp. 3645–3650.
[4] R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, "Green AI," CoRR, vol. abs/1907.10597, 2019.
[5] B. Dang, M. J. Riedl, and M. Lease, "But who protects the moderators? The case of crowdsourced image moderation," arXiv preprint arXiv:1804.10999, 2018.
[6] R. Oak, M. Du, D. Yan, H. C. Takawale, and I. Amit, "Malware detection on highly imbalanced data through sequence modeling," in AISec@CCS. ACM, 2019, pp. 37–48.
[7] W. Liang, K. Bu, K. Li, J. Li, and A. Tavakoli, "Memcloak: Practical access obfuscation for untrusted memory," in Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC 2018, San Juan, PR, USA, December 03-07, 2018. ACM, 2018, pp. 187–197. [Online]. Available: https://doi.org/10.1145/3274694.3274695
[8] R. A. Calvo, D. N. Milne, M. S. Hussain, and H. Christensen, "Natural language processing in mental health applications using non-clinical texts," Nat. Lang. Eng., vol. 23, no. 5, pp. 649–685, 2017.
[9] V. Sze, Y. Chen, T. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, 2017.
[10] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding," arXiv preprint arXiv:1510.00149, 2015.
[11] T. Elsken, J. H. Metzen, and F. Hutter, "Neural architecture search: A survey," Journal of Machine Learning Research, vol. 20, no. 55, pp. 1–21, 2019. [Online]. Available: http://jmlr.org/papers/v20/18-598.html
[12] S. Mallapaty, "The mathematical strategy that could transform coronavirus testing," Nature, vol. 583, no. 7817, pp. 504–505, 2020.
[13] H. Zhang, M. Cissé, Y. N. Dauphin, and D. Lopez-Paz, "mixup: Beyond empirical risk minimization," in ICLR (Poster). OpenReview.net, 2018.
[14] S. Han, J. Pool, J. Tran, and W. Dally, "Learning both weights and connections for efficient neural network," Advances in Neural Information Processing Systems, vol. 28, pp. 1135–1143, 2015.
[15] V. Gandikota, E. Grigorescu, S. Jaggi, and S. Zhou, "Nearly optimal sparse group testing," IEEE Trans. Inf. Theory, vol. 65, no. 5, pp. 2760–2773, 2019. [Online]. Available: https://doi.org/10.1109/TIT.2019.2891651
[16] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov, and A. J. Smola, "Deep sets," in NIPS, 2017, pp. 3391–3401.
[17] W. Liang, J. Zou, and Z. Yu, "Beyond user self-reported likert scale ratings: A comparison model for automatic dialog evaluation," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, D. Jurafsky, J. Chai, N. Schluter, and J. R. Tetreault, Eds. Association for Computational Linguistics, 2020, pp. 1363–1374. [Online]. Available: https://doi.org/10.18653/v1/2020.acl-main.126
the case of crowdsourced image moderation,” arXiv preprint by randomly sampling and transforming the firearm images. arXiv:1804.10999, 2018. Figure 3 shows that, our method is able to reduce the relative [6] R. Oak, M. Du, D. Yan, H. C. Takawale, and I. Amit, “Malware detection on highly imbalanced data through sequence modeling,” in AISec@CCS. computation to less than 33% across all settings. As expected, ACM, 2019, pp. 37–48. neural group testing is most efficient when the prevalence is [7] W. Liang, K. Bu, K. Li, J. Li, and A. Tavakoli, “Memcloak: Practical low, because group tests are more likely to be negative, which access obfuscation for untrusted memory,” in Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC 2018, San saves testing many samples individually. Juan, PR, USA, December 03-07, 2018. ACM, 2018, pp. 187–197. 4) Combining with Existing Acceleration Methods: Because [Online]. Available: https://doi.org/10.1145/3274694.3274695 neural group testing can be performed without modifying the [8] R. A. Calvo, D. N. Milne, M. S. Hussain, and H. Christensen, “Natural language processing in mental health applications using non-clinical neural network architecture, it is complementary to and can be texts,” Nat. Lang. Eng., vol. 23, no. 5, pp. 649–685, 2017. combined with all the popular approaches to speed up inference, [9] V. Sze, Y. Chen, T. Yang, and J. S. Emer, “Efficient processing of deep such as sparsifying or quantizing networks or downsampling neural networks: A tutorial and survey,” Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, 2017. images. To show this, we apply neural network pruning on [10] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep our best configurations (Algorithm 2 + Design 3 + Group neural networks with pruning, trained quantization and huffman coding,” Size M = 8, 16). Following [14], we prune the group testing arXiv preprint arXiv:1510.00149, 2015. [11] T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A network Φ by removing the weights with the smallest absolute survey,” Journal of Research, vol. 20, no. 55, pp. values. We first sort the parameters in the neural network with 1–21, 2019. [Online]. Available: http://jmlr.org/papers/v20/18-598.html [12] S. Mallapaty, “The mathematical strategy that could transform coronavirus testing,” Nature, vol. 583, no. 7817, pp. 504–505, 2020. [13] H. Zhang, M. Cissé, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” in ICLR (Poster). OpenReview.net, 2018. [14] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” Advances in neural information processing systems, vol. 28, pp. 1135–1143, 2015. [15] V. Gandikota, E. Grigorescu, S. Jaggi, and S. Zhou, “Nearly optimal sparse group testing,” IEEE Trans. Inf. Theory, vol. 65, no. 5, pp. 2760–2773, 2019. [Online]. Available: https://doi.org/10.1109/TIT.2019. 2891651 [16] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov, and A. J. Smola, “Deep sets,” in NIPS, 2017, pp. 3391–3401. [17] W. Liang, J. Zou, and Z. Yu, “Beyond user self-reported likert scale ratings: A comparison model for automatic dialog evaluation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, D. Jurafsky, J. Chai, N. Schluter, and J. R. Tetreault, Eds. Association for Computational Linguistics, 2020, pp. 1363–1374. [Online]. 
[19] D. Du, F. K. Hwang, and F. Hwang, Combinatorial Group Testing and Its Applications. World Scientific, 2000, vol. 12.
[20] A. Coja-Oghlan, O. Gebhard, M. Hahn-Klimroth, and P. Loick, "Optimal group testing," in COLT, ser. Proceedings of Machine Learning Research, vol. 125. PMLR, 2020, pp. 1374–1388.
[21] R. Dorfman, "The detection of defective members of large populations," The Annals of Mathematical Statistics, vol. 14, no. 4, pp. 436–440, 1943.
[22] A. Z. Broder and R. Kumar, "A note on double pooling tests," CoRR, vol. abs/2004.01684, 2020. [Online]. Available: https://arxiv.org/abs/2004.01684
[23] M. Aldridge, "Conservative two-stage group testing," CoRR, vol. abs/2005.06617, 2020. [Online]. Available: https://arxiv.org/abs/2005.06617
[24] Cambridge Consultants, "Use of AI in online content moderation," https://www.cambridgeconsultants.com/sites/default/files/uploaded-pdfs/Use%20of%20AI%20in%20online%20content%20moderation.pdf, 2019, accessed November 2020.
[25] Business Insider, "Facebook users are uploading 350 million new photos each day," https://www.businessinsider.com/facebook-350-million-photos-each-day-2013-9, 2013, accessed November 2020.
[26] Expert Market Research, "Content moderation solutions market report," https://www.expertmarketresearch.com/reports/content-moderation-solutions-market, 2019, accessed November 2020.
[27] PicPurify, "Weapon moderation," https://www.picpurify.com/gun-detection.html, 2020, accessed November 2020.
[28] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, "Imagenet: A large-scale hierarchical image database," in CVPR. IEEE Computer Society, 2009, pp. 248–255.
[29] Facebook, "Community standards enforcement report," https://transparency.facebook.com/community-standards-enforcement#graphic-violence, 2019, accessed November 2020.
[30] W. Liang, Y. Tian, C. Chen, and Z. Yu, "MOSS: End-to-end dialog system framework with modular supervision," in The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 2020, pp. 8327–8335. [Online]. Available: https://aaai.org/ojs/index.php/AAAI/article/view/6349
[31] W. Liang, K.-H. Liang, and Z. Yu, "HERALD: An annotation efficient method to train user engagement predictors in dialogs," in ACL. Association for Computational Linguistics, 2021.
[32] S. Xie, R. B. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in CVPR. IEEE Computer Society, 2017, pp. 5987–5995.
[33] V. Sovrasov, "ptflops: Flops counter for convolutional networks in PyTorch framework," https://github.com/sovrasov/flops-counter.pytorch, 2020, accessed November 2020.