On Decomposing a Deep Neural Network Into Modules

Rangeet Pan, Hridesh Rajan
Dept. of Computer Science, Iowa State University
226 Atanasoff Hall, Ames, IA, USA

ABSTRACT

Deep learning is being incorporated in many modern software systems. Deep learning approaches train a deep neural network (DNN) model using training examples, and then use the DNN model for prediction. While the structure of a DNN model as layers is observable, the model is treated in its entirety as a monolithic component. To change the logic implemented by the model, e.g., to add or remove logic that recognizes inputs belonging to a certain class, or to replace the logic with an alternative, the training examples need to be changed and the DNN needs to be retrained using the new set of examples. We argue that decomposing a DNN into DNN modules, akin to decomposing monolithic software code into modules, can bring the benefits of modularity to deep learning. In this work, we develop a methodology for decomposing DNNs for multi-class problems into DNN modules. For four canonical problems, namely MNIST, EMNIST, FMNIST, and KMNIST, we demonstrate that such decomposition enables reuse of DNN modules to create different DNNs, and enables replacement of one DNN module in a DNN with another without needing to retrain. The DNN models formed by composing DNN modules are at least as good as traditional monolithic DNNs in terms of test accuracy for our problems.

CCS CONCEPTS

• Computing methodologies → Machine learning; • Software and its engineering → Abstraction and modularity.

KEYWORDS

deep neural network, modularity, decomposition

ACM Reference Format:
Rangeet Pan and Hridesh Rajan. 2020. On Decomposing a Deep Neural Network into Modules. In Proceedings of The 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020). ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/1122445.1122456

ESEC/FSE 2020, 8 - 13 November, 2020, Sacramento, California, United States
© 2020 Association for Computing Machinery.

1 INTRODUCTION

A class of machine learning algorithms known as deep learning has received much attention in both academia and industry. These algorithms use multiple layers of transformation functions to convert inputs to outputs, each layer learning successively higher-level abstractions in the data. The availability of large datasets has made it feasible to train (adjust the weights of) these multiple layers. Since these layers are organized in the form of a network, this machine learning model is also referred to as a deep neural network (DNN). While the jury is still out on the impact of deep learning on overall understanding of software's behavior, a significant uptick in its usage and applications in wide-ranging areas and safety-critical systems, e.g., autonomous driving, aviation systems, medical analysis, etc., combine to warrant research on software engineering practices in the presence of deep learning.

A software component and a DNN model are similar in spirit: both encode logic and represent significant investments. The former is an investment of the developer's efforts to encode desired logic in the form of software code, whereas the latter is an investment of the modeler's efforts, an effort to label training data, and computation time to create a trained DNN model. The similarity ends there, however. While independent development of software components and a software developer's ability to (re)use software parts have led to the rapid software-driven advances we enjoy today, the ability to (re)use parts of DNN models has not been, to the best of our knowledge, attempted before. The closest approach, transfer learning [14], attempts to reuse the entire DNN model for another problem. Could we decompose and reuse parts of a DNN model?

To that end, we introduce the novel idea of decomposing a trained DNN model into DNN modules. Once the model has been decomposed, the modules of that model might be reused to create a completely different DNN model. For instance, a DNN model that needs logic present in two different existing models can be created by composing DNN modules from those two models, without having to retrain. DNN decomposition also enables replacement. A DNN module can be replaced by another module without having to retrain the DNN. The replacement could be needed for performance improvement or for replacing a functionality with a different one.

To introduce our notion of DNN decomposition, we have focused on decomposing DNN models for multi-label classification problems. We propose a series of techniques for decomposing a DNN model for an n-label classification problem into n DNN modules, one for each label in the original model. We consider each label as a concern, and view this decomposition as a separation of concerns problem [4, 13, 18]. Each DNN module is created due to its ability to hide one concern [13]. As expected, a concern is tangled with other concerns [1, 6, 15], and we have proposed initial strategies to identify and eliminate concern interaction.

We have evaluated our DNN decomposition approach using 16 different models for four canonical datasets (MNIST [7], Fashion MNIST [21], EMNIST [3], and Kuzushiji MNIST [2]). We have experimented with six approaches for decomposition, each successively refining the former. Our evaluation shows that for the majority of the DNN models (9 out of 16), decomposing a DNN model into modules and then composing the DNN modules together to form a DNN model that is functionally equivalent to the original model, but more modular, does not lead to any loss of performance in terms of model accuracy. We also examine intra- and inter-dataset reuse, where DNN modules are used to solve a different problem using the same training dataset or an entirely different problem using an entirely different dataset. Our results show DNN models trained by reusing DNN modules are at least as good as DNN models trained from scratch for MNIST (+0.30%), FMNIST (+0.00%), EMNIST (+0.62%), and KMNIST (+0.05%). We have also evaluated replacement, where a DNN module is replaced by another, and see similarly encouraging results.

In the rest of this paper, we describe our initial steps toward achieving better modularity for deep neural networks, starting with a motivating example in §2, related work in §3, methodology in §4, and results in §5. §6 concludes.

2 WHY DECOMPOSE A DNN INTO MODULES?

Achieving decomposition of a DNN into modules has the potential to bring many benefits of modular programming and associated practices to deep learning. To motivate our objectives, consider Figure 1, which uses DNN models for recognizing digits [7] and letters [3] to illustrate. The top half of the figure shows two reuse scenarios. (1) If we have a DNN model to recognize 0-9, can we extract the logic to recognize 0-1 and build a DNN model for binary digits? (2) If we have two DNN models for recognizing 0-9 and A-J, can we build a DNN model for recognizing 0-9AB, essentially a duodecimal classifier? The bottom half of the figure shows a replacement scenario.

[...] structure of model A to match the structure of model B, which also has the potential to change model A's effectiveness for 0-4 and 6-9. Then, they can replace the training samples for 5 used to train model A with those used to train model B. Finally, the modified model A would be trained with the modified training data. Even for these simple scenarios, both reuse and replacement are complicated. Coarse-grained reuse of the entire DNN model as a black box is becoming common. As modern software continues to integrate DNN models as components, it is important to understand whether fine-grained reuse/replacement of DNN models can be improved.

3 RELATED IDEAS

We are inspired by the vast body of seminal work on decomposing software into modules [4, 6, 13, 15, 18]. We believe that there are ample opportunities to consider a number of these ideas in the context of DNNs, but focus primarily on decomposition in this work.

The decomposition of a DNN into modules has not been attempted before, but coarse-grained reuse (at the level of the entire DNN) has been. An approach for achieving reusability in deep learning is transfer learning [14]. In this technique, a DNN model is leveraged for a different setting by changing the output layer and input shape of a pre-trained network. Transfer learning can be either heterogeneous [16, 17] or homogeneous [12, 20] based on the problem domain. Zhou [22] proposed a specification-based framework that uniquely identifies the goal of the model. These models can be advertised in the marketplace so that other developers can take this model as input and utilize them to build other
