THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE

DEPARTMENT OF ENGINEERING

TENSORSHOW: A GRAPHICAL USER INTERFACE FOR ASSEMBLING, TRAINING AND EVALUATING CONVOLUTIONAL NEURAL NETWORKS

ANDREW WARNER
SPRING 2020

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Computer Science with honors in Electrical Engineering

Reviewed and approved* by the following:

Vishal Monga
Professor of Electrical Engineering
Thesis Supervisor

Vijaykrishnan Narayanan
Distinguished Professor of Computer Engineering
Honors Adviser

* Electronic approvals are on file.

ABSTRACT

Machine learning applications have developed as quickly as the technologies that drive them in recent years. The need to develop machine learning architectures quickly for solving ubiquitous problems has not been met by the programming libraries that currently facilitate the computations required for training and deploying models. TensorShow uses the powerful machine learning library and toolset TensorFlow to provide users with a graphical interface for developing and analyzing machine learning architectures. TensorShow specifically builds Convolutional Neural Network (CNN) architectures by abstracting the construction of convolutional, pooling, and fully connected layers into a clean Graphical User Interface (GUI).

Users can quickly build CNNs for classification tasks on numerous datasets, as well as assemble models that can be ported into Keras and TensorFlow for further development. TensorShow's easy-to-use interface makes the task of solving data-driven problems with machine learning easier.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

Chapter 1 Project Overview

Chapter 2 Background and Overview of Technology

Convolutional Neural Networks

TensorFlow and Python

React and JavaScript

Socket.IO and Transmission Control Protocol

Chapter 3 Product Usage

Building a Network

Setting and Updating Parameters

Training a Network

Chapter 4 Conclusion

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1. Starting a CNN in TensorShow

Figure 2. Branching Layers in TensorShow

Figure 3. Completing a Model in TensorShow

Figure 4. Setting Convolution Parameters in TensorShow

Figure 5. Setting Pooling Parameters in TensorShow

Figure 6. Setting Fully Connected Parameters in TensorShow

Figure 7. Choosing a Dataset in TensorShow

Figure 8. Setting Training Parameters in TensorShow

LIST OF TABLES

Table 1. Example CNN Architecture

ACKNOWLEDGEMENTS

Thank you to the friends I made during my years at school who helped foster my education and maintain my academic rigor. Thank you to my family for their continued support and to my girlfriend for enduring the completion of this project and the writing of this thesis.

Thank you to the open-source community for providing me with the tools to make this project possible. Thank you to Professor Vishal Monga and Professor Vijay Narayanan for their guidance in building this application and writing this thesis.

Chapter 1

Project Overview

TensorShow provides a graphical user interface (GUI) for designing, training and evaluating Convolutional Neural Networks (CNNs). CNNs are a useful network design for machine learning tasks such as image classification. CNNs process images by feeding them forward through a deep architecture of computationally intensive layers, including convolutional layers, pooling layers, and fully connected layers. Convolutional layers slide a kernel over an image in order to extract features, computing at each position the inner product between the kernel and the window of the image beneath it. The weight values in the kernels are learned as the network is fed more images. Pooling layers slide a window over an image and return a single value for each window, depending on the type of pooling operation specified. Such pooling operations include maximum pooling, where the maximum value in the window is returned, and average pooling, where the average value in the window is returned. Fully connected layers take the image flattened into a vector and pass it forward through a Deep Neural Network (DNN).
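The two operations just described can be sketched in a few lines of pure Python. This is a minimal illustration, not TensorShow's code: it assumes a single-channel image, stride 1 and no padding for the convolution, and non-overlapping windows for pooling, and the function names are hypothetical. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image`; each output value is the inner product
    between the kernel and the image window beneath it (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def pool2d(image, size, mode="max"):
    """Reduce each non-overlapping size x size window to a single value."""
    if mode not in ("max", "average"):
        raise ValueError(f"unknown pooling mode: {mode!r}")
    summarize = max if mode == "max" else (lambda values: sum(values) / len(values))
    output = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(summarize(window))
        output.append(row)
    return output
```

For example, convolving [[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]] with the kernel [[1, 0], [0, 1]] gives [[2, 4, 6], [0, 2, 4], [6, 0, 2]], and 2 x 2 max pooling of the same image gives [[2, 3], [3, 2]]. Only the kernel values are learned; pooling has nothing to train.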

Classification is achieved by passing the output of the last fully connected layer into the softmax function, which returns a probability distribution over the class labels. Users are able to easily construct multiple CNNs by branching the different compatible CNN layers. This allows users to try different combinations of hyperparameters (kernel window size, pooling window size, fully connected output units, etc.) and different orderings of CNN layer types.
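The softmax step can be illustrated with a small, numerically stable sketch. The function names are hypothetical; the only technique shown is the standard softmax, computed after subtracting the largest logit so that the exponentials cannot overflow (this shift does not change the result).

```python
import math


def softmax(logits):
    """Turn a list of raw class scores (logits) into a probability distribution."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Index of the most probable class label."""
    probabilities = softmax(logits)
    return max(range(len(probabilities)), key=probabilities.__getitem__)
```

For instance, the logits [2.0, 1.0, 0.1] map to probabilities of roughly 0.659, 0.242 and 0.099, so the predicted label is class 0.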

Moreover, users can deploy models to a TensorFlow backend for quick training and receive useful training insights, such as the loss and accuracy of the network on the training set. Users can choose from four different datasets to train and compare the CNNs they construct: CIFAR10, CIFAR100, MNIST Digits, and MNIST Fashion. Each dataset provides tens of thousands of training and test images that are easily fed through CNNs. Users can compare their accuracy and loss scores against the benchmarks available for these commonly used datasets.

Information about each of the datasets is provided to the user so they can understand the shapes of the images and know how large the training and testing sets are. TensorShow was built using the React framework for the JavaScript frontend and TensorFlow for the machine learning infrastructure in the backend. The two components communicate over socket-level TCP connections so that the frontend can follow models live while they train. Together, these components form a full-stack application.

Chapter 2

Background and Overview of Technology

TensorShow harnesses cutting-edge mathematical and technological advances in machine learning and application development to provide a seamless user experience. Convolutional Neural Networks can be difficult to visualize during construction; TensorShow allows users to understand the flow of their networks as they are constructed. In order to train the networks efficiently, TensorFlow is used in the backend for the most computationally expensive operations. The application is developed using the React JavaScript framework because it provides easy-to-use mechanisms for maintaining the state of the application. Moreover, React updates the Document Object Model (DOM) efficiently by re-rendering only the components whose data has changed, which reduces overhead. The user interface is designed using the Material-UI component library, which implements Google's Material Design, because of the simplicity and functionality of its components. Lastly, Socket.IO provides an interface for transmitting data over TCP between the TensorFlow backend and the React frontend.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are the most common architecture used in image classification, image regression and object recognition tasks. CNNs are useful for extracting features from images and discarding unnecessary information through pooling, which makes them computationally efficient for real-time deployment. CNNs are similar to regular neural networks such as Deep Neural Networks (DNNs) in that they still have learned weights and biases. The weights learned in a CNN are the values in the kernels that are slid over the image and the parameters in the fully connected layers at the end of the network that are used for classification.

However, CNNs are more efficient than plain DNNs for image classification because fully connected layers do not scale well to image inputs. For example, if a 32 x 32 x 3 image is used for classification, each unit in the first layer of a DNN would need 32 x 32 x 3 = 3072 weights. This count grows quadratically with the side length of the image: doubling the width and height of a 3-channel image quadruples the number of weights per unit. With a CNN, however, the number of weights learned is restricted to the kernel size; a 5 x 5 x 16 kernel needs only 400 parameters, regardless of the size of the image.
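The arithmetic in the comparison above can be checked with two one-line helpers. Both names are hypothetical, and the dense count is given per unit unless a unit count is supplied.

```python
def dense_weights(height, width, channels, units=1):
    """Weights in a fully connected layer applied directly to an image:
    every unit is connected to every pixel value."""
    return height * width * channels * units


def kernel_weights(kernel_h, kernel_w, depth):
    """Weights in one convolutional kernel; independent of the image size."""
    return kernel_h * kernel_w * depth
```

Here dense_weights(32, 32, 3) returns 3072 and dense_weights(64, 64, 3) returns 12288, four times as many for an image twice as wide and tall, while kernel_weights(5, 5, 16) stays at 400 for any image size.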

CNNs process an image by passing it through a sequence of layers whose output volumes progressively transform the image and discard unnecessary information. For example, an image with the shape 32 x 32 x 3 from a dataset with 10 classes is transformed into a vector of size 1 x 1 x 10. The convolutional and pooling layers reduce the image, while the fully connected layers process the flattened image vector and output one value per class label. The transformations of the image through an example convolutional neural network are shown in Table 1.

Table 1. Example CNN Architecture

Layer Type         Output Size      Number of Weights (biases excluded)
Input              32 x 32 x 3      0
Convolutional      32 x 32 x 12     5 x 5 x 3 x 12 = 900
Pooling            16 x 16 x 12     0
Convolutional      16 x 16 x 4      4 x 4 x 12 x 4 = 768
Pooling            8 x 8 x 4        0
Fully Connected    1 x 1 x 16       256 x 16 = 4096
Fully Connected    1 x 1 x 10       16 x 10 = 160

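The CNN described in Table 1 exemplifies how kernel filtering and pooling reduce the number of trainable parameters in the network: the convolutional layers preserve the width and height of their input, each pooling layer halves them, and by the time the image reaches the first fully connected layer it has been reduced from 3072 values to 8 x 8 x 4 = 256. Most machine learning libraries provide a framework for assembling CNNs.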
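The rows of Table 1 can be recomputed layer by layer with a short sketch. It assumes the conventions the table's shapes imply: convolutions use stride 1 with zero ("same") padding so width and height are kept, pooling windows are square with a stride equal to their size, biases are not counted, and each convolution's weights are counted as kernel height x kernel width x input channels x number of filters. The function and list names are hypothetical.

```python
def walk(input_shape, layers):
    """Return (layer type, output shape, weight count) rows for a simple CNN."""
    h, w, c = input_shape
    rows = [("Input", (h, w, c), 0)]
    for layer in layers:
        kind = layer[0]
        if kind == "Convolutional":
            _, k, filters = layer
            weights = k * k * c * filters  # kernel h x kernel w x input channels x filters
            c = filters                    # "same" padding keeps h and w
        elif kind == "Pooling":
            _, size = layer
            h, w, weights = h // size, w // size, 0
        elif kind == "Fully Connected":
            _, units = layer
            weights = h * w * c * units    # flatten, then one weight per input/output pair
            h, w, c = 1, 1, units
        else:
            raise ValueError(f"unknown layer type: {kind!r}")
        rows.append((kind, (h, w, c), weights))
    return rows


TABLE_1 = [("Convolutional", 5, 12), ("Pooling", 2), ("Convolutional", 4, 4),
           ("Pooling", 2), ("Fully Connected", 16), ("Fully Connected", 10)]

for name, shape, weights in walk((32, 32, 3), TABLE_1):
    print(f"{name:<16} {' x '.join(map(str, shape)):<14} {weights}")
```

The whole network needs 5924 weights. For comparison, connecting the raw 32 x 32 x 3 input directly to 16 fully connected units would already need 3072 x 16 = 49,152 weights.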

TensorFlow and Python

Training, evaluating and deploying machine learning models requires a significant amount of processing power. TensorFlow (Abadi et al., 2015) is an open-source machine learning library written in C++ and interfaced through Python. TensorFlow is developed and maintained by Google Inc. The Python software development kit (SDK) provides methods for building models and setting up their training schedules. The C++ backend optimizes the resulting computations and provides true multithreading as well as distributed computation across CPUs and GPUs.

TensorShow uses the Keras library included in the TensorFlow framework. Keras abstracts low-level matrix operations into prepackaged layer classes, and there are layer classes for almost any operation needed in typical machine learning models. The layer classes used by TensorShow are Input, Conv2D, AveragePooling2D, MaxPool2D, Flatten, and Dense. The Input layer is used to specify the shape of the input images. Its most useful property is that the batch size for training can be left unspecified. Without Keras, the batch size must be specified for some layers in order to set up the correct matrix sizes. However, since batch size is an important parameter for users to set when initializing a training session, it is important that it can be withheld until the model is trained. Conv2D, AveragePooling2D and MaxPool2D are fed the parameters that the user sets in the frontend for each type of layer. For example, if the user adds a convolutional layer to the network, the window shape and stride parameters are fed directly into the class constructor for a Conv2D layer. Flatten is a useful layer for taking an image, with width, height and channel dimensions, and transforming it into a vector that can be passed through a DNN. The Dense layers are used for assembling the fully connected layers in the CNN architecture. These layers are very simple to construct, because the only important parameter is the number of output units.
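The hand-off from frontend parameters to Keras constructors might look like the following sketch. The spec field names ("type", "filters", "window", "stride", "activation", "regularization", "pool", "units") are hypothetical, not TensorShow's actual schema; the returned class names and keyword arguments (filters, kernel_size, strides, activation, kernel_regularizer, pool_size, units) are the standard tf.keras.layers constructor arguments. It is assumed that Keras resolves the string identifiers "l1" and "l2"; passing tf.keras.regularizers.l1() or l2() objects is the explicit alternative. Nothing here imports TensorFlow, so the mapping can be tested on its own.

```python
def to_keras_args(spec):
    """Map a hypothetical frontend layer spec to (tf.keras.layers class name, kwargs)."""
    kind = spec["type"]
    if kind == "conv":
        return "Conv2D", {
            "filters": spec["filters"],                        # depth of the kernel stack
            "kernel_size": tuple(spec["window"]),              # (height, width)
            "strides": tuple(spec["stride"]),
            "activation": spec.get("activation"),              # None means linear
            "kernel_regularizer": spec.get("regularization"),  # "l1", "l2" or None
        }
    if kind == "pool":
        name = "MaxPool2D" if spec.get("pool", "max") == "max" else "AveragePooling2D"
        return name, {"pool_size": tuple(spec["window"]), "strides": tuple(spec["stride"])}
    if kind == "fc":
        return "Dense", {
            "units": spec["units"],
            "activation": spec.get("activation"),
            "kernel_regularizer": spec.get("regularization"),
        }
    raise ValueError(f"unsupported layer type: {kind!r}")
```

In the backend, each pair would then be instantiated with getattr(tf.keras.layers, name)(**kwargs) and called on the previous layer's output.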

Activations and regularizations can be added to the convolutional and fully connected layers as parameters during construction. Each constructor returns a layer object that can be called on an input tensor to produce an output tensor. This way, operations are chained sequentially, and each layer's shape is inferred from the input shape specified in the initial Input layer. After assembling a model, it is simple to compile it with a loss and the metrics to be measured, such as accuracy, and to pass callbacks to be called during training. TensorShow uses custom callback functions that transmit data over the TCP socket to the frontend at the end of each training batch and training epoch. This way, the frontend is able to monitor training progress live as the model is fed more batches over successive epochs. Moreover, compiled models can be stored and serialized even before training. The ability to store untrained models allows users to create skeleton CNN architectures and evaluate how the shape of the input changes from layer to layer.

React and JavaScript

Allowing users to construct networks from scratch requires a powerful user-interface rendering system. JavaScript provides a way to programmatically display and render components in a user interface. React is a JavaScript library developed and maintained by Facebook. The library efficiently renders and updates individual components according to how the data model changes. Each component can maintain its own internal state using React Hooks, or share data through a common data store called a React Context. TensorShow leverages both of these mechanisms to update the user interface as the user manipulates components. For example, if a user adds a new layer to an existing CNN architecture, multiple React Contexts are updated. One React Context maintains the coordinates of the components and how they are displayed in a tree. Another Context maintains the information associated with each of the components. Because these stores are shared and manipulated across individual components, the user interface is systematically updated whenever the data in any store changes. Some information should be shared across components, while other information should remain internal to individual components. For example, when a user is setting the parameters for a new convolutional layer, the individual text fields maintain their own state as the user updates their values. However, when the user submits the parameters as a new layer, the values of these individual inputs (text fields, checkboxes, select menus) are passed to the shared Context so the application knows how to update holistically.
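The two shared stores can be modeled independently of React. The sketch below is written in Python only for consistency with the other examples in this thesis; it is not the React implementation, and the class and field names ("parent", "children", "depth", "type", "params") are hypothetical. It shows the key property described above: the input node is created on startup in both stores, and adding a layer updates both stores in one step.

```python
import itertools


class NetworkStores:
    """Language-neutral model of the layout store and the information store."""

    def __init__(self):
        self._ids = itertools.count()
        self.root = next(self._ids)
        # Layout store: where each node sits in the displayed tree.
        self.layout = {self.root: {"parent": None, "children": [], "depth": 0}}
        # Information store: what each node is and which parameters it was given.
        self.info = {self.root: {"type": "input", "params": {}}}

    def add_layer(self, parent, layer_type, params):
        """Attach a new layer under `parent`, updating both stores together."""
        node = next(self._ids)
        depth = self.layout[parent]["depth"] + 1
        self.layout[node] = {"parent": parent, "children": [], "depth": depth}
        self.layout[parent]["children"].append(node)
        self.info[node] = {"type": layer_type, "params": dict(params)}
        return node
```

Branching is simply calling add_layer more than once with the same parent: each call starts a new row entry, and therefore a new candidate model, in the tree.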

Building and deploying components is simple in React because the library provides a way to create functional components. Instead of the class-based paradigm familiar from many JavaScript applications, these components are written as plain functions. The state of each function is maintained by React Hooks, and the lifecycle of the components is handled through React Effects. React Hooks are a way to internalize memory to individual components, as discussed earlier. React Effects generalize component lifecycles by re-running code either when a component is initialized or whenever certain data stores change. For example, a component can be instructed to send API requests to a server whenever the value of a select menu changes. In TensorShow, React Effects are used to send socket-level transmissions of data over TCP channels.

Socket.IO and Transmission Control Protocol

In order to transmit information between the front-end React application and the backend TensorFlow framework, the Socket.IO library is used to set up TCP channels easily. Socket.IO is a JavaScript library that enables real-time, bidirectional and event-based communication. The server is built using Flask, a library for developing web server code in Python. Flask has a socket-level implementation that instantiates channels for accepting connections, emitting data through communication events, and listening for events from clients. Multiple clients can connect to one Flask server, allowing for extensible application development. TensorShow uses Flask to emit the events produced by the Keras callbacks to the client. The client listens for these events and also transmits model information from the frontend to the server for model construction. The parameters of the layers designed by the user are passed to the server through these channels as serialized JSON data structures.
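Both directions of the channel can be sketched with plain functions standing in for the Socket.IO server. This is an illustration under stated assumptions, not TensorShow's protocol: the event names ("batch_end", "epoch_end"), the message fields ("model", "layers", "step"), and the `emit` argument (any function taking an event name and a payload) are hypothetical. In the real backend the callback class would subclass tf.keras.callbacks.Callback, whose on_train_batch_end and on_epoch_end hooks have the signatures used here; it is duck-typed so the sketch runs without TensorFlow.

```python
import json


def decode_build_request(message):
    """Client to server: parse the JSON layer specs sent by the frontend."""
    data = json.loads(message)
    layers = data.get("layers")
    if not isinstance(layers, list) or not layers:
        raise ValueError("build request must contain a non-empty list of layers")
    return data["model"], layers


class ProgressCallback:
    """Server to client: forward training metrics after every batch and epoch."""

    def __init__(self, emit, model_id):
        self.emit = emit          # hypothetical stand-in for the socket's emit function
        self.model_id = model_id

    def _send(self, event, step, logs):
        # Cast metric values to float so numpy scalars serialize cleanly to JSON.
        metrics = {name: float(value) for name, value in (logs or {}).items()}
        self.emit(event, json.dumps({"model": self.model_id, "step": step, **metrics}))

    def on_train_batch_end(self, batch, logs=None):
        self._send("batch_end", batch, logs)

    def on_epoch_end(self, epoch, logs=None):
        self._send("epoch_end", epoch, logs)
```

With Keras, an instance would be passed as model.fit(..., callbacks=[ProgressCallback(emit, model_id)]), so that every finished batch and epoch produces one event for the frontend.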

Chapter 3

Product Usage

Building a Network

To build a CNN architecture from scratch, users start from a single input node. The input node is the beginning of all networks the user creates, and it is where the user specifies the dataset on which the tree of CNN networks should be trained.

Users cannot delete the Input layer, and they do not need to create it either: the node is rendered on application startup. Both the data model that maintains component coordinates and the one that maintains component information are initialized with the input layer and its associated data.

Figure 1. Starting a CNN in TensorShow

Users can extend the architecture immediately by adding new layers after the input layer.

Each new layer added in a row corresponds to a new model. For example, in Figure 2, the user creates three new layers after the input layer. Each layer in this row represents a new model with an individual layer component. Users can continue to branch from each of these subsequent layers and build unique CNN architectures. Each layer component displays the information the user supplied when setting its parameters. For example, the components in Figure 2 are all convolutional layers; the window size, window stride, activation function and regularization type are displayed so the user can refer to the parameters set when initializing the components.

Figure 2. Branching Layers in TensorShow

Each model can be determined by tracing the component at a leaf of the tree back to the input layer at the top of the tree. Models are initialized when the final fully connected layer is appended to an existing network. Thus, users could technically design CNN architectures with only one layer by passing the output of the input layer into a fully connected layer and specifying this layer as the last layer in the CNN model. A model with this type of architecture would simply flatten the image, pass the vector through a fully connected dense layer, and output one logit for each label associated with the chosen dataset.

Figure 3. Completing a Model in TensorShow

After a model is complete, it is given a final Model component node that displays the depth of the network. For example, in Figure 3 there are seven models currently under construction and two models that are complete. The model on the far left has a depth of three, corresponding to the number of components above it, excluding the input layer. The other complete model also has a depth of three, and was constructed after branching once from its corresponding first layer.
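Tracing a model and computing its displayed depth amount to following parent links from a leaf back to the input layer. In this sketch the tree is a hypothetical dictionary mapping each node to its parent (None for the input layer), and the node names are illustrative.

```python
def trace_model(parents, leaf):
    """Return the layers of the model ending at `leaf`, ordered from input to output."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parents[node]
    return list(reversed(path))


def model_depth(parents, leaf):
    """Depth shown on a completed model: layers on the path, excluding the input layer."""
    return len(trace_model(parents, leaf)) - 1


# Two branches that share the input layer, each completed by a fully connected layer.
PARENTS = {
    "input": None,
    "conv_a": "input", "pool_a": "conv_a", "fc_a": "pool_a",
    "conv_b": "input", "pool_b": "conv_b", "fc_b": "pool_b",
}
```

Here trace_model(PARENTS, "fc_a") yields ["input", "conv_a", "pool_a", "fc_a"], and model_depth reports 3, matching the depth-three models described for Figure 3.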

Setting and Updating Parameters

While constructing the skeleton of the CNN, users can set the parameters of each layer. As the CNN grows, users are able to add new layers whose types are compatible with the previous layer. For example, if the previous layer is either convolutional or pooling, the next layer can be convolutional, pooling, or fully connected. However, if the previous layer is fully connected, the only compatible layer type is another fully connected layer. These selection options are set dynamically for each component.
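The compatibility rules above fit in a small lookup table. This is a sketch with hypothetical type names; it also assumes, following the chapter's description of single-layer models, that the input layer may be followed by any layer type.

```python
ALLOWED_NEXT = {
    "input": {"conv", "pool", "fc"},
    "conv": {"conv", "pool", "fc"},
    "pool": {"conv", "pool", "fc"},
    "fc": {"fc"},  # once the image is flattened, spatial layers no longer apply
}


def allowed_next(previous):
    """Layer types the selection menu would offer after `previous`."""
    return sorted(ALLOWED_NEXT[previous])


def is_valid_sequence(layer_types):
    """Check a whole branch, starting after the input layer."""
    previous = "input"
    for layer_type in layer_types:
        if layer_type not in ALLOWED_NEXT[previous]:
            return False
        previous = layer_type
    return True
```

For example, a branch of conv, pool, fc, fc is valid, whereas conv, fc, pool is rejected because a pooling layer cannot follow a fully connected one.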

Figure 4. Setting Convolution Parameters in TensorShow

The parameters that can be set for a convolutional layer are the window shape, the stride sizes, and the activation and regularization types. The window shape and stride sizes must be discrete integer values, but they are entered freely rather than chosen from a fixed menu. The lower limit of the sizes is 1, and the upper limit depends on the shape of the dataset.

The activation and regularization parameters can only be selected from a discrete set of values. The possible activation types are None (linear), ReLU, Sigmoid and Tanh. The regularization types are L1 (lasso) and L2 (ridge).
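Validation of these settings can be sketched as follows. The function name is hypothetical, and two points are assumptions rather than documented behavior: the upper bound for window and stride sizes is taken to be the spatial size of the incoming image, and leaving regularization unset (None) is allowed.

```python
ACTIVATIONS = {None, "relu", "sigmoid", "tanh"}  # None stands for a linear activation
REGULARIZERS = {None, "l1", "l2"}                # None assumed to mean no regularization


def validate_conv_params(window, stride, activation, regularization, input_hw):
    """Raise ValueError if a convolution setting falls outside the allowed range.

    `window` and `stride` are (height, width) integer pairs; `input_hw` is the
    (height, width) of the incoming image, assumed here to bound both."""
    for name, pair in (("window", window), ("stride", stride)):
        for value, limit in zip(pair, input_hw):
            if not isinstance(value, int) or not 1 <= value <= limit:
                raise ValueError(f"{name} entries must be integers between 1 and {limit}")
    if activation not in ACTIVATIONS:
        raise ValueError(f"unsupported activation: {activation!r}")
    if regularization not in REGULARIZERS:
        raise ValueError(f"unsupported regularization: {regularization!r}")
```

A 5 x 5 window with stride 1, ReLU and L2 passes for a 32 x 32 dataset, while a window of 0 or 33, a fractional size, or an activation outside the list is rejected.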


Figure 5. Setting Pooling Parameters in TensorShow

Users can construct both pooling layers and fully connected layers similarly to how convolutional layers are constructed. For pooling layers, the parameters that can be set are the window shape and the stride size. However, unlike a convolutional layer, a pooling window has no channel depth. This is because pooling applies a fixed function (a maximum or an average) over each window, independently for every channel of the image. Thus, there are no learned kernels, and the number of channels is unchanged. Users can choose from two different pooling types, Average Pooling and Maximum Pooling. Both of these operations are supported by the layers package in Keras. Construction of a pooling layer can be seen in Figure 5.

Building a fully connected layer is simple because the only numerical parameter the user needs to set is the number of output units from the layer. Users have the option to complete the construction of their models when building a fully connected layer by checking the box labelled “Last Layer.” If this option is selected, the number of output units defaults to the number of class labels in the chosen dataset. Users cannot build a final layer until a dataset is chosen in the input layer. Figure 6 demonstrates the construction of a fully connected layer that is also the last layer in a model to be trained on a dataset with ten class labels, and which therefore has ten output units.

Figure 6. Setting Fully Connected Parameters in TensorShow

Training a Network

After building a multitude of CNN architectures, users can begin training their models.

Before finishing a model, a dataset must be selected in the input layer. The datasets users can choose from are CIFAR10, with 50,000 training images and 10 class labels; CIFAR100, with 50,000 training images and 100 class labels; MNIST Digits, with 60,000 greyscale images of digits and 10 class labels; and MNIST Fashion, with 60,000 greyscale images of clothing items and 10 class labels. Figure 7 demonstrates how users can select the single dataset on which models will be trained.
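The dataset information shown to users, and the “Last Layer” rule that depends on it, can be sketched as a small lookup. The dictionary keys and function name are hypothetical; the sizes are the standard ones for these datasets (each also has a 10,000-image test set), and the greyscale MNIST images are written with a single channel.

```python
DATASETS = {
    "cifar10":       {"shape": (32, 32, 3), "train": 50_000, "test": 10_000, "classes": 10},
    "cifar100":      {"shape": (32, 32, 3), "train": 50_000, "test": 10_000, "classes": 100},
    "mnist_digits":  {"shape": (28, 28, 1), "train": 60_000, "test": 10_000, "classes": 10},
    "mnist_fashion": {"shape": (28, 28, 1), "train": 60_000, "test": 10_000, "classes": 10},
}


def final_layer_units(dataset):
    """Output units for a fully connected layer marked "Last Layer": one per class.

    A last layer cannot be built until a dataset is chosen in the input layer."""
    if dataset is None:
        raise ValueError("choose a dataset in the input layer before adding a last layer")
    return DATASETS[dataset]["classes"]
```

For example, final_layer_units("cifar100") returns 100, while calling it before a dataset has been chosen raises an error, mirroring the interface's refusal to build a final layer.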

Figure 7. Choosing a Dataset in TensorShow

After choosing a dataset, users can train individual completed models by viewing them and setting the necessary training parameters. The training parameters are the batch size, the number of epochs over which the model should be fed the dataset, and the optimization algorithm to use. The most commonly used optimization algorithm available to users is Stochastic Gradient Descent (SGD), which is supported by the TensorFlow backend. TensorFlow also supports Adagrad, Adam and RMSProp. During training, backpropagation computes the partial derivative of the loss function with respect to each variable in the model, and the chosen optimizer uses these derivatives to update the variables. Figure 8 displays the window where users can deploy models to be trained.
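The update that SGD performs can be shown on a toy problem. This sketch assumes a one-parameter loss L(theta) = (theta - 3)^2 with exact gradient 2 * (theta - 3) and a hypothetical learning rate of 0.1; real SGD instead estimates the gradient from a mini-batch of `batch size` examples, and Adagrad, Adam and RMSProp extend the same step with per-variable adaptive step sizes (plus momentum, in Adam's case).

```python
def sgd_step(theta, grad, learning_rate):
    """One gradient descent update: theta <- theta - learning_rate * dL/dtheta."""
    return theta - learning_rate * grad


def train_toy(theta, steps, learning_rate=0.1):
    """Minimize the toy loss (theta - 3)^2 with repeated SGD steps."""
    for _ in range(steps):
        grad = 2 * (theta - 3)  # partial derivative of the toy loss w.r.t. theta
        theta = sgd_step(theta, grad, learning_rate)
    return theta
```

Starting from 0.0, fifty steps bring theta to within 0.001 of the minimum at 3. In Keras the optimizer is selected by name when compiling, with "sgd", "adagrad", "adam" and "rmsprop" as the identifiers for the four algorithms above.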

Figure 8. Setting Training Parameters in TensorShow

Chapter 4

Conclusion

TensorShow provides researchers, engineers, and educators with a tool that abstracts the technicalities of building Convolutional Neural Networks into an easy-to-use Graphical User Interface. Not only does TensorShow make it simple to build a network of CNN architectures, but it also handles the deployment of models for training and provides results for analysis. Users can build a multitude of architectures side by side and compare the parameterization and ordering of their layers. Moreover, they can easily download model architectures in a portable format that allows them to be used in other applications. TensorShow makes the construction and analysis of convolutional neural networks simple and useful.

BIBLIOGRAPHY

Martín Abadi, Ashish Agarwal, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

ANDREW WARNER

github.com/warneracw21

EDUCATION
The Pennsylvania State University | Schreyer Honors College, University Park, PA
College of Engineering, Bachelor of Science in Computer Engineering, 2016 – 2020
Eberly College of Science, Master of Applied Statistics, 2018 – 2020

SKILLS AND ABILITIES
Technical Skills
• Programming Languages: Python (2.*, 3.*), SQL, JavaScript, HTML/CSS, R, MATLAB, C, C++
• Front End Development: JavaScript, React, Material Kit and Material UI, ChartJS and D3.js
• Distributed Computation: Kubernetes and Docker for Containerization, Apache Spark and TensorFlow
• Computing: Google Cloud for High Performance Computing and AWS for High Capacity Data Storage
• Machine Learning Infrastructure Engineering: TensorFlow, TensorFlow Probability, Theano, CUDA and OpenCV

WORK EXPERIENCE
Bose Corporation, Framingham, MA
Machine Learning Infrastructure Engineer, Summer 2018
• Designed a machine learning model and training schedule for removing unwanted environmental noise from speech samples and enhancing the overall quality of human speech for use in the Bose HearPhones conversation-enhancing headphones
• Reduced overhead network training costs while augmenting overall efficiency through robustly distributed TensorFlow, organized delegation of tasks across multiple GPU clusters, and use of Google Cloud storage and training
• Implemented a Generative Adversarial Network (GAN) to gauge the effectiveness of a denoising model using a deep residual Recurrent Neural Network (RNN) against a Convolutional Neural Network (CNN) that detects noise in speech spectrograms
BlackRock, Inc., New York, NY
Summer Data Analyst, Summer 2019
• Navigated archives of financial transaction and customer interaction records to develop a semi-supervised approach to segmenting independent financial advisors and large institutional clients based on purchasing trends and CRM behavior analysis
• Aided my small data science team in the development of standardized libraries for both importing and processing data from internal databases and distributing computations across Kubernetes clusters with an Apache Spark framework
• Participated in the BlackRock Summer Hackathon, where I worked with other interns to develop an application for monitoring the health of core BlackRock Aladdin systems, which required accessing various internal networks
Skillet.ai, State College, PA
Lead Software Developer and Statistics Consultant, Fall 2018 – Current
• Build full stack applications and tools that allow clients to visually and physically leverage the power of various data sources
• Consult companies on how to navigate different technologies while offering solutions in database design, application development, and cloud infrastructure tasks

RESEARCH
Artificial Intelligence Labs, Professor Rebecca J. Passonneau, University Park, PA
Developer and Research Assistant, Spring 2017 – Spring 2018
• Assembled a patent pending algorithm, EDUA, for assessing the content quality of student-written essays and summaries
• Published two algorithms in Natural Language Processing conferences: Association for Computational Linguistics (ACL), Language Resources and Evaluation Conference (LREC), and CLIEDE
Information Processing and Algorithms Laboratory, Professor Vishal Monga
Independent Research and Honors Thesis
• Designed and developed a web application for assembling, training, and assessing the performance of Convolutional Neural Networks (CNNs) in real time by interfacing with a live instance of a TensorFlow model
• Displays necessary statistics for evaluating both classification and regression tasks while also leveraging powerful data visualization algorithms to further gain live insights into a training model

LEADERSHIP AND INVOLVEMENT
Nittany Data Labs, University Park, PA
Cofounder, Vice President and Director of Training, Winter 2016 – Spring 2019
• Provided students from diverse academic backgrounds with the resources to learn more about the field of Artificial Intelligence
• Designed a full training program that strengthens students' programming abilities, introduces students to various problems and how to solve them with data-driven solutions, and offers students unique opportunities to learn how to communicate analytics
• Fostered a creative environment for students to propose various projects and learn the skills for using data analytics in practical environments such as consulting, API development, finances, and predictive modeling

PAPERS AND PUBLICATIONS

Y. Gao, A. Warner, R. Passonneau, “PyrEval: An Automated Method for Summary Content Analysis”