A Thesis entitled

A Supervised Machine Learning Approach Using Object-Oriented Programming Principles

by Merl J. Creps Jr

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Computer Science and Engineering

Dr. Jared Oluoch, Committee Chair

Dr. Weiqing Sun, Committee Member

Dr. Henry Ledgard, Committee Member

Dr. Amanda C. Bryant-Friedrich, Dean, College of Graduate Studies

The University of Toledo

May 2018

Copyright 2018, Merl J. Creps Jr

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

A Supervised Machine Learning Approach Using Object-Oriented Programming Principles

by Merl J. Creps Jr

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Computer Science and Engineering

The University of Toledo

May 2018

Artificial Neural Networks (ANNs) can be defined as a collection of interconnected layers, neurons and weighted connections. Each layer is connected to the previous layer by weighted connections that couple each neuron in a layer to the neurons in the adjacent layer. An ANN resembles a human brain with multiple interconnected synapses which allow a signal to flow easily from one neuron to the next. There are two distinct constructs which an ANN can assume: Supervised and Unsupervised networks. Supervised neural networks have known outcomes while Unsupervised neural networks have unknown outcomes. This thesis will primarily focus on Supervised neural networks.

The contributions of this thesis are two-fold. One, it offers a comprehensive study of Object-oriented Programming (OOP) design principles that have been employed to design ANNs. Two, it designs and implements a scalable OOP solution to ANNs.

The ANN solution presented in this thesis provides an efficient and accurate statistical prediction model. A multi-layer feed-forward neural network using back-propagation is used to demonstrate OOP design techniques. The neural network consists of three layers: one input layer, one hidden layer, and one output layer. The input layer consists of two neurons; the hidden layer consists of three neurons; and the output layer consists of one neuron. The neural network utilizes the Sigmoid function as the activation function so that all data points are normalized to values between [0,1].

Compared to two existing models (Encog and Neuroph), the approach in this work produces a more accurate prediction.

To my family for their love, support and endless dedication, and in loving memory of my mother, Carol Creps.

Acknowledgments

First and foremost, I would like to express my sincere gratitude and appreciation to my advisor Dr. Jared Oluoch for the guidance, patience, motivation and support he provided during the course of my graduate studies. I would like to thank Dr. Henry Ledgard for the encouragement to pursue my graduate studies, and Dr. Weiqing Sun for all the support and guidance that he provided during my studies. I would also like to extend a special thank you to Linda Beal, Richard Springman, Myrna Rudder and Christie Hennen for the exceptional dedication to student success and mentorship that they graciously and willingly provided during my graduate studies.

Secondly, I would like to thank the College of Engineering at The University of Toledo for providing me access to the equipment, laboratory, and supplies that were required for my research.

Most importantly, I would like to thank my wife who has supported me throughout this entire journey. Along with my wife, I would like to thank my children for their understanding and support. Finally, I would like to thank my mother and father for their unwavering love and support.

Contents

Abstract iii

Acknowledgments vi

Contents vii

List of Tables xi

List of Figures xii

List of Abbreviations xiii

List of Symbols xiv

1 Introduction 1

1.1 Inheritance ...... 2

1.2 Polymorphism ...... 2

1.2.1 Static or Compile-time Polymorphism ...... 3

1.2.2 Dynamic Polymorphism ...... 4

1.3 Encapsulation ...... 5

1.3.1 Access Modifiers ...... 8

1.3.2 Public Access Modifier ...... 9

1.3.2.1 Protected Access Modifier ...... 10

1.3.2.2 Default Access Modifier ...... 10

1.3.2.3 Private Access Modifier ...... 11

1.4 Abstraction ...... 12

1.4.1 Objects ...... 13

1.4.2 Classes ...... 13

1.5 Neural Network Using Back-propagation ...... 15

1.5.1 Back-propagation with Object-oriented Programming . . . . . 16

1.6 Other Object-oriented Programming Approaches for Artificial Neural Networks ...... 18

1.7 OOP Neural Network Tools ...... 19

2 Proposed Object-oriented Programming Solution for Artificial Neural Networks 21

2.1 Artificial Neural Network ...... 23

2.2 Supervised Learning ...... 24

2.3 Unsupervised Learning ...... 24

2.4 Multilayer Perceptron ...... 24

2.4.1 Classifier ...... 25

2.4.2 Input Layer ...... 26

2.4.3 Hidden Layer ...... 26

2.4.4 Output Layer ...... 27

2.5 Activation Function ...... 27

2.5.1 Sigmoid Function ...... 27

2.5.2 tanH Function ...... 29

2.5.3 Rectified Linear Unit (ReLU) Function ...... 30

2.5.4 Leaky ReLU Function ...... 31

2.6 Gradient ...... 32

2.7 Back-propagation ...... 33

2.7.1 Problem Definition ...... 33

2.8 Neural Network Feedforward General Notations ...... 34

2.9 Backpropagation Algorithm ...... 35

2.10 Feedforward ANN with Back-propagation Illustration ...... 37

2.10.1 Incoming Hidden Layer Calculations ...... 38

2.10.2 Applying f(x) ...... 39

2.10.3 Hidden Layer Sum of Products ...... 39

2.10.4 Apply Activation Function ...... 39

2.10.5 Compute Output Margin of Error ...... 39

2.10.6 Compute the Rate of Change ...... 40

2.10.7 Compute Delta Output Weight Changes ...... 40

2.10.8 Compute Delta Hidden Sum ...... 40

2.10.9 Calculate Hidden Layer Incoming Weight Changes ...... 40

2.10.10 Update Incoming Hidden Layer Weights ...... 41

2.11 Contributions ...... 42

3 Results 49

3.1 Framework Setup ...... 49

3.1.1 Learning Rate vs MSE ...... 50

3.1.2 Sample Size vs MSE ...... 51

3.1.3 Max Error vs MSE ...... 51

3.1.4 Pseudo-code ...... 52

4 Conclusion and Future Work 54

References 57

A Artificial Neural Network Java Code 64

B Activation Function Interface and Implementation 68

C Neuron Utility Class 73

D Display Utility Class 75

List of Tables

1.1 Encapsulation Access Modifiers ...... 9

List of Figures

1-1 Object-oriented Programming Concepts ...... 2

1-2 Unified Modeling Language Diagram of OOP ...... 12

1-3 Neural Network Model for XOR gate ...... 16

2-1 Multilayer Perceptron Neural Network Model ...... 25

2-2 Linear Separability ...... 26

2-3 Non-linear Separable ...... 26

2-4 Sigmoid non-linearity squashes numbers to range between [0,1] ...... 28

2-5 tanh non-linearity squashes real numbers to range between [-1,1] . . . . . 29

2-6 Rectified Linear Unit (ReLU) activation function, which is zero when x < 0 and then linear with slope 1 when x > 0 ...... 30

2-7 A plot indicating the 6x improvement in convergence with the ReLU unit compared to the tanh unit ...... 31

2-8 Gradient Descent ...... 32

2-9 XOR Neural Network with weighted connections, for input [1,1] . . . . . 42

3-1 Learning Rate vs MSE ...... 50

3-2 Sample Size vs MSE ...... 51

3-3 Maximum Error vs MSE ...... 53

List of Abbreviations

ANN ...... Artificial Neural Network
DeeBNet ...... Deep Belief Networks
DT ...... Decision Trees
GRNN ...... General Regression Neural Network
GUI ...... Graphical User Interface
HDOD ...... High Dimension Omics Data
LM ...... Levenberg-Marquardt algorithm
LVM ...... Latent Variable Models
MLP ...... Multilayer Perceptron
MSE ...... Mean Squared Error, MSE = Σ(target − actual)² / datapoints
NN ...... Neural Network
NPC ...... Nasopharyngeal Carcinoma
OOP ...... Object-oriented Programming
POJO ...... Plain Old Java Object
QP ...... Quick Propagation
SOM ...... Self-Organizing Map
SVM ...... Support Vector Machine
ReLU ...... Rectified Linear Unit
RP ...... Resilient Propagation
ROI ...... Return on Investment
TTM ...... Time to Market
UML ...... Unified Modeling Language

List of Symbols

f(x) ...... A relation between a set of inputs and a set of permissible outputs
f′(x) ...... The derivative of a given function with respect to x

Chapter 1

Introduction

Object-oriented Programming (OOP) is a software design paradigm that utilizes objects to produce meaningful programs. The programs are organized into classes, which specify the behaviors and properties of objects. These behaviors, known as methods, control the actions of the program. Figure 1-1 is a visual representation of the OOP concepts. An Object-oriented program is organized in classes, and one class can have multiple objects. Objects are entities in the real world that can be distinctly identified; for example, a smartphone, a car, a pen, a table, and so on. A class is the building block for creating objects of the same type [1]. Figure 1-2 illustrates the relationship between OOP concepts. Consider a cow class. A cow has many objects: legs, eyes, and a mouth. A cow class can manipulate its various objects to produce some kind of behavior. This behavior is a method. For instance, a cow class can use the mouth object to make a sound (moo), the eyes objects to see, and the legs objects to walk. The three design principles that undergird OOP are inheritance, polymorphism, and encapsulation. Figure 1-1 illustrates these features.

These features of OOP can be exploited to build Artificial Neural Networks (ANNs).

Figure 1-1: Object-oriented Programming Concepts

1.1 Inheritance

Inheritance is a feature in OOP that makes it possible for a child class, or sub-class, to take over the attributes and methods of its parent class without having to create an entirely new class. The child class inherits common or generalized features from the parent class while also adding its own specialized features.
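As a minimal sketch of this idea (the Layer and HiddenLayer class names below are illustrative and not part of the thesis code), a hidden layer can inherit generalized behavior from a parent layer class while adding its own specialized activation step:

// Hypothetical parent class: generalized behavior shared by all layers.
class Layer {
    protected double[] outputs;

    public double[] getOutputs() {
        return outputs;
    }
}

// Hypothetical child class: inherits getOutputs() from Layer and adds its own
// specialized behavior, a sigmoid activation applied to the incoming sums.
class HiddenLayer extends Layer {
    public void activate(double[] sums) {
        outputs = new double[sums.length];
        for (int i = 0; i < sums.length; i++) {
            outputs[i] = 1.0 / (1.0 + Math.exp(-sums[i]));
        }
    }
}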

1.2 Polymorphism

Polymorphism is one of the core concepts of OOP. It describes a concept in which classes take different shapes because they share a common interface. By allowing each class to implement the interface differently, each class can have its own distinct form. There are two different types of polymorphism: static (or compile-time) and dynamic polymorphism [2].

• Variables may take on the form of an integer in one context but transform into a string in another context. Take userId; this may be an integer or a string depending on the context.

• Functions or methods can have the same name but accept a separate set of formal parameters, and may even return a different data type based upon the context.

• Some languages provide a data type of "any", such that when it is specified for a list, a list containing any data type can be processed by a function. If a function simply determines the length of a list, it does not matter what data types are in the list.

1.2.1 Static or Compile-time Polymorphism

This is the mechanism by which a call to an overloaded method is resolved at compile time, rather than at run time. It is best associated with method overloading.

Method overloading is a concept where the method name appears more than once with a different set of parameters. At compile time, the compiler determines which method to call based upon the parameters and their types. This is known as static polymorphism, static binding, or compile-time polymorphism. Listing 1.1 demonstrates static or compile-time polymorphism.

/**
 *
 * @param file
 * @param inputSize
 * @param delimiter
 */
public BufferedDataSet(File file, int inputSize, String delimiter) {
    super(inputSize);
}

/**
 * Creates new buffered data set with specified file, input and output size.
 * Data set file is assumed to be a txt file with data set rows in a single line,
 * with input and output vector values delimited by delimiter.
 *
 * @param file data set file
 * @param inputSize size of input vector
 * @param outputSize size of output vector
 * @param delimiter delimiter for vector values
 * @throws FileNotFoundException
 */
public BufferedDataSet(File file, int inputSize, int outputSize, String delimiter)
        throws FileNotFoundException {
    super(inputSize, outputSize);

    this.delimiter = delimiter;
    this.file = file;
    this.fileReader = new FileReader(file);
    this.bufferedReader = new BufferedReader(fileReader);
    fileLinesNumber = countFileLines();

    // load first chunk of data into buffer
    loadNextBuffer();
}

Listing 1.1: Class illustrating overloading

1.2.2 Dynamic Polymorphism

Dynamic or late-binding polymorphism is the mechanism by which a call to an overridden method is resolved at run time, rather than at compile time. Dynamic polymorphism is better known as method overriding. Method overriding can best be illustrated by a super-class containing a method that is overridden by a sub-class. The various sub-classes are referenced through a parent class reference variable, and the corresponding overriding form of the method is executed. Listing 1.2 illustrates dynamic methods of various neural networks.

class NeuralNetwork {
    public void learn() {
        System.out.println("NeuralNetwork is learning");
    }
}

class Backpropagation extends NeuralNetwork {
    public void learn() {
        System.out.println("Backpropagation network is learning");
    }
}

class Resilient extends NeuralNetwork {
    public void learn() {
        System.out.println("Resilient network is learning");
    }
}

class Runnable {
    public static void main(String[] args) {
        NeuralNetwork nn = new NeuralNetwork();
        nn.learn();  // prints NeuralNetwork is learning
        NeuralNetwork bp = new Backpropagation();
        bp.learn();  // prints Backpropagation network is learning
        NeuralNetwork rs = new Resilient();
        rs.learn();  // prints Resilient network is learning
    }
}

Listing 1.2: Class illustrating Dynamic Polymorphism

1.3 Encapsulation

Encapsulation is one of the core concepts of OOP. It is the technique of binding the parameters within logical units to prevent unnecessary access to parameters and methods within the logical units. Encapsulation can also be thought of as a prevention mechanism, which protects or hides access to the logical unit from other classes.

The access to the logical unit is controlled by access modifiers (public, private, protected, and no modifier, which implies package-private access).

The other critical component of encapsulation is the ability to modify code without disrupting other sections of the code base. Encapsulation provides increased maintainability, flexibility, and an extendable code base. Another advantage of encapsulation is the increased security that it provides, because the actual implementation is hidden by the access modifiers. Along with security, encapsulation also reduces the coupling with other logical units. The more restrictive the access modifiers are, the less coupled the logical unit is with other logical units. This concept reduces software cost when modifications are introduced, because the logical units are less dependent on other logical units. One can think of encapsulation as more of "how" the logical unit functions rather than "what" the logical unit needs to accomplish. This is the fundamental difference between abstraction and encapsulation. Listing 1.3 illustrates encapsulation. Notice the various access modifiers used in the DataSetRow class.

package org.neuralnet.core.data;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.common.StringUtil;

/**
 *
 */
public class DataSetRow implements Serializable {

    private static final long serialVersionUID = 1L;

    /**
     * Input array for training the Neural Net
     */
    protected double[] input;

    /**
     * The expected outcome for training the neural network
     */
    private double[] desiredOutput;

    /**
     * Label for this data set row
     */
    protected String label;

    /**
     * Creates new training element with specified input and desired output vectors
     * specified as strings
     *
     * @param input input vector as space separated string
     * @param desiredOutput desired output vector as space separated string
     */
    public DataSetRow(String input, String desiredOutput) {
        this.input = StringUtil.parseDoubleArray(input);
        this.desiredOutput = StringUtil.parseDoubleArray(desiredOutput);
    }

    /**
     * Creates new training element with specified input and desired output arrays
     *
     * @param input input array
     * @param desiredOutput desired output array
     */
    public DataSetRow(double[] input, double[] desiredOutput) {
        this.input = input;
        this.desiredOutput = desiredOutput;
    }

    /**
     * Creates new training element with specified input and desired output vectors
     *
     * @param input input vector
     * @param desiredOutput desired output vector
     */
    public DataSetRow(List input, List desiredOutput) {
        this.input = StringUtil.toDoubleArray(input);
        this.desiredOutput = StringUtil.toDoubleArray(desiredOutput);
    }

    public DataSetRow(List input) {
        this.input = StringUtil.toDoubleArray(input);
    }

    /**
     * Creates new training element with input array
     *
     * @param input input array
     */
    public DataSetRow(double... input) {
        this.input = input;
    }

    /**
     * Supervised neural networks have known outputs while unsupervised neural
     * networks have unknown outputs
     *
     * @return true if supervised, false if not supervised
     */
    public boolean isSupervised() {
        return (desiredOutput != null);
    }

    public double[] getInput() {
        return input;
    }

    public void setInput(double[] input) {
        this.input = input;
    }

    public double[] getDesiredOutput() {
        return desiredOutput;
    }

    public void setDesiredOutput(double[] desiredOutput) {
        this.desiredOutput = desiredOutput;
    }

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();
        builder.append("DataSetRow [input=");
        builder.append(Arrays.toString(input));
        builder.append(", desiredOutput=");
        builder.append(Arrays.toString(desiredOutput));
        builder.append(", label=");
        builder.append(label);
        builder.append("]");
        return builder.toString();
    }
}

Listing 1.3: Java program to illustrate encapsulation

1.3.1 Access Modifiers

Access modifiers are the controlling features that allow access to the various packages, classes, methods, and parameters in OOP. These access modifiers are the cornerstone of the OOP concepts and are the implementation layer of encapsulation. There are four access modifiers: default (no modifier specified), public, private, and protected. Table 1.1 defines the protocols for the four access modifiers [3].

Table 1.1: Encapsulation Access Modifiers

Modifier    Class  Package  Subclass  World
public      Y      Y        Y         Y
protected   Y      Y        Y         N
default     Y      Y        N         N
private     Y      N        N         N

1.3.2 Public Access Modifier

A public access modifier is specified using the keyword public. There are no restrictions placed upon any class, method or parameter defined using the public modifier, and the item may be accessed from anywhere within the entire application. The public access modifier is said to have the broadest scope because of the accessibility of the classes, methods and parameters. Listing 1.4 illustrates a public modifier.

package ANN;

public class Network {
    public void learn() {
        System.out.println("Network is learning ...");
    }
}

package NeuralNetwork;

import ANN.Network;

class Backpropagation {
    public static void main(String[] args) {
        Network network = new Network();
        network.learn();
    }
}

Listing 1.4: Java program to illustrate public modifier

1.3.2.1 Protected Access Modifier

A protected access modifier is specified using the keyword protected. Any parameters or methods defined as protected are only visible from a subclass within the application or from within the same package or folder. The method startEngine() in class Automobile is protected, and class GMC inherits from class Automobile. The protected method is then accessed by creating an object of class GMC. Listing 1.5 illustrates a protected access modifier.

package Auto;

// Class Automobile
public class Automobile {
    protected void startEngine() {
        System.out.println("Protected Modifier ... Starting Automobile");
    }
}

package GeneralMotors;

import Auto.Automobile;

// Class GMC is a subclass of Automobile
class GMC extends Automobile {
    public static void main(String[] args) {
        GMC truck = new GMC();
        truck.startEngine();
    }
}

Listing 1.5: Java program to illustrate protected modifier

1.3.2.2 Default Access Modifier

Any class, method or parameter which does not have an explicit access modifier defined will, by omission, have default access. Default access can be thought of as local scope: access is limited to the same package or folder. Listing 1.6 illustrates a default modifier.

package ANN;

class Network {
    void learn() {
        System.out.println("Network is learning ...");
    }
}

package NeuralNetwork;

import ANN.Network;

// This class has the default access modifier
class Backpropagation {
    public static void main(String[] args) {
        Network network = new Network();
        network.learn(); // does not compile: Network and learn() have default access and are not visible outside package ANN
    }
}

Listing 1.6: Java program to illustrate default modifier

1.3.2.3 Private Access Modifier

A private access modifier is specified using the keyword private. Methods or parameters which are defined as private are not accessible outside of the class in which they are defined; they are only accessible within the class where they are declared. The private access modifier may not be used to define top-level classes or interfaces. Listing 1.7 illustrates a private access modifier.

package ANN;

class Network {
    private void learn() {
        System.out.println("Network is learning ...");
    }
}

package NeuralNetwork;

import ANN.Network;

class Backpropagation {
    public static void main(String[] args) {
        Network network = new Network();
        network.learn(); /* This line errors because the method is private */
    }
}

Listing 1.7: Java program to illustrate private modifier

1.4 Abstraction

Abstraction is another of the core concepts in the ideological paradigm of OOP and may be considered the process of removing, or abstracting away, the concrete implementation of the actions from the classes or objects that use them. Abstraction can be thought of as the "why" the classes act in a certain manner. Abstraction may also be thought of as making the classes more general, thus allowing for more implementations of the generalized classes. An example of abstraction is that the neural network makes its prediction, but we have no idea what happens inside the network. This differs from encapsulation, because encapsulation is the "how" of the way the neural network makes its predictions.

Figure 1-2 demonstrates the concepts of abstraction.

Figure 1-2: Unified Modeling Language Diagram of OOP

1.4.1 Objects

Objects are the self-contained, fundamental building blocks in OOP, with a great emphasis placed on the application's design and functionality. An object contains all the necessary attributes and methods that make the object useful when designing solutions to complex problems. An object's attributes are all the pertinent information that describes the functionality of the object.

1.4.2 Classes

Classes are a fundamental part of object-oriented programming and are used to describe one or multiple objects. Classes provide the concrete implementation for instantiating specific objects within the application. An object is derived from a single class, but a class can instantiate multiple objects during the life of the application. While classes are at the core of OOP, they must be instantiated as objects to be used by any application.

Classes allow for a layer of encapsulation, which isolates variables and methods for specific instantiated objects. This encapsulation protects each class from modifications from other classes or by the software engineer in other areas of the application.

By using classes, software engineers can create well-organized code which can easily be maintained. Listing 1.8 illustrates the source code for a neural network exception class.

public class NeuralNetException extends NetworkException {

    private static final long serialVersionUID = 1L;

    public NeuralNetException() {
    }

    public NeuralNetException(String message) {
        super(message);
    }

    public NeuralNetException(String message, Exception e) {
        super(message, e);
    }

    public NeuralNetException(Throwable cause) {
        super(cause);
    }

    public NeuralNetException(Throwable cause, String message) {
        super(message, cause);
    }
}

Listing 1.8: A Neural Network Exception Class

This thesis offers a comprehensive survey of Object-oriented Programming (OOP) design principles that have been used to design Artificial Neural Networks (ANNs). The thesis discusses in detail the three fundamental principles of OOP: inheritance, polymorphism, and encapsulation. In addition, the thesis uses examples to illustrate how ANNs can be constructed using OOP design principles. Moreover, different types of access modifiers are discussed, and examples of method overriding and method overloading are provided. Finally, the thesis explains how back-propagation is used in neural networks with specific mathematical illustrations. What sets this work apart from existing literature is that it goes deep into explaining OOP principles and ANNs, and then reviews papers that have utilized OOP for designing neural networks, pointing out the key features and weaknesses, if any, of the papers reviewed. The key contribution of this thesis is the design and implementation of a scalable ANN using an OOP approach.

The remainder of this thesis is organized as follows. Section 1.5 explains ANN design using back-propagation. Section 1.5.1 describes back-propagation in ANNs using OOP. Section 1.6 discusses other OOP approaches for ANNs. Section 1.7 describes existing OOP tools for ANNs. Chapter 2 presents our proposed OOP solution for ANNs and explains the key contributions of this thesis. Chapter 3 presents the results of our work. Chapter 4 concludes our work and points to areas of future research.

1.5 Neural Network Using Back-propagation

There are three types of neural networks: 1) Biological Neural Networks found in living organisms, 2) Simulated Artificial Neural Networks that model neural networks with electronics, and 3) Artificial Neural Networks (ANNs) [4]. ANNs are made up of three layers: an input layer, a hidden layer, and an output layer, as illustrated in Figure 2-1. ANNs are inspired by the human neurological system. All neurons are connected together at each layer to allow them to learn the environment so they can produce accurate prediction models. The prediction models are premised on how the ANN learns from the input data to identify parameters of the desired model [5]. When many parameters are used in ANNs, the learning process slows down [6]. To reduce the learning time, software and hardware solutions may be implemented [5], such as the ones proposed by [7, 8, 9] and [10]. The encoded knowledge is weighted and shared throughout the system [11]. ANN-based approaches have been used to help in the detection of breast cancer [12, 13, 14, 15, 16, 17, 18], prostate cancer [19, 20], cervical cancer [21], brain cancer [22], lung cancer [23], and Alzheimer's [24].

Back-propagation is a concept used in supervised deep neural networks to compute the error contribution of each neuron after a set of signals or inputs has been forward-fed through the network. Back-propagation uses a highly specialized mathematical model to compute the Mean Squared Error (MSE) for the output signals.

The MSE is calculated on the output layer and then iteratively distributed back through each layer's weighted connections. This backwards flow allows for corrections or adjustments to the mathematical model (the weighted connections between the neurons) to compute the final output once the error is significantly reduced or within an acceptable range. This process is also known as the general case of the MSE for multi-layered feed-forward networks, because the model computes the gradient cost for each layer in the neural network. Figure 1-3 illustrates an XOR Deep Neural Network.

Figure 1-3: Neural Network Model for XOR gate

1.5.1 Back-propagation with Object-oriented Programming

The central concept in [25] is to use an OOP approach to analyze design patterns in structural steel beams using a neural network. The authors describe the five components of their algorithm as: learning domain, neural nets, library or learning strategies, learning process, and analysis process. They used three layers in their back-propagation neural net topology: an input, hidden, and output layer. They analyzed the inputs, broke them into their respective parts, and provided some basic statistical data on how these values were used within the network. The paper points out that neural networks can be applied to both simple and complex problems. It mentions that complex problems require an increased execution time, and seems to confirm that back-propagation creates an effective learning environment. However, since this article was written in the early 1990s, recent advancements in computation power will greatly reduce the learning time required for complex problems. The paper has two main weaknesses. First, the authors do not describe how they used OOP principles within the C++ environment to create the neural net. Second, the authors do not provide code examples or pseudocode to demonstrate how they accomplished their experiments.

ANNs have been used for software quality prediction [26]. The central idea in the article is to estimate software quality by correlating the number of faults with the number of lines changed per class. The authors introduced the Ward neural network, a back-propagation neural network, and the General Regression Neural Network (GRNN), a non-linear neural network. A Ward neural network is a back-propagation network with different activation functions at each layer, while the GRNN is a one-pass network using memory to solve non-linear classifications using continuous variables. The main purpose of the paper was to correlate the number of defects within the code with the complexity of the code. For example, via coupling, the more a class is instantiated, the more likely a defect will occur at some point. The authors assume that more coupling correlates to more defects. While this could be the case, API frameworks such as Apache have a low tolerance for defects. The authors also used the Mean Squared Error (MSE) to try to make some correlation between the number of lines of code and the number of defects using the various neural networks. The authors sanitized the data for outliers and incorrect data. They also used a 3:1:1 ratio for training, cross-training, and actual data. They described each layer, its neuron count, activation function, and the number of layers in the network. Overall, this paper was interesting but lacked a definition of how the network was implemented. The authors mainly focused on how they applied the networks to do the analysis of the code.

1.6 Other Object-oriented Programming Approaches for Artificial Neural Networks

A C++ library for neural networks is presented in [27]. The paper discussed why a C++ approach was better than the traditional procedural approach. The authors stated the C++ approach gave them the best of both worlds - the procedural and object approaches. C++ allowed them to use both procedural and object-oriented approaches to combine their architecture into a single library. The authors described and outlined the goals and characteristics of the NEURObjects library and its object-oriented approach to simplifying neural network coding using C++. The article discussed the three major topics of OOP and provided concrete examples in C++. Along with these main concepts, the authors described ancillary utilities that are required to support NEURObjects. They described such things as support classes, input/output classes, data generation classes, and network building classes. This article has several strengths, such as concrete examples of how the OOP design was implemented and how it is more efficient and less costly to the developers. The major drawback of this paper is that it was designed before more modern OOP methodologies were implemented. A more modern design may require the software engineer to make some major architectural changes to the NEURObjects library. In [28], the authors discussed objects, abstraction and polymorphism in conjunction with an OOP approach to neural networks using C++. The authors also discussed the concept of dynamic stop conditions, different processors, and the various test results they used to draw their conclusions. The authors introduced multi-threading in combination with neural networks. They used various types of neural networks such as quick propagation (QP), resilient propagation (RP), and the Levenberg-Marquardt (LM) algorithm. They attempted to split the training data into N folds, and ran the training data through the network on different threads, combining the errors once training was completed. They used various criteria to determine if the training was completed, and these criteria are as follows: the number of epochs, 6 iterations with no change, and an MSE less than 10^{-9}.

Zheng [29] describes encapsulation, inheritance, and abstraction within the confines of the OOP paradigm. He introduces a fuzzy object concept, which can be described as linguistic words that represent data. The authors in [25] use an OOP approach to solve structural beam analysis. They use the OOP approach in C++ and G++ to perform the complex computations required in ANNs to solve pattern recognition problems.

1.7 OOP Neural Network Tools

Prior work has been done to implement neural networks using Object-oriented tools. An Object-oriented Matlab toolbox for Deep Belief Networks (DeeBNet) is introduced in [6]. The toolbox contains two packages, classes, and functions for data management. The abstract class defines the methods. The second class is used for data modification. The proposed toolkit can be used for data generation for ANNs.

In the C++ architecture proposed by [4], nodes and links are used as the base classes where common characteristics are organized. Each child class inherits from the base class and exploits the features of inheritance. The input node and feed forward node both inherit from the base class.

Rao [30] exploits the Object-oriented features of C++ to construct a neural network. In the network, four neurons are connected to one another. The network class and neuron class hold the four neurons. The functions for activating the neurons are public so they can be visible and accessed without restriction. Other existing works that implement ANNs using OOP principles are found in [31] and [32].

The authors of [33] present a tool that simplifies neural networks so individuals with little or no experience can understand and use the complexity of the network to solve non-linear and linear problems. They describe the various types of networks: Multi-Layer Perceptron (MLP), Self-Organizing Map (SOM), and Latent Variable Models (LVM). In addition, their paper describes supervised and unsupervised networks and their differences. They describe how they used OOP with the C# programming language to develop a learning tool which visualizes the learning of neural networks.

The central idea of [34] is to combine ANNs with a simulation to solve for and increase the efficiency of smart manufacturing systems. The authors use a Multilayer Perceptron (MLP) ANN with one input layer with 4 nodes, one hidden layer with 43 nodes, and a single output layer with an unspecified number of nodes. The authors describe an algorithm for a feed-forward MLP with back-propagation. The authors chose an MLP ANN because of the success of MLPs and their characteristics; MLPs are good at handling non-linear data points. The authors also describe the modeling of the MLP, which is as follows: a set of weighted connections flows forward from the inputs, through an activation function, which is directly transmitted to the output. The sum of the errors is then fed back through the network to update the weight connections. The process continues until the targeted output or desired error percentage is reached. The network would then be considered taught, or trained. The greatest strength of this paper is the descriptive nature in which the authors describe why and how the MLP ANN would be used to simulate smart systems.

In [35] an OOP methodology is used to model an ANN. The paper utilizes a Model View Controller (MVC) approach and implements key OOP principles such as data segregation and abstraction, logical evaluation and functional abstraction, and the abstraction of the results from the data and functional layers. A Graphical User Interface (GUI) for ANNs is presented in [36]. This graphical package can be used to model a neural network without the user being a software engineer.

Chapter 2

Proposed Object-oriented Programming Solution for Artificial Neural Networks

An Object-oriented approach to machine learning is not a new phenomenon. Real-world objects are represented in machine learning using idealization and abstraction [37]. Abdrabou and Salem [38] designed an object-oriented framework for cancer diagnosis. The framework maps all concepts onto classes and interfaces. It includes Java classes and a number of XML files. The framework is then organized around tasks and methods. In their work, [39] use objects to represent tissue components in colon biopsy images. Results of their work demonstrate a 94.89% accuracy. In their study, [40] applied an object-oriented regression methodology to analyze High Dimension Omics Data (HDOD) to predict prognosis outcome.

Machine learning employs statistical techniques to learn from past examples to predict patterns from large data sets [41]. These techniques correlate the training input size to the actual input size to compute cancer predictions and prognosis [42]. Zhang et al. [43] utilized radiomics-based prediction of local failure and distant failure in advanced nasopharyngeal carcinoma (NPC). Yu and Sun [44] propose a sparse coding algorithm that maps the input feature to a sparse representation. Results of their work demonstrate a more accurate classification than existing solutions. Moorthy et al. [45] harvested carcinogenic and mutagenic information of 1481 chemically diverse molecules as their classification models. They correctly classified more than 70% of compounds in the test set.

Imbus et al. [46] used machine learning and fuzzy c-means clustering to identify multigland disease in primary hyperparathyroidism among a large data set of patients. They used a boosted tree classifier that had 94.1% accuracy, 94.1% sensitivity, 83.8% specificity, and a 94.1% positive predictive value. In their study, [47] investigated the impacts of machine learning and ANNs for tumor detection in lung cancer. Their findings demonstrated that the framework can be used to identify tumor regions in cancer patients. In their work, [48] adapted a Gaussian process technique to evaluate the accuracy, specificity, sensitivity, positive predictive value, and negative predictive value of diabetes. Jianfu et al. [49] used an ultrasound-based machine learning approach to differentiate between benign and malignant thyroid nodules. Results of the study achieved 78.89% sensitivity and 94.55% specificity. For machine learning to produce meaningful results, the techniques must be used appropriately [50], because they play a significant role in medical imaging and diagnosis for cervical cancer [21], nasopharyngeal carcinoma [51], liver cancer and diabetes [52].

Existing literature discusses the collection methods and the treatment of the specimens after collection, but not the concrete implementation of the algorithm used to make any predictions. Various learning algorithms such as Decision Trees (DT), Support Vector Machines (SVM), SOM (self-organizing maps), unsupervised k-maps, and Naive Bayes have been used in the respective prediction, survivability and recurrence models. Existing literature compares a wide variety of algorithms using back-propagation to compute performance indicators. While the literature discusses and describes the algorithms, to the best of our knowledge, concrete implementations of each of the algorithms are missing.

Our work differs from existing literature in that we demonstrate that concrete implementations of the Forward Feed Neural Network with back-propagation can be successfully designed and rapidly changed to handle a wide variety of ANN algorithms.

We demonstrate that Object-oriented design can and will be successful in ANNs.

2.1 Artificial Neural Network

An Artificial Neural Network (ANN) is a system of interconnected neurons inspired by biological neural systems, such as the human neurological system. Each ANN is made up of layers of neurons joined by weighted connections that collectively make up the entire neural network. The neural network is intended to work like the human brain, where each neuron represents a state, usually a value between 0 and 1. The neurons and connections act together, usually with weights adjusted at a given learning rate, to derive the underlying function. Neural networks have three layers: an input layer, a hidden layer, and an output layer.

Machine learning is a sub-discipline of Artificial Intelligence in computer science which allows a computer to learn, using input signals and advanced mathematical concepts, to provide accurate and precise prediction models. Machine learning allows a computer to provide predictive analytics without being explicitly programmed. These analytics provide valuable data to data scientists who use them to produce repeatable results. Machine learning is divided into two categories: supervised and unsupervised.

23 2.2 Supervised Learning

Supervised learning is accomplished using a set of known inputs, or training data. The training data is used to effectively derive the algorithm or function. Once the algorithm is sufficiently trained, an unknown data set may be examined using the function to accurately predict the desired outcome. The outcome labels can be analyzed for precision and accuracy because the input data was known.

2.3 Unsupervised Learning

Unsupervised learning is accomplished by inferring the function or algorithm from an unlabeled input data set. In this type of learning, the desired outputs are unknown, and the function must discover the correct response from the structure of the inputs alone. Unsupervised learning algorithms solve complex problems with just the input data.
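The distinction shows up directly in the data representation: a supervised example carries a desired output next to its input, while an unsupervised example carries only the input. A minimal usage sketch, reusing the DataSetRow class from Listing 1.3 (the surrounding class and main method here are illustrative):

import org.neuralnet.core.data.DataSetRow;

public class LearningModes {
    public static void main(String[] args) {
        // Supervised row: both the input and the desired output are known (XOR input [1,0] -> 1).
        DataSetRow supervised = new DataSetRow(new double[]{1, 0}, new double[]{1});

        // Unsupervised row: only the input is known, so no desired output is supplied.
        DataSetRow unsupervised = new DataSetRow(1.0, 0.0);

        System.out.println(supervised.isSupervised());   // prints true
        System.out.println(unsupervised.isSupervised()); // prints false
    }
}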

2.4 Multilayer Perceptron

A Multilayer Perceptron (MLP) is a forward feed network, meaning the signal flows in a single forward direction from the input layer to at least one hidden layer and then to the output layer until the network is sufficiently trained. An MLP consists of a minimum of three layers of connected neurons which use a non-linear activation or transfer function to train the network. Multilayer perceptron neural networks are generally used to solve problems that involve pattern classification, image recognition, and prediction computations. Figure 2-1 illustrates a multilayer perceptron neural network.

Figure 2-1: Multilayer Perceptron Neural Network Model

2.4.1 Classifier

Classifiers are the concrete implementation of mathematical models, functions or algorithms in supervised neural networks that map the inputs into categories. The classifiers are used in conjunction with training data, which is used to train the network to observe, precisely and accurately, whether the output data is present or absent in the set of mapped inputs.

Classifiers can be separated into two distinct categories: linear and non-linear. A linear classifier can be thought of as one for data which can be separated by a single plane. An example is positive and negative numbers: a single plane can be drawn between the points which separates them into two distinct groups, positive and negative values. A non-linear classifier cannot simply separate the data points by a single plane but instead must use multiple planes to organize and group the data. The rest of the work described in this thesis focuses on non-linear classifiers. Figures 2-2 and 2-3 represent linear and non-linear classifiers respectively.

Figure 2-2: Linear Separability

Figure 2-3: Non-linear Separable

2.4.2 Input Layer

The input layer is the first layer in the neural network and is the layer that represents the input data set. The input layer is a passive layer: it does not modify any data but rather passes the data on to the hidden layer while considering the weights on each neuron's synapse. In a forward feed network, the value or signal sent to each hidden node is the summation of all the connections multiplied by the randomized weights placed on each of those connections.

2.4.3 Hidden Layer

The hidden layer is the layer where the activation function is applied to compute the layer's output. The hidden layer is an active layer, meaning the neurons modify the signal sent to them from the input streams and then pass the modified signal to the output layer.

2.4.4 Output Layer

The output layer is the threshold that provides an indicator as to whether the output is present or absent in the input data. The output layer results are used to compute the error, which is in turn back-propagated to each input connection.

2.5 Activation Function

Activation functions are mathematical models used to determine if outside factors should be considered when connections are made. The activation function resides at the hidden layer and is central to the Artificial Neural Network's ability to make sense of non-linear complex problems such as facial recognition. The activation function's main objective is to take the sum of products of the inputs (X) and the corresponding weights (W), apply the function f(x) to obtain the output value for that layer, and then forward-feed that value to the next layer in the ANN. Below are some commonly used activation functions.
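As a minimal sketch of that forward step (the array shapes and method names here are illustrative, not the thesis's exact API), each neuron in the receiving layer sums the products of the inputs and their weights and then applies f(x):

public class ForwardStep {
    // Sigmoid activation f(x) = 1 / (1 + e^-x).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Computes the outputs of one layer: for each receiving neuron j, sum the
    // products of every input i with its weight weights[i][j], then apply f(x).
    static double[] forward(double[] inputs, double[][] weights) {
        int neurons = weights[0].length;
        double[] outputs = new double[neurons];
        for (int j = 0; j < neurons; j++) {
            double sum = 0.0;
            for (int i = 0; i < inputs.length; i++) {
                sum += inputs[i] * weights[i][j];
            }
            outputs[j] = sigmoid(sum);
        }
        return outputs;
    }
}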

2.5.1 Sigmoid Function

The Sigmoid function is a non-linear function that takes real values and converts them to values between [0,1]. This function has seen tremendous usage because it accurately represents the firing of a neuron, representing large negative numbers as zero and large positive numbers as one. The Sigmoid function has major drawbacks, which are sigmoidal saturation and the resulting loss of back-propagation gradients, as shown in Figure 2-4.

The Sigmoid saturates at either end of the curve as the values approach the finite limits of the derivative. The gradient in these regions is nearly zero, so almost no weighted signal flows through the neuron, and ultimately recursively back through each input-output pair. Randomizing the weights on the connections should be done with care, as large connection weights will saturate the neuron and the network will learn slowly or barely learn. Hence, sigmoidal saturation along with back-propagation may cause the neural network to learn at a slow rate or not learn at all.

\theta(x) = \frac{1}{1 + e^{-x}} \qquad (2.1)

Sigmoid Function (2.1)

Figure 2-4: Sigmoid non-linearity squashes numbers to range between [0,1]
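The saturation effect is easy to observe numerically; a small sketch (the class name is illustrative) prints the sigmoid output and its gradient for a few inputs:

public class SigmoidSaturation {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        for (double x : new double[]{-10, -2, 0, 2, 10}) {
            double s = sigmoid(x);
            double gradient = s * (1 - s); // derivative of the sigmoid at x
            System.out.printf("x=%6.1f  sigmoid=%.5f  gradient=%.5f%n", x, s, gradient);
        }
        // At x = +/-10 the gradient is roughly 0.000045, so almost no error
        // signal flows back through a saturated neuron during back-propagation.
    }
}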

2.5.2 tanH Function

tanh is a non-linear function like the sigmoid, but it is zero-centered. Real numbers are used in the tanh function, with the values ranging over [-1,1]. The tanh function has the same limitations as the sigmoid function with respect to saturation near the limit values [-1,1]. The tanh function is preferred over the sigmoid function because it is centered around zero. Since the function is zero-centered, the only input value that maps to zero is zero itself. This means the tanh function is less likely to have the same undesired effects as the Sigmoid function during the training of the neural network. Figure 2-5 shows how the tanh non-linearity squashes real numbers to the range [-1, 1].

\tanh(z) = \frac{\sinh(z)}{\cosh(z)} = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \qquad (2.2)

tanH Function (2.2)

Figure 2-5: tanh non-linearity squashes real numbers to range between [-1,1]

2.5.3 Rectified Linear Unit (ReLU) Function

The Rectified Linear Unit (ReLU) is a non-linear function, shown in Figure 2-6. The ReLU has been shown to operate at an increased rate compared to the sigmoid and tanH transfer functions. However, the ReLU does have one problem, which is caused by large gradients iteratively sent through the network. These large gradients can sometimes cause the node or neuron to incorrectly stimulate, which causes a loss of signal. This loss of signal paralyses the network by not passing the signal through the other layers' neurons for each of the input-output pairs. To reduce this paralysis, one must select a proper learning rate. Figure 2-6 illustrates the ReLU function.

f(x) = max(0, x) (2.3)

Rectified Linear Unit (ReLU) Function (2.3)

Figure 2-6: Rectified Linear Unit (ReLU) activation function, which is zero when x < 0 and then linear with slope 1 when x > 0

2.5.4 Leaky ReLU Function

The Leaky ReLU is a form of the ReLU that introduces a small negative slope, which attempts to correct the loss of firing of a neuron, or "dying neurons". More studies are needed to determine if the slope has a positive impact, as the results are not always conclusive. Figure 2-7 [53] represents the improvement from the Leaky ReLU function.

f(x) = 1(x < 0)(θx) + 1(x >= 0)(x) (2.4)

Leaky Rectified Linear Unit (ReLU) Function (2.4)

Figure 2-7: A plot indicating the 6x improvement in convergence with the ReLU unit compared to the tanh unit.
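Appendix B implements the activation functions behind a common interface; a simplified sketch of that idea (the interface and class names below are illustrative, not the thesis's exact API) shows how the four functions above can be swapped polymorphically:

// Illustrative interface: each activation exposes its output and its derivative.
interface ActivationFunction {
    double output(double x);
    double derivative(double x);
}

class Sigmoid implements ActivationFunction {
    public double output(double x)     { return 1.0 / (1.0 + Math.exp(-x)); }
    public double derivative(double x) { double s = output(x); return s * (1 - s); }
}

class Tanh implements ActivationFunction {
    public double output(double x)     { return Math.tanh(x); }
    public double derivative(double x) { double t = Math.tanh(x); return 1 - t * t; }
}

class ReLU implements ActivationFunction {
    public double output(double x)     { return Math.max(0, x); }
    public double derivative(double x) { return x > 0 ? 1 : 0; }
}

class LeakyReLU implements ActivationFunction {
    private final double slope = 0.01; // small negative-side slope; the exact value is a tuning choice
    public double output(double x)     { return x >= 0 ? x : slope * x; }
    public double derivative(double x) { return x >= 0 ? 1 : slope; }
}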

2.6 Gradient

The back-propagation process uses two complementary functions to derive the newly computed weights for each of the interconnected nodes in the neural network. The two distinct functions are gradient descent (Figure 2-8) and the cost function. Gradient descent is used to minimize the cost function over all the incoming signals for each connected node in the network. The cost function used for this research is the Mean Squared Error (MSE).

Figure 2-8: Gradient Descent

2.7 Back-propagation

ANNs use a concept called back-propagation to calculate each neuron's error contribution after consuming a single set of input signals. Back-propagation is used in conjunction with supervised neural networks because back-propagation requires a known, desired output signal for each set of input signals. This process uses a concise algorithm to modify the weights for each neuron until the ANN is completely trained.

Back-propagation computations are a general case of the Mean Squared Error (MSE) applied to multi-layered feed-forward networks, made possible by computing the gradient cost for each layer of the neural network. The back-propagation algorithm has been repeatedly engineered and is a highly specialized mathematical model which is related to the Gauss-Newton algorithm.

Back-propagation is synonymous with deep learning, as deep learning is a neural network that consists of multiple hidden layers. Also, it is denoted as backward propagation because the MSE is calculated on the output layer and then distributed or transferred back through each layer’s weighted connections.

2.7.1 Problem Definition

• A dataset consisting of input-output pairs (x_i, y_i), where x_i is the input and y_i is the desired output of the network on input x_i. The set of input-output pairs is denoted X = {(x_1, y_1), ..., (x_N, y_N)}.

• A feedforward neural network, whose parameters are denoted θ.

• In backpropagation, the parameters of primary interest are w^k_ij, the weight between node j in layer l_k and node i in layer l_{k-1}, and b^k_i, the bias for node i in layer l_k.

• An error function, E(X, θ), which defines the error between the desired output y_i and the calculated output ŷ_i of the ANN on input x_i for each input-output pair (x_i, y_i) ∈ X.

Training a neural network with gradient descent requires calculating the gradient of the MSE function E(X, θ) with respect to the weights w^k_ij and biases b^k_i. Then, according to the learning rate α, each iteration of gradient descent updates the weights and biases according to Equation (2.5),

\theta^{t+1} = \theta^{t} - \alpha \frac{\partial E(X, \theta^{t})}{\partial \theta} \qquad (2.5)

where θ^t denotes the parameters of the neural network at iteration t of gradient descent.
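A minimal sketch of that update rule over a flat parameter array (the gradient itself is assumed to be supplied by back-propagation, and the class name is illustrative):

public class GradientDescent {
    // One gradient-descent step: theta(t+1) = theta(t) - alpha * dE/dTheta.
    static void step(double[] theta, double[] gradient, double alpha) {
        for (int i = 0; i < theta.length; i++) {
            theta[i] -= alpha * gradient[i];
        }
    }
}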

2.8 Neural Network Feedforward General Notations

• w^k_ij: the weight for node j in layer l_k for incoming node i, and b^k_i: the bias for node i in layer l_k

• a^k_i: the product sum plus the bias for node i in layer l_k

• o^k_i: the output of node i in layer l_k

• r_k: the number of nodes in layer l_k

The MSE is the error function used in back-propagation.

MSE = \frac{1}{2N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \qquad (2.6)

where y_i is the actual value for the input-output pair (x_i, y_i) and ŷ_i is the calculated output for input x_i.
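A direct translation of Equation (2.6) into code (a small sketch; the class name is illustrative):

public class MeanSquaredError {
    // MSE = (1 / 2N) * sum over i of (yHat_i - y_i)^2, as in Equation (2.6).
    static double mse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return sum / (2.0 * predicted.length);
    }
}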

2.9 Backpropagation Algorithm

• The forward feed process computes each of the input-output signals (x_d, y_d) and stores the results ŷ_d, a^k_j, and o^k_j for each node j in layer k, proceeding from the input layer 0 to the output layer m.

• Backpropagation computes, for each input-output pair (x_d, y_d), and stores, the results ∂E_d/∂w^k_ij for every connected node i and each weight w^k_ij from layer k − 1 to node j in layer k, proceeding from layer m, the output layer, back to the input layer:

  1. Evaluate the error term for the final layer.

  2. Backpropagate the error terms for the hidden layers, working backwards from the final hidden layer.

  3. Evaluate the partial derivatives of the individual error with respect to w^k_ij.

• The gradients are combined for each data pair to get the total gradient ∂E(X, θ)/∂w^k_ij for all input-output signals X = {(x_1, y_1), ..., (x_N, y_N)} by using the average of the individual gradients.

• Update the weights according to the learning rate α and the total gradient ∂E(X, θ)/∂w^k_ij; a skeleton of this training loop is sketched below.
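The following skeleton shows one epoch of that procedure. The Network interface and its methods are placeholders for the classes in Appendix A, not the thesis's exact API; only the accumulate-average-update flow from the steps above is intended to be literal:

public class BatchTrainer {

    // Placeholder abstraction for the network under training.
    interface Network {
        double[] forward(double[] input);                    // forward feed, stores a and o per node
        double[] gradient(double[] input, double[] target);  // backpropagates one pair, returns dE/dw
        void updateWeights(double[] totalGradient, double alpha);
    }

    static void trainEpoch(Network net, double[][] inputs, double[][] targets, double alpha) {
        double[] total = null;
        for (int d = 0; d < inputs.length; d++) {
            net.forward(inputs[d]);                           // step 1: forward feed
            double[] g = net.gradient(inputs[d], targets[d]); // steps 2-3: backpropagate error terms
            if (total == null) {
                total = new double[g.length];
            }
            for (int i = 0; i < g.length; i++) {
                total[i] += g[i];
            }
        }
        for (int i = 0; i < total.length; i++) {
            total[i] /= inputs.length;                        // average the individual gradients
        }
        net.updateWeights(total, alpha);                      // apply the learning rate to the total gradient
    }
}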

The XOR example described below will demonstrate that the XOR gate can be solved using Object-oriented principles and the sigmoid function.

The derivative of the sigmoid function is derived as follows, establishing that the identity

\frac{d}{dx} f(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)(1 - f(x)) \qquad (2.7)

is correct. Starting from

f(x) = \frac{1}{1 + e^{-x}} = (1 + e^{-x})^{-1}

and differentiating,

\frac{d}{dx} f(x) = -(1 + e^{-x})^{-2} \cdot \frac{d}{dx}(1 + e^{-x})
                  = -(1 + e^{-x})^{-2} \cdot (-e^{-x})
                  = \frac{e^{-x}}{(1 + e^{-x})^2}

Rewriting the numerator as e^{-x} = (1 + e^{-x}) - 1 gives

\frac{d}{dx} f(x) = \frac{(1 + e^{-x}) - 1}{(1 + e^{-x})^2}
                  = \frac{1 + e^{-x}}{(1 + e^{-x})^2} - \frac{1}{(1 + e^{-x})^2}
                  = \frac{1}{1 + e^{-x}} - \frac{1}{(1 + e^{-x})^2}
                  = \frac{1}{1 + e^{-x}} \left[ 1 - \frac{1}{1 + e^{-x}} \right]
                  = f(x)(1 - f(x))

2.10 Feedforward ANN with Back-propagation Illustration

The following example illustrates a feedforward neural network with back-propagation calculations for the input signals [1, 0], with weight connection values initialized at {.8, .4, .3}, {.2, .9, .5} and {.3, .5, .9}.

For each neuron on each layer, the algorithm must compute the sum of all weighted connections from all incoming neurons. So, for the first (top) neuron in Figure 2-9, the algorithm must compute the sum of the input pair [1,0] weighted by [.8, .2]. This must be done for each neuron in each layer, and may be computed as the sum Σ_i w_i x_i over the incoming connections of the layer. Refer to the neural network feedforward general notations below.

1. I1 - denotes the input for the first neuron.

   • WI1n - denotes the weighted connection values for the first input neuron

2. I2 - denotes the input for the second neuron.

   • WI2n - denotes the weighted connection values for the second input neuron

3. Hn - denotes the hidden neurons.

   • WHn - denotes the weighted connection values for the corresponding hidden neurons

4. Neural Network Feedforward General Notations

   • w^k_ij: the weight for node j in layer l_k for incoming node i, and b^k_i: the bias for node i in layer l_k

   • a^k_i: the product sum plus the bias for node i in layer l_k

   • o^k_i: the output of node i in layer l_k

   • r_k: the number of nodes in layer l_k

   • Generalized feedforward formula Σ_i w_i x_i over the l_k incoming connections, used together with f(x) and its derivative f(x) ∗ (1 − f(x)).

f(x) = \frac{1}{1 + e^{-x}} \qquad (2.8)

Sigmoid Function 2.8

\frac{d}{dx} f(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)(1 - f(x)) \qquad (2.9)

Derivative of the Sigmoid Function (2.9)

2.10.1 Incoming Hidden Layer Calculations

The incoming sums for the hidden layer are calculated from the input layer as follows:

1. WI1

   • i = 1, w^k_ij = .8, yields 1 ∗ .8 = .8

   • i = 0, w^k_ij = .2, yields 0 ∗ .2 = .0

   • sum: .8 + .0 = .8

2. WI2

   • i = 1, w^k_ij = .4, yields 1 ∗ .4 = .4

   • i = 0, w^k_ij = .9, yields 0 ∗ .9 = .0

   • sum: .4 + .0 = .4

3. WI3

   • i = 1, w^k_ij = .3, yields 1 ∗ .3 = .3

   • i = 0, w^k_ij = .5, yields 0 ∗ .5 = .0

   • sum: .3 + .0 = .3

2.10.2 Applying f(x)

Applying f(x) to the three hidden neuron sums gives:

f(WH1) = 0.731059

f(WH2) = 0.785835

f(WH3) = 0.689974

2.10.3 Hidden Layer Sum of Products

Sum the products of the hidden layer results with their respective hidden-to-output weights:

f(WH1) × WH1 = 0.731059 × 0.3 = .219317

f(WH2) × WH2 = 0.785835 × 0.5 = .392917

f(WH3) × WH3 = 0.689974 × 0.9 = .620976

2.10.4 Apply Activation Function

Apply the activation function to get the final output result

f(x) = f(1.233212) = 0.774380

2.10.5 Compute Output Margin of Error

Compute the margin of error for the output (delta output sum) by taking the target minus the actual output:

DeltaOutputSum = 0 − 0.774380 = −0.774380

2.10.6 Compute the Rate of Change

Compute the rate of change by applying the derivative f′(x) to obtain the slope:

RateOfChange = S′(1.233212) × −0.774380 = −0.135296

The RateOfChange is used in the derivative of the output sum function to determine the change in weights. The output sum is the product of the hidden layer results and the weights between the hidden and output layers.

2.10.7 Compute Delta Output Weight Changes

DeltaWeightChange = −0.135296 / [0.731059, 0.785835, 0.689974] = [−0.185069, −0.172169, −0.196089]

UpdatedOutputWeights = [0.3, 0.5, 0.9] + [−0.185069, −0.172169, −0.196089] = [0.114931, 0.327831, 0.703911]

2.10.8 Compute Delta Hidden Sum

DeltaHiddenSum = −0.135296 / [0.3, 0.5, 0.9] × f′([0.8, 0.4, 0.3]) = [−0.102010, −0.068746, −0.038860]

2.10.9 Calculate Hidden Layer Incoming Weight Changes

Next, we calculate the change in the weights between the input and hidden layers by dividing the delta hidden sums by the input data:

WtChanges = [−0.102010, −0.068746, −0.038860] / [1.0, 0.0] = [[−0.102010, 0.0], [−0.068746, 0.0], [−0.038860, 0.0]]

2.10.10 Update Incoming Hidden Layer Weights

Lastly, update the hidden layer connection weights:

UpdatedWts = [[0.8, 0.2], [0.4, 0.9], [0.3, 0.5]] + [[−0.102010, 0.0], [−0.068746, 0.0], [−0.038860, 0.0]] = [[0.697990, 0.200000], [0.331254, 0.900000], [0.261140, 0.500000]]

The above calculations are a representation of the computations with respect to the inputs [1,1] shown in Figure 2-9. Each of the manual computations described above is the practical computation performed when a neural network is used with two input neurons, three hidden neurons, and one output neuron. Figure 2-9 shows the path each input pair follows during the training and processing of the neural network.

Listing 2.1 below shows how the neural network computes and iteratively distributes the gradient to each weighted inter-neuron connection. The listing is illustrated by each of the manual computations described above. This process is repeated for each input pair until the neural network is trained satisfactorily. Once the network is trained, the iterative processing of the data set can begin.

@Override
public double getOutput(double weight) {
    logger.debug("getOutput: {}", new Object[]{weight});
    double den = 1.0 + Math.exp(-this.slope * weight);
    this.output = (1.0 / den);
    return this.output;
}

@Override
public double getDerivative(double net) {
    logger.debug("getDerivative: {}", new Object[]{this.output});
    // the +0.1 term is a flat-spot adjustment that keeps the derivative from vanishing
    double derivative = this.slope * this.output * (1d - this.output) + 0.1;
    return derivative;
}

Listing 2.1: Sigmoid Derivative

Figure 2-9: XOR Neural Network with weighted connections, for input [1,1]

2.11 Contributions

The research demonstrates that modern programming practices and principles such as OOP, inheritance, and polymorphism reduce the time to market (TTM) and code complexity, and improve the maintainability of the source code, while allowing maximum flexibility.

The code complexity is reduced because the code makes use of re-usable code blocks, which is a cornerstone of OOP. An example of this re-usability is the following code sample, which shows how a neuron can be added to a layer while still maintaining the layer's integrity. The addNeuron(Neuron neuron) method is a re-usable, Java-based OOP method that allows neurons to be added to the various layers in the neural network. After testing is complete for the addNeuron() method, it may be used repeatedly without a complete re-testing process. This blackbox, or compartmentalized, testing enables the product to be developed with greater efficiency and accuracy.

/**
 * Adds specified neuron to this layer
 *
 * @param neuron neuron to add
 */
public final void addNeuron(Neuron neuron) {
    // prevent adding null neurons
    if (neuron == null) {
        throw new IllegalArgumentException("Neuron cant be null.");
    }

    // set neuron's parent layer to this layer
    neuron.setParentLayer(this);

    // add new neuron at the end of the array
    neurons.add(neuron);
}

Listing 2.2: sourcecode/Layer.java

The Util class is also re-usable code that verifies that an object is not empty. Once the Util.isEmpty() method, shown in Listing 2.3, has been verified and thoroughly tested, the Util class can be re-used as needed without much effort. By using re-usable code, the number of defects is greatly reduced. The need for mass code correction is also reduced, and the time to market for the product is greatly shortened. The fewer defects initially created, the fewer corrections required. All of these OOP principles and programming practices provide numerous long-term sustainable benefits.

public class Util {

    /**
     * @param obj the object to test
     * @return true if the object is null
     */
    public static boolean isEmpty(Object obj) {
        if (null == obj) {
            return true;
        }
        return false;
    }

    /**
     * @param s the string to test
     * @return true if the string is null or has zero length
     */
    public static boolean isEmpty(String s) {
        if (null == s || 0 == s.length()) {
            return true;
        }
        return false;
    }
}

Listing 2.3: sourcecode/Util.java

Code abstraction reduces duplicated code by removing common components and creating re-usable classes or objects. Modifications to the abstracted code base are made significantly easier because the abstracted code has fewer touch points to modify. A defect in one section of code can easily be corrected and the code base then re-published. This approach is demonstrated in the Util class, as each of the concrete implementations is written exactly once. By designing a single abstraction with one concrete implementation, the likelihood of a defect in the code is reduced to a single touch point. This is not to say that defects do not happen, but it does show how effectively a defect can be managed, corrected, and published.

Modifications or changes to the code base are simplified, as all the logic for each piece of the machine learning algorithm has been encapsulated into classes or objects.

The following section refers to Listings 2.4 and 2.5. These Java code samples illustrate the encapsulation and inheritance principles of the object-oriented paradigm.

The ActivationFunction abstract class defines the behavior of all the various activation functions, such as the Sigmoid, Softmax, Tanh, Linear, and Step functions.

In simple terms, the abstract class is just a blueprint of how the actual concrete implementation will be defined and behave.

public abstract class ActivationFunction implements Serializable {

    private static final long serialVersionUID = 1L;

    /**
     * cached output value to avoid double calculation for derivative
     * inherited by all sub-classes
     */
    protected double output;

    /**
     * Returns the output of this function.
     *
     * @param input total weighted input
     */
    abstract public double getOutput(double input);

    /**
     * Returns the first derivative of the ActivationFunction
     *
     * @param input total weighted input
     */
    abstract public double getDerivative(double input);
}

Listing 2.4: sourcecode/ActivationFunction.java

The Sigmoid class is the concrete implementation of the behavior defined by the abstract class. The implementation contains all the logic needed to perform the tasks related to the activation function. By using this design principle, we can quickly develop new implementations of the various activation functions. Newly implemented activation functions can be added quickly and have minimal impact on the project's TTM.

public class Sigmoid extends ActivationFunction implements Serializable {

    private static final long serialVersionUID = 1L;

    private static final Logger logger = LoggerFactory.getLogger(Sigmoid.class);

    /**
     * Slope for the Sigmoidal curve
     */
    private double slope = 1.0;

    /**
     * Create an instance of the Sigmoid Activation Class with the slope set to 1
     */
    public Sigmoid() {
    }

    /**
     * Create an instance of the Sigmoid Activation Class with the slope being
     * specified by the passed parameter
     *
     * @param slope for the Sigmoidal curve
     */
    public Sigmoid(double slope) {
        this.slope = slope;
    }

    /**
     * @return slope of the Sigmoidal Activation Function
     */
    public double getSlope() {
        return slope;
    }

    /**
     * @param slope for the Sigmoid Activation Function
     */
    public void setSlope(double slope) {
        this.slope = slope;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public double getOutput(double weight) {
        logger.debug("getOutput: {}", new Object[]{weight});
        double den = 1.0 + Math.exp(-this.slope * weight);
        this.output = (1.0 / den);
        return this.output;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public double getDerivative(double net) {
        logger.debug("getDerivative: {}", new Object[]{this.output});
        // +0.1 is a flat-spot adjustment that keeps the derivative from vanishing
        double derivative = this.slope * this.output * (1d - this.output) + 0.1;
        return derivative;
    }
}

Listing 2.5: sourcecode/SigmoidFull.java
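To illustrate this extensibility, a hypothetical Tanh implementation is sketched below. It is not part of the thesis code base; it only assumes the getOutput/getDerivative contract and the cached output field shown in Listing 2.4.

public class Tanh extends ActivationFunction implements Serializable {

    private static final long serialVersionUID = 1L;

    /**
     * {@inheritDoc}
     */
    @Override
    public double getOutput(double weight) {
        this.output = Math.tanh(weight);              // cache the output for the derivative, as Sigmoid does
        return this.output;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public double getDerivative(double net) {
        return 1.0 - (this.output * this.output);     // d/dx tanh(x) = 1 - tanh^2(x)
    }
}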

All the various OOP techniques used to engineer the machine learning API provide maximum flexibility with reduced complexity and a faster time to market. By adhering to these principles, we can see the increased benefits of the OOP paradigm.

The algorithm for the Neural Network framework is illustrated with the following pseudo-code.

Algorithm: Test Neural Network Framework

1: procedure Run
2:   dataList ← read input data from file
3:   nl ← lower limit for classification function
4:   nh ← upper limit for classification function
5:   normalize:
6:   for ds in dataList do
7:     min ← ds
8:     max ← ds
9:     result ← ((min − max) · ds − (nh · min) + nl · max) / (nl − nh)
10:    ds ← result
11:  end for
12:  tds ← dataList
13:  learningRate ← a value between 0 and 1
14:  train the network: learn(tds)
15:  open output file
16:  run:
17:  for ds in dataList do
18:    y ← runNetwork
19:    mse ← (target − y)²
20:    write ds to output file
21:    write mse to output file
22:  end for
23:  avgmse ← mse / dataList.size
24:  write avgmse to output file
25:  close output file
26: end procedure
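The normalization step in lines 6–11 rescales each data point into the range [nl, nh]. The Java sketch below illustrates the intent using the standard min-max form of the rescaling; it assumes min and max are taken over the whole data set and does not reproduce the pseudo-code expression verbatim.

public final class NormalizeSketch {

    /** Scales every value into the range [nl, nh] using min-max normalization. */
    public static double[] normalize(double[] data, double nl, double nh) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double ds : data) {                      // assumed: min and max are taken over the data set
            min = Math.min(min, ds);
            max = Math.max(max, ds);
        }

        double[] result = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            // standard min-max rescaling of data[i] into [nl, nh]
            result[i] = (data[i] - min) / (max - min) * (nh - nl) + nl;
        }
        return result;
    }
}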

Chapter 3

Results

3.1 Framework Setup

The identical framework setup was used to produce comparable results from each framework. A back-propagation network consisting of three layers: one input layer, one hidden layer and one output layer was chosen. The input layer consists of three neurons; the hidden layer consists of four neurons; while the output layer consists of one neuron. The sigmoid function was chosen as the activation function, so the data points were all normalized to values between [0,1].

Each network was randomly assigned sample data between 10% and 20% of the population using a cross-validation method known as the "HoldOut" method. In addition, each network was assigned the following parameters: a maximum of 100,000 iterations and an error percentage that was varied between 3% and 6%. Each framework was executed multiple times with the various parameters, and the Mean Squared Error ($MSE = \frac{\sum(\text{target} - \text{actual})^2}{\text{datapoints}}$) was computed and compared after each execution. The following figures illustrate each framework with the various comparisons.
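A compact Java sketch of this evaluation procedure, a simple hold-out split followed by the MSE computation, is shown below; the Network interface and its predict method are hypothetical stand-ins for the frameworks' actual APIs.

import java.util.Collections;
import java.util.List;

public final class HoldOutEvaluationSketch {

    /** Hypothetical prediction interface standing in for a framework's network class. */
    interface Network {
        double predict(double[] inputs);
    }

    /** Shuffles the data and returns the first (ratio * size) rows as the held-out test split. */
    static <T> List<T> holdOut(List<T> data, double ratio) {
        Collections.shuffle(data);
        return data.subList(0, (int) (data.size() * ratio));
    }

    /** Mean squared error over the held-out rows: sum of (target - actual)^2 / datapoints. */
    static double meanSquaredError(Network net, List<double[]> rows, List<Double> targets) {
        double sum = 0.0;
        for (int i = 0; i < rows.size(); i++) {
            double diff = targets.get(i) - net.predict(rows.get(i));
            sum += diff * diff;
        }
        return sum / rows.size();
    }
}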

3.1.1 Learning Rate vs MSE

As shown in Figure 3-1, all the frameworks have very similar slopes when analyzing the learning rate versus the Mean Squared Error. The data shows that our framework (the Deep ML framework) has greater accuracy than the other two frameworks. The Encog [54] and Neuroph [55] frameworks converge when the learning rate approaches 0.50 and generally keep trending downward, but their error remains greater than Deep ML's. As the learning rate decreases, the Deep ML and Encog frameworks produce a smaller MSE, because backpropagation uses the learning rate to adjust the weight of each connection at each node, in effect apportioning blame for the error. The learning rate can be thought of as the length of the step taken in one direction as the slope is derived. When choosing the learning rate, one must find a value that does not overtrain or overfit the network and that still allows the network to converge in a reasonable amount of time. Overall, Figure 3-1 shows that the Deep ML algorithm provides a slight improvement over the other algorithms.

Figure 3-1: Learning Rate vs MSE

3.1.2 Sample Size vs MSE

Figure 3-2 shows the relationship between the sample size and the MSE. All three frameworks provide a clear visualization that as the sample size increases, the MSE decreases. Sample sizes between 400 and 10,000 were used, with training data set sizes of 10% and 20%. The Deep ML framework shows a clear advantage when compared to the other frameworks. Thus, the Deep ML framework provides more accurate and reliable results.

Figure 3-2: Sample Size vs MSE

3.1.3 Max Error vs MSE

Figure 3-3 shows the relationship between the maximum error and the mean squared error for each framework. The maximum error is the acceptable error percentage at which the network is considered trained; training continues until the maximum error is reached. For this research, the maximum error percentage was kept between 1% and 6%.

The following pseudo-code illustrates how the maximum error was incorporated into the training of the networks.

3.1.4 Pseudo-code

Algorithm: Training with regard to Max Error

1: procedure Train Network
2:   train:
3:   while maxError < train.getError() do
4:     train.iteration()
5:     train.backPropagate()
6:   end while
7: end procedure
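A minimal Java sketch of this training loop is shown below; the Trainer interface and its methods mirror the pseudo-code and are hypothetical, not the actual API of any of the frameworks compared here.

public final class MaxErrorTrainingSketch {

    /** Hypothetical trainer interface mirroring the pseudo-code above. */
    interface Trainer {
        double getError();      // current network error
        void iteration();       // one forward pass over the training data
        void backPropagate();   // one backpropagation/weight-update pass
    }

    /** Trains until the network error falls to or below the acceptable maximum error. */
    static void train(Trainer trainer, double maxError) {
        while (maxError < trainer.getError()) {
            trainer.iteration();
            trainer.backPropagate();
        }
    }
}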

The supporting data shows that the MSE increases as the maximum error increases and decreases as the maximum error decreases. The data does not show a decreased convergence towards the limit at 1%, which supports the conclusion that "over-training" was not a factor. Also, by comparing the MSE of the various frameworks, we can conclude that "over-training" was not a contributing factor in this research. However, we can see from the data that the Deep ML framework shows a significantly lower MSE than the other two frameworks.

The results shown below illustrate that the Deep ML framework provides an increase in accuracy and precision over Neuroph and Encog. The MSE in most test runs was lower than that of the counterpart frameworks when compared across the learning rate, maximum error, and sample size.

Figure 3-3: Maximum Error vs MSE

Chapter 4

Conclusion and Future Work

This thesis demonstrates that, with significant design and engineering, the software life cycle and object-oriented principles have a dramatic impact on the precision, effectiveness, and reliability of artificial neural networks. Adhering to object-oriented programming principles greatly reduces the time to market (TTM) and improves the return on investment (ROI), as modifications, enhancements, and corrections are efficiently completed and safely integrated into OOP projects using any modern programming language tooling. The OOP principles of polymorphism, encapsulation, and inheritance are central to ANNs and are the cornerstone of the success of this endeavor.

In addition, it illustrates that state-of-the-art approaches utilizing object-oriented programming principles have a significant impact on the design of ANNs. This research also shows that back-propagation techniques using object-oriented programming for artificial neural networks increase the accuracy of the forecasting models.

Evidently, this research demonstrates that the three core features of OOP (inheritance, polymorphism, and encapsulation) should become a core component of any artificial neural network design that exploits the benefits of object-oriented development.

Moreover, we have discussed existing object-oriented programming platforms and tools that have been used for ANNs. Most of the tools surveyed are based on technologies such as the C++ or C# programming languages. Through illustrations, we have used the Java programming language to show how OOP can be used for ANNs.

The thesis describes an Object-oriented programming approach to a supervised deep neural network using the Sigmoid classification function. Continuous analysis of the parameters and the predictions formed by these algorithms would greatly assist in achieving a more reliable prediction model.

First, more research could be done with the various cross-validation methodologies to help establish how the various algorithm parameters correlate with the prediction model. Cross-validation algorithms such as "k-fold" or "leave-p-out" methods could be engineered to assist with relating the parameters to the prediction models. In addition, more research could be done using larger sample sizes. This research used a maximum sample size of 11,500 rows with 3 inputs and 1 output.

Another area of continued research could be the parallel processing of simultaneous data partitions and the regeneration of the data after processing has completed. Along with data partitioning using parallel processes, more focused research on over-fitting and under-fitting of the training signals is needed. This area needs improvement, as currently the only good measure for over-fitting or under-fitting a series of data signals is to use cross-validation and compute the MSE on the trained network. Once the neural network is trained and the data has been completely processed, the best known methodology for validating errors is to compare the training data MSEs to detect any suspicious anomalies between the signals.

Along with the above topics, another area where advancements could be of interest is the continued evolution of the OOP paradigm itself. Most modern programming platforms are continually advancing and introducing new functionality such as modules and lambda functions. Modules and lambdas allow the systems engineer to optionally use certain components of a framework and to incorporate another layer of abstraction.

Lastly, continuous research could be done with various classification functions.

The research in this thesis focused on the Sigmoid function and the various methods used to maximize the results to prevent over-training, under-training, and the decreased ability to learn as the function converges at its limits.

References

[1] Y. D. Liang, Introduction to Java Programming and Data Structures: Comprehensive Version. Pearson Education, 2017.

[2] R. C. Martin, Agile Software Development: Principles, Patterns, and Practices. Prentice Hall, 2002.

[3] Oracle, "Oracle Java Documentation: Controlling access to members of a class." https://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html. Accessed: 2018-03-30.

[4] J. Rogers, Object-Oriented Neural Networks in C++. Morgan Kaufmann, 1997.

[5] M. Russo, "A distributed neuro-genetic programming tool," Swarm and Evolutionary Computation, vol. 27, pp. 145–155, 2016.

[6] M. A. Keyvanrad and M. M. Homayounpour, "A brief survey on deep belief networks and introducing a new object oriented toolbox (DeeBNet)," arXiv preprint arXiv:1408.3264, 2014.

[7] G. Folino, C. Pizzuti, and G. Spezzano, "A scalable cellular implementation of parallel genetic programming," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 37–53, 2003.

[8] B. A. Høverstad, "Simdist: a distribution system for easy parallelization of evolutionary computation," Genetic Programming and Evolvable Machines, vol. 11, no. 2, pp. 185–203, 2010.

[9] F. Moreno, J. Alarcón, R. Salvador, and T. Riesgo, "Reconfigurable hardware architecture of a shape recognition system based on specialized tiny neural networks with online training," IEEE Transactions on Industrial Electronics, vol. 56, no. 8, pp. 3253–3263, 2009.

[10] D. Andre and J. R. Koza, "Parallel genetic programming on a network of transputers," in Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, pp. 111–120, Tahoe City, California, USA, 1995.

[11] J. Diederich, "Explanation and artificial neural networks," International Journal of Man-Machine Studies, vol. 37, no. 3, pp. 335–355, 1992.

[12] L. Á. Menéndez, F. J. de Cos Juez, F. S. Lasheras, and J. Á. Riesgo, "Artificial neural networks applied to cancer detection in a breast screening programme," Mathematical and Computer Modelling, vol. 52, no. 7-8, pp. 983–991, 2010.

[13] J. Khan, J. S. Wei, M. Ringner, L. H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescu, C. Peterson, et al., "Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks," Nature Medicine, vol. 7, no. 6, p. 673, 2001.

[14] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, "Mitosis detection in breast cancer histology images with deep neural networks," in International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 411–418, Springer, 2013.

[15] C. E. Floyd, J. Y. Lo, A. J. Yun, D. C. Sullivan, and P. J. Kornguth, "Prediction of breast cancer malignancy using an artificial neural network," Cancer, vol. 74, no. 11, pp. 2944–2948, 1994.

[16] M. Karabatak and M. C. Ince, "An expert system for detection of breast cancer based on association rules and neural network," Expert Systems with Applications, vol. 36, no. 2, pp. 3465–3469, 2009.

[17] S. Kaymak, A. Helwan, and D. Uzun, "Breast cancer image classification using artificial neural networks," Procedia Computer Science, vol. 120, pp. 126–131, 2017.

[18] H. Chougrad, H. Zouaki, and O. Alheyane, "Deep convolutional neural networks for breast cancer screening," Computer Methods and Programs in Biomedicine, 2018.

[19] P. B. Snow, D. S. Smith, and W. J. Catalona, "Artificial neural networks in the diagnosis and prognosis of prostate cancer: a pilot study," The Journal of Urology, vol. 152, no. 5, pp. 1923–1926, 1994.

[20] C. Stephan, H. Cammann, A. Semjonow, E. P. Diamandis, L. F. Wymenga, M. Lein, P. Sinha, S. A. Loening, and K. Jung, "Multicenter evaluation of an artificial neural network to increase the prostate cancer detection rate and reduce unnecessary biopsies," Clinical Chemistry, vol. 48, no. 8, pp. 1279–1287, 2002.

[21] M. A. Devi, S. Ravi, J. Vaishnavi, and S. Punitha, "Classification of cervical cancer using artificial neural networks," Procedia Computer Science, vol. 89, pp. 465–472, 2016.

[22] O. Charron, A. Lallement, D. Jarnet, V. Noblet, J.-B. Clavier, and P. Meyer, "Automatic detection and segmentation of brain metastases on multimodal MR images with a deep convolutional neural network," Computers in Biology and Medicine, vol. 95, pp. 43–54, 2018.

[23] Z.-H. Zhou, Y. Jiang, Y.-B. Yang, and S.-F. Chen, "Lung cancer cell identification based on artificial neural network ensembles," Artificial Intelligence in Medicine, vol. 24, no. 1, pp. 25–36, 2002.

[24] D. Zafeiris, S. Rutella, and G. R. Ball, "An artificial neural network integrated pipeline for biomarker discovery using Alzheimer's disease as a case study," Computational and Structural Biotechnology Journal, vol. 16, pp. 77–87, 2018.

[25] S.-L. Hung and H. Adeli, "Object-oriented backpropagation and its application to structural design," Neurocomputing, vol. 6, no. 1, pp. 45–55, 1994.

[26] M. M. T. Thwin and T.-S. Quah, "Application of neural networks for software quality prediction using object-oriented metrics," Journal of Systems and Software, vol. 76, no. 2, pp. 147–156, 2005.

[27] G. Valentini and F. Masulli, "NEURObjects: an object-oriented library for neural network development," Neurocomputing, vol. 48, no. 1-4, pp. 623–646, 2002.

[28] W. Wang, Y. Murphey, and P. Watta, "A computational framework for implementation of neural networks on multi-core machine," Procedia Computer Science, vol. 53, pp. 82–91, 2015.

[29] J. Zheng, "Cost-sensitive boosting neural networks for software defect prediction," Expert Systems with Applications, vol. 37, no. 6, pp. 4537–4543, 2010.

[30] V. Rao and H. V. Rao, C++ Neural Networks and Fuzzy Logic. MIS: Press, 1995.

[31] S. T. Welstead, Neural Network and Fuzzy Logic Applications in C/C++. John Wiley & Sons, Inc., 1994.

[32] T. Masters, Practical Neural Network Recipes in C++. Morgan Kaufmann, 1993.

[33] O. Deperlioglu and U. Kose, "An educational tool for artificial neural networks," Computers & Electrical Engineering, vol. 37, no. 3, pp. 392–402, 2011.

[34] A. Azadeh, Z. Faiz, S. Asadzadeh, and R. Tavakkoli-Moghaddam, "An integrated artificial neural network-computer simulation for optimization of complex tandem queue systems," Mathematics and Computers in Simulation, vol. 82, no. 4, pp. 666–678, 2011.

[35] X. Li, C. P. Jobling, and P. W. Grant, "An object-oriented information model for intelligent modelling," IFAC Proceedings Volumes, vol. 29, no. 1, pp. 4410–4415, 1996.

[36] H. R. Myler, A. R. Weeks, R. K. Gillis, and G. W. Hall, "Object-oriented neural simulation tools for a hypercube parallel machine," Neurocomputing, vol. 4, no. 5, pp. 235–248, 1992.

[37] B. Tunç, "Semantics of object representation in machine learning," Pattern Recognition Letters, vol. 64, pp. 30–36, 2015.

[38] E. A. M. L. Abdrabou and A.-B. M. Salem, "A breast cancer classifier based on a combination of case-based reasoning and ontology approach," in Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on, pp. 3–10, IEEE, 2010.

[39] A. B. Tosun, M. Kandemir, C. Sokmensuer, and C. Gunduz-Demir, "Object-oriented texture analysis for the unsupervised segmentation of biopsy images for cancer detection," Pattern Recognition, vol. 42, no. 6, pp. 1104–1112, 2009.

[40] L. P. Zhao and H. Bolouri, "Object-oriented regression for building predictive models with high dimensional omics data from translational studies," Journal of Biomedical Informatics, vol. 60, pp. 431–445, 2016.

[41] J. A. Cruz and D. S. Wishart, "Applications of machine learning in cancer prediction and prognosis," Cancer Informatics, vol. 2, p. 59, 2006.

[42] G. Bartsch, A. P. Mitra, S. A. Mitra, A. A. Almal, K. E. Steven, D. G. Skinner, D. W. Fry, P. F. Lenehan, W. P. Worzel, and R. J. Cote, "Use of artificial intelligence and machine learning algorithms with gene expression profiling to predict recurrent nonmuscle invasive urothelial carcinoma of the bladder," The Journal of Urology, vol. 195, no. 2, pp. 493–498, 2016.

[43] B. Zhang, X. He, F. Ouyang, D. Gu, Y. Dong, L. Zhang, X. Mo, W. Huang, J. Tian, and S. Zhang, "Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma," Cancer Letters, 2017.

[44] Y. Yu and Z. Sun, "Sparse coding extreme learning machine for classification," Neurocomputing, 2017.

[45] N. H. N. Moorthy, S. Kumar, and V. Poongavanam, "Classification of carcinogenic and mutagenic properties using machine learning method," Computational Toxicology, vol. 3, pp. 33–43, 2017.

[46] J. R. Imbus, R. W. Randle, S. C. Pitt, R. S. Sippel, and D. F. Schneider, "Machine learning to identify multigland disease in primary hyperparathyroidism," Journal of Surgical Research, vol. 219, pp. 173–179, 2017.

[47] Y. Kawata, H. Arimura, K. Ikushima, Z. Jin, K. Morita, C. Tokunaga, H. Yabuuchi, Y. Shioyama, T. Sasaki, H. Honda, et al., "Impact of pixel-based machine-learning techniques on automated frameworks for delineation of gross tumor volume regions for stereotactic body radiation therapy," Physica Medica, vol. 42, pp. 141–149, 2017.

[48] M. Maniruzzaman, N. Kumar, M. M. Abedin, M. S. Islam, H. S. Suri, A. S. El-Baz, and J. S. Suri, "Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm," Computer Methods and Programs in Biomedicine, 2017.

[49] J. Xia, H. Chen, Q. Li, M. Zhou, L. Chen, Z. Cai, Y. Fang, and H. Zhou, "Ultrasound-based differentiation of malignant and benign thyroid nodules: An extreme learning machine approach," Computer Methods and Programs in Biomedicine, vol. 147, pp. 37–49, 2017.

[50] M. Sattlecker, N. Stone, and C. Bessant, "Current trends in machine-learning methods applied to spectroscopic cancer diagnosis," TrAC Trends in Analytical Chemistry, vol. 59, pp. 17–25, 2014.

[51] M. A. Mohammed, M. K. A. Ghani, R. I. Hamed, D. A. Ibrahim, and M. K. Abdullah, "Artificial neural networks for automatic segmentation and identification of nasopharyngeal carcinoma," Journal of Computational Science, 2017.

[52] H.-H. Rau, C.-Y. Hsu, Y.-A. Lin, S. Atique, A. Fuad, L.-M. Wei, and M.-H. Hsu, "Development of a web-based liver cancer prediction model for type II diabetes patients by using an artificial neural network," Computer Methods and Programs in Biomedicine, vol. 125, pp. 58–65, 2016.

[53] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf.

[54] Heaton Research, "Encog machine learning framework."

[55] Java Neural Network Framework (Neuroph), "Machine learning framework."

Appendix A

Artificial Neural Network Java Code

The main working class, which handles all the calls and the execution of the network computations.

import ActivationFunction.Sigmoid;
import NeuralNet.Network;
import Utils.DisplayUtil;
import Utils.NeuronUtil;

public class Runnable {

    private static final int INPUT_NEURON_COUNT = 2;
    private static final int HIDDEN_NEURON_COUNT = 3;
    private static final int OUTPUT_NEURON_COUNT = 1;

    private static double[][] getWeightChanges(double[][] arry, double rateOfChange) {
        double wtChanges[][] = new double[arry.length][HIDDEN_NEURON_COUNT];
        for (int i = 0; i ...
    }

    public static double[] computeNodeValue(double[][] weights, double inputs[]) {
        double sum[] = new double[HIDDEN_NEURON_COUNT];
        for (int i = 0; i ...
    }

    /**
     * @param inputTotHiddenWeights
     * @param changes
     * @return
     */
    private static double[][] updateInputToHiddenWeights(double[][] inputTotHiddenWeights, double[] changes) {
        int k = 0;
        for (int i = 0; i ...
    }

    public static void main(String[] args) {
        double inputs[] = {1, 0};
        double outputs[] = {0.0};
        double inputTotHiddenWeights[][] = { {.8, .4, .3}, {.2, .9, .5} };
        double hiddenToOuterWeight[][] = {{.3, .5, .9}};
        double lasthiddenToOuterWeight[][] = {{.3, .5, .9}};

        Sigmoid sigmoid = new Sigmoid();

        double hiddenLayerSums[] = computeNodeValue(inputTotHiddenWeights, inputs);
        double hiddenLayerFofX[] = sigmoid.computeSumofSigmoid(hiddenLayerSums);
        System.out.println("HiddenLayer Sum: " + DisplayUtil.displayDouble(hiddenLayerSums));
        System.out.println("HiddenLayer f(x): " + DisplayUtil.displayDouble(hiddenLayerFofX));

        double outputSum = 0.0; // computeNodeValue(weights, hiddenLayerFofX);
        for (int x = 0; x ...

        /* Target - calculated */
        double ouputSumMarginOfError = outputs[OUTPUT_NEURON_COUNT - 1] - actual;
        System.out.printf("Output Sum Margn of Error: %.6f\n", ouputSumMarginOfError);

        /*
         * Delta output sum = S'(sum) * (output sum margin of error)
         * Also known as the rate of change
         */
        double deltaOutputSum = sigmoid.getDerivative(outputSum) * ouputSumMarginOfError;
        System.out.printf("Delta Output Sum: %.6f\n", deltaOutputSum);

        /*
         * Delta output sum = S'(sum) * (output sum margin of error)
         * Delta output sum = S'(1.235) * (-0.77)
         * Delta output sum = -0.13439890643886018
         * https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
         */
        double changes[] = NeuronUtil.getWeightChanges(hiddenLayerFofX, deltaOutputSum);
        System.out.println("Weight Changes: " + DisplayUtil.displayDouble(changes));

        System.out.print("Old Weights: " + DisplayUtil.displayDouble(hiddenToOuterWeight));
        NeuronUtil.updatedWeights(changes, hiddenToOuterWeight);
        System.out.println(", New Weights: " + DisplayUtil.displayDouble(hiddenToOuterWeight));

        /*
         * Delta hidden sum = delta output sum / hidden-to-outer weights * S'(hidden sum)
         * Delta hidden sum = -0.1344 / [0.3, 0.5, 0.9] * S'([1, 1.3, 0.8])
         * Delta hidden sum = [-0.448, -0.2688, -0.1493] * [0.1966, 0.1683, 0.2139]
         * Delta hidden sum = [-0.088, -0.0452, -0.0319]
         */
        double rateChanges[][] = getWeightChanges(lasthiddenToOuterWeight, deltaOutputSum);
        System.out.println("Hidden-To-Outer Weight Changes: " + DisplayUtil.displayDouble(rateChanges));
        System.out.print("f(x) Weight: " + DisplayUtil.displayDouble(hiddenLayerSums));
        double diretiveOfHiddenLaySums[] = sigmoid.getSumOfDiretives(hiddenLayerSums);
        System.out.println(", Diretives of Input-To-Hidden Sum: " + DisplayUtil.displayDouble(diretiveOfHiddenLaySums));
        changes = NeuronUtil.computeChanges(diretiveOfHiddenLaySums, rateChanges);
        System.out.println("Weight Changes: " + DisplayUtil.displayDouble(changes));

        /*
         * input 1 = 1
         * input 2 = 1
         * Delta weights = delta hidden sum / input data
         * Delta weights = [-0.088, -0.0452, -0.0319] / [1, 1]
         * Delta weights = [-0.088, -0.0452, -0.0319, -0.088, -0.0452, -0.0319]
         */
        changes = NeuronUtil.getInputChanges(inputs, changes);
        System.out.println("Input Weight Changes: " + DisplayUtil.displayDouble(changes));

        System.out.println("Old Input Weight Changes: " + DisplayUtil.displayDouble(inputTotHiddenWeights));
        inputTotHiddenWeights = updateInputToHiddenWeights(inputTotHiddenWeights, changes);
        System.out.println("New Input Weight Changes: " + DisplayUtil.displayDouble(inputTotHiddenWeights));
    }
}

Listing A.1: sourcecode/AnnMathMain.java

Appendix B

Activation Function Interface and Implementation

The Java interface for the activation functions and an implementation of the sigmoid activation function.

package ActivationFunction;

public interface ActivationFunction {

    public double getOutput(double input);

    public double getDerivative(double output);
}

Listing B.2: sourcecode/ActivationFunctionInterface.java

package ActivationFunction;

public class Sigmoid implements ActivationFunction {

    private static final double SLOPE = 1.00;

    public Sigmoid() {
    }

    /**
     * @param arry
     * @return
     */
    public double[] getSumOfDiretives(double[] arry) {
        double d[] = new double[arry.length];
        for (int i = 0; i ...
    }

    /**
     * Sigmoid function
     * @param arry
     * @return
     */
    public double[] computeSumofSigmoid(double[] arry) {
        double sum[] = new double[arry.length];
        for (int i = 0; i ...
    }

    @Override
    public double getOutput(double input) {
        // conditional logic helps to avoid NaN
        if (input > 100) {
            return 1.0;
        } else if (input < -100) {
            return 0.0;
        }
        return (1 / (1 + Math.pow(Math.E, (-1 * input))));
    }

    @Override
    public double getDerivative(double output) {
        // +0.1 is fix for flat spot see http://www.heatonresearch.com/wiki/Flat_Spot
        return getOutput(output) * (1 - getOutput(output));
    }
}

Listing B.3: sourcecode/SigmoidImpl.java

Appendix C

Neuron Utility Class

The neuron utilities class, which handles all the connection weight updates and computes the sums of the connected weights.

package Utils;

public class NeuronUtil {

    /**
     * @param inputs
     * @param changes
     * @return
     */
    public static double[] getInputChanges(double[] inputs, double[] changes) {
        double results[] = new double[inputs.length * changes.length];
        int k = 0;
        for (int i = 0; i ...
    }

    /**
     * @param deltaOutputSum
     * @param outputWeights
     */
    public static void updatedWeights(double[] changes, double[][] outputWeights) {
        for (int i = 0; i ...
                outputWeights[i][x] = outputWeights[i][x] + changes[x];
            }
        }
    }

    /**
     * Compute and Updates the change to the output layer's weights
     * @param hiddenLayerFofX
     * @param rateOfChange
     * @return
     */
    public static double[] getWeightChanges(double[] hiddenLayerFofX, double rateOfChange) {
        double wtChanges[] = new double[hiddenLayerFofX.length];
        for (int x = 0; x < hiddenLayerFofX.length; x++) {
            wtChanges[x] = rateOfChange / hiddenLayerFofX[x];
        }
        return wtChanges;
    }

    /**
     * @param diretiveOfHiddenLaySums
     * @param rateChanges
     * @return
     */
    public static double[] computeChanges(double[] diretiveOfHiddenLaySums, double[][] rateChanges) {
        double product[] = new double[diretiveOfHiddenLaySums.length];

        for (int i = 0; i ...
    }
}

Appendix D

Display Utility Class

A display utilities class used to display the results.

package Utils;

public class DisplayUtil {

    private static final String FORMAT = "%.6f";

    /**
     * @param arry
     * @return
     */
    public static String displayDouble(double[] arry) {
        StringBuilder sb = new StringBuilder();
        sb.append("{");
        for (int x = 0; x < arry.length; x++) {
            if (x > 0) { sb.append(", "); }
            sb.append(String.format(FORMAT, arry[x]));
        }
        sb.append("}");
        return sb.toString();
    }

    /**
     * @param arry
     * @return
     */
    public static String displayDouble(double[][] arry) {
        StringBuilder sb = new StringBuilder();
        sb.append("{{");
        for (int i = 0; i < arry.length; i++) {
            if (i > 0) { sb.append(", {"); }
            for (int x = 0; x < arry[i].length; x++) {
                if (x > 0) { sb.append(", "); }
                sb.append(String.format(FORMAT, arry[i][x]));
            }
            sb.append("}");
        }
        sb.append("}");
        return sb.toString();
    }
}

Listing D.1: sourcecode/DisplayUtil.java
