Deep Learning Tech Report

By: Hardik Tibrewal (18EC10020), Tanay Raghavendra (18EC10063), Ayan Chakraborty (18EC10075), Debjoy Saha (18EC30010)

What is a neural network?

The term Deep Learning refers to training neural networks, sometimes very large neural networks. So what exactly is a neural network? To get an intuition, consider a housing price prediction example. We have a data set with six houses; we know the size of each house in square feet and its price, and we want to fit a function that predicts the price of a house as a function of its size. We can easily fit a straight line to the points of the data set by choosing an error metric and minimising it. We then cut this straight line off at the point where the price becomes negative, since a price can never be negative. The result is a predictor of housing prices as a function of size; note that because of the cut-off at zero, the model is no longer strictly linear. We can think of this function that we have just fit to the housing prices as a very simple neural network. As depicted in the figure below, the input to the neural network is the size of a house, which we call x. It goes into a node, which then outputs the price, which we call y. The node is a single neuron that implements the predictor function: mathematically, it takes the size as input, computes a linear function of it, takes the maximum of that value and zero, and outputs the estimated price. This is the ReLU function, which stands for rectified linear unit.

A larger neural network is formed by taking many such single neurons and stacking them together. Consider an example with multiple neurons, where we now have several features for predicting the housing price: the size, the number of bedrooms, the zip code of the area and the wealth of the area. As depicted in figure 3, we can form multiple nodes that apply ReLU or some other activation function, and their outputs are combined, using weights, into a final node that predicts y. As you can see, by stacking together a few of the single neurons, or simple predictors, we now have a slightly larger neural network.

A neural network is easy to work with because we only need to provide the input x and the output y for the training set; everything else the network figures out itself, giving us the best model that can be formed from the defined layers. Each of the nodes discussed above is called a hidden unit, and each of them takes its input from all four input features. The remarkable thing about neural networks is that, given enough training examples with both x and y, they are remarkably good at figuring out functions that accurately map from x to y.
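As a minimal illustration of the single-neuron predictor described above, the following NumPy sketch computes max(0, w * size + b); the weight and bias values are made up for illustration and are not taken from the report.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def predict_price(size_sqft, w=0.1, b=-50.0):
    # w and b are made-up illustrative values; in practice they are
    # learned by minimising an error metric on the training set.
    return relu(w * size_sqft + b)

print(predict_price(2000))  # 150.0, in whatever price units w and b imply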

Why is Deep Learning taking off now?

The methods behind deep learning and neural networks have existed for a long time. The reason deep learning has become so popular in the recent past is that, as society has digitised, more and more data has become available for solving the same problems that existed before. As the approximate graph below shows, deep learning models do not plateau as more data is fed into them, or more accurately, they plateau more slowly. This means that we can reach higher levels of accuracy with more data in a deep learning model. Because so much of our activity now takes place in the digital realm, the time we spend on computers, websites, mobile apps and other digital devices creates data, and thanks to inexpensive cameras built into our cell phones, accelerometers, and all sorts of sensors in the Internet of Things, we have been collecting more and more data. This data is too large for traditional learning algorithms to use effectively. However, if we want to hit this very high level of performance, we need two things: first, the ability to train a big enough neural network to take advantage of the huge amount of data, and second, to make sure we are collecting huge amounts of useful data.

Modern-Day Applications of Deep Learning

Some applications of Deep Learning include :

1. Self Driving Cars: DL is the driver behind autonomous driving.

2. Natural Language Processing: NLP applications primarily use DL and have wide applications, most prominently in Fake News/Hate Speech Detection.

3. Visual Recognition: DL is used for visual recognition, which has helped law enforcement bodies catch criminals more quickly. It has several other applications.

4. Vast Improvement of already existing supervised models: Apart from the applications mentioned above, DL has helped improve the performance of day-to-day prediction models used in real-life problems.

Some Basic Neural Network Concepts

Here are some standard notations used in deep learning.

1. Binary Classification: A classification problem with only two classes, usually represented using the two binary digits (e.g. 1 = is a cat, 0 = is not a cat).

2. Binary Step Function : f(x) = {1 if x >= 0, 0 if x < 0}

3. Linear Function : f(x) = ax

4. Sigmoid Function : σ(z) = 1 / (1 + e^(-z))

5. Tanh Function : tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))

6. ReLU : f(x) = max(0, x)

7. Leaky ReLU : f(x) = {x if x >= 0, 0.1x if x < 0}

8. Parameterized ReLU : f(x) = {x if x >= 0, ax if x < 0}

9. Exponential Linear Unit : f(x) = {x if x >= 0, a(e^x - 1) if x < 0}

10. Swish Function : f(x) = x / (1 + e^(-x))

11. Logistic Regression: Logistic regression is a learning algorithm used in a supervised learning problem where the output labels y are all either zero or one. The goal of logistic regression is to minimize the error between its predictions and the training data.

12. Logistic Regression Model: ŷ = σ(w^T x + b), where σ is the sigmoid function.

13. Logistic Regression Cost: J(w, b) = -(1/m) Σ_{i=1}^{m} [y^(i) log(ŷ^(i)) + (1 - y^(i)) log(1 - ŷ^(i))] (we iterate on w, b to optimise this.)

14. Gradient Descent : w := w - α ∂J(w, b)/∂w ; b := b - α ∂J(w, b)/∂b (note: the derivatives are partial derivatives). A short worked sketch of items 11-14 follows this list.
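Below is a minimal NumPy sketch of items 11-14: a logistic regression model trained with batch gradient descent on made-up toy data. All variable names, shapes and values are illustrative assumptions, not taken from the report.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: m examples, n features, binary labels (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # shape (m, n)
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, alpha = np.zeros(2), 0.0, 0.1
for _ in range(1000):
    y_hat = sigmoid(X @ w + b)            # item 12: y_hat = sigmoid(w^T x + b)
    # item 13: cross-entropy cost J(w, b)
    J = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # item 14: gradient descent using the partial derivatives of J
    dw = X.T @ (y_hat - y) / len(y)
    db = np.mean(y_hat - y)
    w, b = w - alpha * dw, b - alpha * db

print(w, b, J)                            # learned parameters and final cost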

Deep Neural Networks - The Model of Deep Learning

Motivation for choosing deep neural networks over shallow networks:

For deep learning, artificial neural networks are used to learn the desired function. An artificial neural network consists of multiple layers of an individual computational structure called a perceptron. Multiple perceptrons together form a neural network layer, and these layers can be stacked to create a neural network. The structure of the resulting graph, made by stacking perceptrons into layers and joining them, resembles the neurons of the brain, giving rise to the name neural network. Traditional neural networks have only a few layers, but deep neural networks can have 100 or even more layers.

Single Perceptron → Basic Neural Network → Deep Neural Network

Previous work showed that a single linear perceptron cannot be a universal classifier, but a network with a non-polynomial activation function and a single hidden layer of unbounded width can be. However, the unbounded width poses a problem, since it may result in matrices that are too large to handle during computation. Deep learning is a modern variation that serves as a practical approach to this issue: the neural network can have a large number of layers (a "deep" structure), with each layer of bounded width. This makes the approach practical and allows more complex functions to be learned, while the network still serves as a universal classifier under mild conditions.

In the time soon after the discovery of neural networks, deep neural networks were not feasible to implement, since they required a large amount of data and processing power. The reason for requiring so much computing power is obvious: the large number of matrix and transcendental operations that must be carried out in each iteration of training. The reason why a lot of data is required is related to overfitting. The double-edged sword of deep neural networks is that they are capable of identifying highly complex functions, but this also makes them prone to overfitting the training data. One approach to reducing overfitting is simply to provide more data, forcing the learned function to generalise better, since a greater number of points from the distribution must be fit. Modern GPUs allow for parallel computation and provide a large number of computation units, which makes training faster, while the mass adoption of the Internet and advanced techniques allow for large-scale data mining.

Benefits of Deep Neural Networks and their Explainability:

As mentioned previously, deep neural networks were built upon the shallow artificial neural networks used in traditional machine learning, which were themselves built upon the concept of a perceptron, which forms a single “neuron” of our neural network.

A single perceptron takes an input, uses its characteristic weights to perform a vector-vector dot product (or a matrix-vector product in the case of batched inputs), and then applies an activation function for classification. Since the input space is restricted to a set number of dimensions, this process is akin to choosing a hyperplane in the input space, capable of performing binary classification on the data.

A single layer of neurons chooses a set of hyperplanes in the input space, which can be represented algebraically as a function of the space. Stacking multiple layers allows us to learn functions of functions, letting the network extract more information from the data.
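The following NumPy sketch illustrates this idea of stacking layers, assuming arbitrary layer sizes and random weights: each layer is a weight matrix (one perceptron per row) followed by a ReLU nonlinearity, and stacking two such layers composes a function of a function.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, W, b):
    # Each row of W defines one hyperplane (one perceptron) in its input space.
    return relu(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # 4 input features
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)     # hidden layer: 8 perceptrons
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)     # output layer
y = layer(layer(x, W1, b1), W2, b2)               # a function of a function
print(y)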

Thus, a deep neural network is capable of learning much more complex functions, since each additional layer learns a function of a function. Each layer of the network extracts some more information from the data, which generally helps in better learning. Some of the intermediate results may be understandable by a human, but this is neither a requirement (in most cases) nor a guarantee.

This feature of deep neural networks is especially useful for extracting information from complex data, such as images, video and audio. Such data contains a vast amount of information that a human can identify easily but that cannot easily be represented mathematically, which makes processing difficult. In such cases, deep neural networks can be used to learn extremely complex functions which extract information from the input step by step.

A popular example of this step-by-step feature extraction is in the field of image processing and object identification. Images are easy for humans to process and identify. However, images are stored as tensors, where each pixel is described by its red, green and blue colour intensities, each ranging from 0 to 255. This makes it difficult to pin down mathematically what constitutes a particular object, say a human face, when an image is represented in such a form. With deep neural networks, each layer can focus on extracting a certain kind of information from the image. For example, the first layer can extract edges from the image, the next can use the edges to identify basic shapes, and later layers can build on these to identify individual humans from their photographs. The astonishing part of deep learning is that the deep neural network extracts these basic features on its own, without being explicitly coded to do so. The usual approach of applying the loss function only at the final layer is taken here, yet the network still learns the underlying patterns completely on its own, which is what makes deep learning so useful. The theory behind the extraction of such features is explained later, in the section on Convolutional Neural Networks. Below is a figure illustrating how each layer builds on the output of the previous one.

Convolutional Neural Networks - Deep Learning for Images

Infeasibility of directly using Deep Neural Networks on Images:

Deep neural networks cannot be applied directly to images. To illustrate why, consider 64 by 64 images. Each image is 64 by 64 by 3, because there are three colour channels, and multiplying that out gives 12288. So the input feature vector x has dimension 12288, which is not too bad. But 64 by 64 is a very small image. If we work with larger images, such as a 1000 by 1000 pixel image (just one megapixel), the dimension of the input features will be 1000 by 1000 by 3 because of the three RGB channels, which is three million. If we have three million input features, then x is three-million-dimensional. If the first hidden layer has just 1000 hidden units, then the weight matrix W1 will be 1000 by 3 million, which means it will have three billion parameters. With that many parameters, it is difficult to get enough data to prevent the network from overfitting, and the computational and memory requirements for training such a network are infeasible.
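The parameter count quoted above can be reproduced with a quick calculation (the numbers are exactly those used in the example):

pixels = 1000 * 1000 * 3        # 3,000,000 input features for a 1000x1000 RGB image
hidden_units = 1000
weights_in_W1 = hidden_units * pixels
print(weights_in_W1)            # 3,000,000,000 -> three billion parameters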

Convolutional Neural Networks are a modification of Deep Neural Networks that takes its inspiration from the signal processing domain. Instead of directly applying multiply-and-add operations, each layer applies the convolution operation to its input feature map. The theory and advantages are explained in more detail below.

The theory behind Convolution:

The 1D convolution of two signals is generated by sliding one signal over the entire length of the other and, at each time step, calculating the element-wise product and summing it up to generate the corresponding value at that time step. In effect, we are weighting the values of the first signal with the values of the second signal.

2D convolution is done with two 2D matrices. In image processing, one input matrix is generally the input image (say M by N, single-channel) and the second matrix is the weight matrix (K1 by K2), which usually has much smaller dimensions. The weight matrix is placed over a subpart of the image (K1 by K2), element-wise multiplication between the weight matrix and that subpart is carried out, and the results are summed to generate an equivalent pixel for that subpart. This is the same as the process of applying filter kernels to images in traditional image processing; the filter kernels are nothing but the weight matrices.

3D convolution is just the extension of 2D convolution to three dimensions, in which both the filter kernel and the image are 3D (with their Z dimensions matching). The subpart considered in this case is a 3D block with the same shape as the 3D filter kernel. Element-wise multiplication is again carried out between the kernel and the subpart, giving N*K1*K2 elements, where N is the number of channels (the Z dimension of the image), and these are summed up to generate one equivalent pixel for that entire 3D subpart. The kernel is then shifted to the next subpart, depending on the value of the stride, and the process is repeated.
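A minimal sketch of single-channel 2D convolution is given below, implemented (as in most deep learning libraries) as a cross-correlation with stride 1 and no padding; the image and kernel values are arbitrary choices for illustration.

import numpy as np

def conv2d(image, kernel, stride=1):
    M, N = image.shape
    K1, K2 = kernel.shape
    out = np.zeros(((M - K1) // stride + 1, (N - K2) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i * stride:i * stride + K1, j * stride:j * stride + K2]
            out[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])
print(conv2d(image, edge_kernel).shape)          # (5, 5)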

Figures: Example of 2D Convolution; Example of 3D Convolution

So, instead of applying individual weights to each pixel of the image, we take a small weighting matrix (referred to as a filter kernel) and slide it across the entire image. Going back to our previous example of a 1000 by 1000 by 3 image, if we use a 10 by 10 by 3 filter kernel, then the number of learnable weights is only 10*10*3 = 300, which is a vast reduction compared to the original case.

Development of CNNs using Convolutional Layers:

With this reduction in learnable parameters, however, the capacity of the layer also decreases. So, in modern CNNs, a single layer contains many filter kernels, each of which operates on the input image, and the outputs of the filters are stacked to form a 3D output matrix. After the convolution operation, standard non-linear functions such as ReLU or the sigmoid are applied to each pixel of the output. This entire process of convolution followed by a non-linear function is referred to as a convolutional layer. Modern CNNs are built by stacking multiple convolutional layers one after the other.

Another useful operation used in some convolutional layers is called Pooling. Pooling is of two types: Max Pool and Average Pool. It operates in the same way as a filter kernel, but it operates separately on each channel. Max Pool replaces each subpart of each channel with the maximum pixel value in that subpart, and Average Pool replaces each subpart with the average pixel value. In this way, the Z dimension (number of channels) is preserved but the X and Y dimensions are significantly reduced. This operation is useful for reducing the size of the features after each layer. Max Pool is generally used to preserve prominent features in the original image, whereas Average Pool is useful when we want to reduce noise. Once the dimensions of the image have been reduced by a large factor after passing through multiple convolutional layers, the result can be unrolled (flattened out) and fed to a standard Deep Neural Network. This is the basic model structure of a general CNN.
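The following PyTorch sketch assembles the basic CNN structure described above: convolution, ReLU and pooling, followed by flattening and a dense layer. The layer widths, the 64 by 64 input and the 10-class output are arbitrary illustrative choices, not a specific model from this report.

import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halves height and width
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                # "unroll" the feature maps
    nn.Linear(16 * 16 * 16, 10),                 # standard dense layer
)

x = torch.randn(1, 3, 64, 64)                    # one 64x64 RGB image
print(model(x).shape)                            # torch.Size([1, 10])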

This layered structure resembles the way our brain interprets images. The initial convolutional layers extract basic features from the image, such as straight lines. The later layers use these extracted features to detect more complex shapes such as circles and triangles. This process continues, with each layer learning to extract a more complex shape using the input from the previous layer. Each filter kernel of each layer learns to extract a certain feature from its input. This closely mirrors the process followed by our brain.

Some Specific Examples of CNN Models and their Applications:

Residual Networks: With such a large number of layers, the weights can suffer from the vanishing gradient problem. One modification that addresses this is the Residual Network, in which each convolutional layer is connected both to the next layer and to a layer much further along in the model. In this way, the weights of a layer are updated not just by the gradients from the next layer but also by the gradients from the further layer, which are not as diminished.
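A minimal sketch of a residual block is shown below, assuming PyTorch and an arbitrary channel count; it is illustrative only, and not the exact block used in published ResNets (which also include batch normalisation). The input of the block is added back to its output, so gradients can also flow through the identity path.

import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)        # skip connection to a later layer

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])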

Inception Networks: One major hyper-parameter choice in the design of CNNs is the kernel dimensions in each layer. To address this, the Inception architecture has been proposed. Previously, all the filters in a single convolutional layer had the same dimensions, but in the Inception architecture each convolutional layer has multiple groups of kernels, with each group having different dimensions, as well as pooling kernels. The basic idea is that, instead of having to pick one of these filter sizes or pooling operations and committing to it, we can apply them all, concatenate all the outputs, and let the network learn whatever parameters it wants to use. The drawback is that the network complexity increases, so much more time and data are required for training.
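Below is a simplified, illustrative Inception-style module in PyTorch: parallel 1x1, 3x3 and 5x5 convolutions plus a pooling branch, concatenated along the channel dimension. The branch widths are arbitrary, and real Inception modules additionally use 1x1 bottleneck convolutions to control cost.

import torch
from torch import nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 8, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 8, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 8, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Concatenate every branch's output along the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

m = InceptionModule(16)
print(m(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 40, 32, 32])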

Figures: Residual Network; Single Layer of an Inception Network

Object Detection and Localization using the Sliding Window Algorithm: This method divides the input image into small sub-images and passes each sub-image through the CNN. The CNN tells us whether an object is present in that window, so we learn the object's location as well as its label. This is known as the "Sliding Window Algorithm" and can be implemented efficiently using convolution. However, the problem is that we need to fix the window sizes beforehand. One workaround is to use windows of multiple sizes and repeatedly pass them through the CNN. The main problem is the trade-off between accuracy and computational cost. If we decrease the stride length, we look at many windows inside the image, so accuracy increases at the expense of a very large computational cost. If we increase the stride length, we might miss objects that fall between two windows, leading to a loss of accuracy but a lower computational cost.
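A rough sketch of the sliding window idea follows. The window size, stride and the classifier function are all assumptions: `classifier` stands in for any CNN-style model that returns a label for a crop (or None if nothing is detected), and `image` is assumed to be a NumPy-style H x W (x C) array.

def sliding_windows(image, window=64, stride=32):
    # Yield (top, left, crop) for every fixed-size window at the given stride.
    H, W = image.shape[:2]
    for top in range(0, H - window + 1, stride):
        for left in range(0, W - window + 1, stride):
            yield top, left, image[top:top + window, left:left + window]

def detect(image, classifier, window=64, stride=32):
    detections = []
    for top, left, crop in sliding_windows(image, window, stride):
        label = classifier(crop)          # assumed: returns a class label or None
        if label is not None:
            detections.append((label, top, left, window))
    return detections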

Object Detection and Localization using the YOLO Model: Another solution is that, instead of just predicting a label for the image, the network predicts the label as well as the bounding-box dimensions for the object. Hence, the output vector for this task has n + 4 units, where n is the number of classes: each of the n units outputs the probability of its class, and the remaining 4 units give the bounding-box centre X and Y coordinates and the height and width of the bounding box. The goodness of a predicted bounding box is calculated using the Intersection over Union (IoU) measure against the actual bounding box. This metric is similar to the Jaccard similarity: it is the ratio of the intersection area to the union area of the two bounding boxes. Another method, called Non-Max Suppression, is used to improve performance by discarding low-confidence bounding boxes. Anchor boxes can also be used to bias the shapes of the predicted bounding boxes so that they better match typical object dimensions. The YOLO (You Only Look Once) model combines all of the modifications discussed above and is one of the most effective object detection algorithms.
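The IoU measure itself is simple to compute; a small sketch for axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates is shown below (the box format is an assumption for illustration).

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1 / 7 ≈ 0.143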

Figures: Sliding Windows Approach to Object Localization; the YOLO Model

Facial recognition using Siamese Networks: The final application of CNNs discussed in this report is facial recognition. It is a difficult problem because we usually do not have many images of a single person's face available for training, and designing an individual CNN for each person is not feasible. However, there is a very elegant solution. As discussed above, the later layers of a CNN extract very high-level features of the input image. So, we can take a pre-trained general CNN model and pass both the reference image of the face and the image to be classified through it. If both belong to the same person, then the features extracted by the later layers will be very similar. So we can simply compute the distance between the two extracted feature vectors, and if the distance is small, both images belong to the same person. This is known as a Siamese Network.
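A sketch of the Siamese comparison step is given below, assuming PyTorch; `embedder` stands in for the shared pre-trained CNN whose later layers produce the feature vector, and the distance threshold is an arbitrary illustrative value.

import torch

def same_person(embedder, image_a, image_b, threshold=1.0):
    with torch.no_grad():
        feat_a = embedder(image_a)        # high-level features from later layers
        feat_b = embedder(image_b)
    distance = torch.norm(feat_a - feat_b)
    return distance.item() < threshold    # small distance -> same person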

Sequential Neural Networks

Need for Sequential Models:

Conventional neural networks have had limited success in dealing with sequential data. For processing sequential data, the model needs to retain some information about previous states for better performance at each step, which cannot be achieved using feed-forward or convolutional neural networks. Another key issue is that sources of sequential data, such as text and speech, produce samples of variable length. So it is also desirable for sequence models to handle inputs of varying lengths, which is impossible with conventional neural networks.

Introduction to RNNs:

Recurrent neural networks (RNNs) are based on David Rumelhart's work in 1986. Recurrent neural networks can be thought of as a generalisation of feed-forward neural networks, where the combined input to the model at each step includes the current input and a latent state that encodes the sequence up to the current time-step. This latent state is updated at each step. The latent state obtained after processing the last element of the sequence is treated as an encoding of the complete sequence and is frequently used for text/document classification tasks. To elaborate on the operation performed at each step, the input and the latent state from the previous step are passed through single-layer neural networks; the outputs are combined and passed through another neural network and an activation function to obtain the latent state at the current time-step (and a separate output state as well, if required by the target application). Some common applications of RNNs include POS tagging, sentiment classification, question answering, speech recognition and machine translation.
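A single recurrent step can be sketched in a few lines of NumPy, as below; the weight shapes are arbitrary, and tanh is used as the activation, which is a common but not the only choice.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # New latent state = f(current input, previous latent state).
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), np.zeros(16)
h = np.zeros(16)
for x_t in rng.normal(size=(5, 8)):        # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # after the loop, h encodes the sequence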

RNNs, although quite effective, suffer from a few major problems. An RNN is trained using the backpropagation-through-time algorithm, where the gradients from all future time-steps are accumulated to perform the weight update. Thus, the gradient at each step varies exponentially with the neural network weights. This can lead to the problem of exploding and vanishing gradients for long sequences, where the gradient explodes or vanishes for large and small weight magnitudes respectively. While the first can lead to extremely large updates, possibly producing NaN weight values, the second can lead to losing the gradient signal for longer sequences. Another key issue with conventional RNNs is their preference for sequential recency over syntactic recency, which renders the model incapable of learning long-term dependencies.

Key RNN Variants:

LSTM: Long short-term memory (LSTM) networks, a special kind of RNN, were invented by Hochreiter and Schmidhuber in 1997. To incorporate long-term dependencies, gates are used that take values between 0 and 1. Depending on the values of the gates, information is written to, read from or erased from the LSTM latent and cell states. The gates included are the forget gate, the input gate and the output gate, all of which are calculated at each step from the latent state of the previous step and the current input. The forget and input gates are used to calculate the current cell state from the past cell state and the cell content at the current step. That, in turn, is used along with the output gate to calculate the current latent state. If all forget gates are set to 1, the original cell state is retained for a long time. Thus, this method mitigates the vanishing gradient problem.

GRU: A simpler alternative to the LSTM, the GRU was introduced by Kyunghyun Cho et al. in 2014. The GRU uses two gates, an update gate and a reset gate, and no separate cell state. The reset gate controls what part of the previous latent state is used to calculate the new latent state content. The update gate controls which parts of the hidden state are updated or preserved. For update gate values close to 1, the hidden state is updated using the current hidden content, and for values close to 0, the previous latent state is retained. GRUs achieve performance similar to LSTMs while greatly reducing the memory requirement.
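For reference, both variants are available as standard PyTorch modules; the sketch below runs a dummy sequence through each (the input size, hidden size and sequence length are arbitrary).

import torch
from torch import nn

x = torch.randn(5, 1, 8)                  # (sequence length, batch, input size)
lstm = nn.LSTM(input_size=8, hidden_size=16)
gru = nn.GRU(input_size=8, hidden_size=16)

lstm_out, (h_n, c_n) = lstm(x)            # LSTM keeps a separate cell state c_n
gru_out, h_n_gru = gru(x)                 # GRU has no separate cell state
print(lstm_out.shape, gru_out.shape)      # torch.Size([5, 1, 16]) for both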

Special RNNs: A key improvement over the conventional RNN is the bidirectional RNN. It can be thought of as two separate RNNs, where one processes the sequence in reverse. The hidden state at each step is obtained as the concatenation of the latent states from both RNNs. Bidirectional RNNs are frequently used with LSTMs and GRUs as well. However, this variant assumes the availability of the full context and is thus ineffective in tasks such as language modelling, where the right-side context is absent. Another key RNN variant is the multilayer or stacked RNN. This variant also includes multiple RNNs, where the hidden states from one RNN layer are input to the next RNN layer, so it resembles a large feed-forward neural network. Skip connections are frequently used as well and have shown good results. However, since the computation is sequential, stacked RNNs are mostly limited to a maximum of 4-6 layers.

Applications of sequence models:

RNNs find most of their applications in language, speech and video processing. The field of Natural Language Processing (NLP) was revolutionised by the advent of RNNs. Some key applications in NLP include Named Entity Recognition and POS tagging (examples of word-level classification tasks), sentiment classification and offensive message classification (examples of sentence-level classification tasks), question answering and neural machine translation (examples of sequence-to-sequence tasks), and language modelling. RNNs also find extensive applications in speech processing and synthesis.

Further Developments:

In sequence-to-sequence tasks, encoding the source sentence in a single vector might not capture all the information. Consider the simple example of a machine translation task: the first output from the decoder will ideally be the translation of the first word in the input sentence, yet any information about that word might have been diminished or lost. To fix this problem, the attention mechanism was introduced. In this technique, the encoder hidden states at each step are stored, and a weighted sum of them is fed into the autoregressive decoder at each decoding step, in addition to the previous time-step's output. An encoder state receiving high attention (a large weight) means that it is the most important for the current word prediction. Thus, using attention, long-term dependencies can be modelled effectively.
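A minimal sketch of dot-product attention is shown below, using NumPy; the encoder states and decoder state are random placeholders, and dot-product scoring is only one common choice among several.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    scores = encoder_states @ decoder_state      # one score per encoder step
    weights = softmax(scores)                    # large weight = high attention
    return weights @ encoder_states              # weighted sum of encoder states

rng = np.random.default_rng(0)
enc = rng.normal(size=(7, 16))                   # 7 encoder steps, 16-dim states
context = attention_context(enc, rng.normal(size=16))
print(context.shape)                             # (16,)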

Another key problem was the high computational requirement of sequence models. Sequential computation allows minimal parallelisation, limiting the network depth that we can afford to use. A breakthrough came with the Transformer architecture, introduced in "Attention Is All You Need" by Vaswani et al. in 2017. It removed sequential computation altogether and instead focussed entirely on the attention mechanism. This allowed massive parallelisation, paving the way for OpenAI's massive GPT-3 language model. In 2019, Google researchers combined Transformers with ideas from self-supervised language pretraining and proposed the BERT model, variants of which are still state-of-the-art in almost all language-specific tasks.