APPENDIX A

Training DNN with Keras

This appendix discusses how to use the Keras framework to train deep neural networks and explores two example applications: image segmentation with a fully convolutional network (FCN) and click-rate prediction with a wide and deep model (inspired by the TensorFlow implementation). Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance; see https://blog.acolyer.org/2017/05/11/understanding-deep-learning-requires-re-thinking-generalization/. In a blog post (https://beamandrew.github.io/deeplearning/2017/06/04/deep_learning_works.html), Andrew Beam explains why it is possible to apply very large neural networks even to small data sets without a high risk of overfitting.

A.1 The Keras Framework

Keras (https://keras.io) is an excellent framework for starting to deploy a deep learning model. Its author, François Chollet, has created a great library that follows a minimalist approach and ships with many hyperparameters and optimizers already preconfigured. You can run complex models in less than ten lines of code using the Theano, TensorFlow, and CNTK backends.

A.1.1 Installing Keras in Linux

Keras is pretty straightforward to install. The first step is to install Theano or TensorFlow. Installing TensorFlow is easy with pip. Be careful with the version you install, though: if you use a GPU, you have to choose an installation that is compatible with your CUDA version. There are some obvious dependencies, such as NumPy, and less obvious ones, such as h5py (HDF5), which is used to save model weights to disk. See the full instructions for a Linux installation at www.pyimagesearch.com/2016/11/14/installing-keras-with-tensorflow-backend/.

A.1.2 Model

Models in Keras are defined as a sequence of layers. A network is a stack of layers forming a network topology. The first layer needs to know the dimensionality of the input data; this can be specified when creating the first layer with the input_dim argument. Finding the best network architecture (number of layers, size of layers, activation functions) is done mostly by trial and error. Generally, you need a network large enough to accommodate the complexity of the problem but not more complex than necessary. Fully connected layers are defined using the Dense class. You can specify the number of neurons in the layer as the first argument. The network weights should be initialized to small random numbers, for example drawn from a uniform distribution. The initialization method can be specified with the init argument, and the activation function is also specified as an argument. If you are unsure about these settings, simply use the defaults.
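For example, a single fully connected layer with 32 neurons, uniform weight initialization, and a ReLU activation could be declared as in the following minimal sketch (the layer and input sizes are illustrative, not part of the original text):

from keras.layers import Dense

# 32 units, expecting 8 input features, uniform initialization, ReLU activation
layer = Dense(32, input_dim=8, init='uniform', activation='relu')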

A.1.3 The Core Layers

A neural network is composed of a set of (mostly sequential) layers that are connected to each other. These are the most common layers:

• Input
• Dense
• Convolution1D and Convolution2D
• Embedding
• LSTM

A neural network works with tensors. Before you perform any computation, you need to convert your data (a NumPy array or a Pandas data frame) into a tensor. The input layer is the entry point of a neural network. The dense layer is the most basic (and common) type of layer; its arguments are the number of units and the activation function. The rectified linear unit (ReLU) activation function is the most common one. The convolution layers (1D or 2D) are mostly used for text and images, and their required parameters are the number of filters and the kernel size. The embedding layer is very useful for text data because it converts very high-dimensional data into a denser representation; it requires two parameters, input_dim and output_dim. The LSTM layer is very useful for learning temporal or sequential data; its only required parameter is the number of units. Be careful, since networks with these layers are computationally intensive and overfit easily. Other common activation functions include tanh and softmax. The following is a simple example of a Keras model to classify data (the response variable is the last column of the file xxx.csv, either 0 or 1). In this example, you will train a classifier, minimize the cross entropy over 150 epochs, and print the predictions. The data is assumed to be normalized. As the activation function in the last layer, you are using sigmoid;

for a multiclass problem, softmax would normally be used instead. It is assumed that the input features are contained in the first X_dim columns, a parameter that you must provide.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# load a dataset
dataset = np.loadtxt("xxx.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:X_dim]
Y = dataset[:, X_dim]

# create the model
model = Sequential()
model.add(Dense(12, input_dim=X_dim, init='uniform', activation='relu'))
model.add(Dense(5, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

# compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)

# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
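For completeness, here is a minimal sketch of the multiclass variant mentioned above, with a softmax output layer and a categorical cross-entropy loss. The number of classes is an assumed placeholder, and to_categorical lives in keras.utils in Keras 2 (keras.utils.np_utils in older versions).

from keras.utils import to_categorical

num_classes = 3                                    # assumed number of classes
Y_cat = to_categorical(Y.astype(int), num_classes) # one-hot encode the labels

model = Sequential()
model.add(Dense(12, input_dim=X_dim, init='uniform', activation='relu'))
model.add(Dense(num_classes, init='uniform', activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y_cat, epochs=150, batch_size=10, verbose=2)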

A.1.4 The Loss Functions

Keras comes with the most common loss functions, including these basic ones:

• Cross entropy and binary cross entropy for classification problems
• Categorical cross entropy
• Mean squared error (MSE) for regression problems

Building a personalized loss function is quite straightforward. An example is provided in the code of the FCN later in this appendix, where the binary_crossentropy_2d_w() function weights the cross entropy to account for imbalanced categorical data. Care should be taken because loss functions have to be fully differentiable; for instance, you cannot use control-flow constructs such as if/then/else inside them.
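As a minimal sketch (not the FCN loss itself), a custom loss is just a function of y_true and y_pred built from differentiable backend operations and passed to compile(); the weighted_mse name and the alpha weighting below are illustrative, and a model as in the preceding example is assumed.

from keras import backend as K

def weighted_mse(alpha):
    # penalize errors on positive targets alpha times more, using only
    # differentiable ops (no if/then/else)
    def loss(y_true, y_pred):
        w = 1.0 + alpha * y_true
        return K.mean(w * K.square(y_pred - y_true), axis=-1)
    return loss

model.compile(optimizer='adam', loss=weighted_mse(5.0))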

A.1.5 Training and Testing

Normally you specify the metrics of interest by calling the compile method. For instance, you can compile this model using the Adam optimizer with a learning rate of 0.001, minimizing the binary cross-entropy loss and displaying the accuracy.

from keras.optimizers import Adam

model.compile(Adam(0.001), loss='binary_crossentropy', metrics=['accuracy'])

To display all the metrics recorded while training a model, just use this:

history = model.fit(X_train, Y_train, epochs=50)
print(history.history.keys())
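If you want to inspect the learning curves, the returned history can be plotted directly. This is a small sketch assuming matplotlib is available and that validation data was passed to fit(); otherwise the val_* keys are absent.

import matplotlib.pyplot as plt

history = model.fit(X_train, Y_train, validation_split=0.2, epochs=50)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend()
plt.show()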

A.1.6 Callbacks

Keras can register a set of callbacks when training neural networks. The default callback tracks the training metrics for each epoch, including the loss and the accuracy for training and validation data. An object named history is returned from a call to the fit() function. Metrics are stored in the form of a dictionary in the history member of the object returned. The following is an example using a checkpoint to save the weights (in the file weights.hdf5) of the best model:

from keras.callbacks import ModelCheckpoint

checkpointbest = ModelCheckpoint(filepath='weights.hdf5', verbose=1, save_best_only=True)
model.fit(x_train, y_train, epochs=20,
          validation_data=(x_test, y_test),
          callbacks=[checkpointbest])
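Another commonly used callback is early stopping, which halts training when the monitored validation loss stops improving. This is a sketch in the same setting as the checkpoint example; the patience value is illustrative.

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
model.fit(x_train, y_train, epochs=20,
          validation_data=(x_test, y_test),
          callbacks=[checkpointbest, early_stop])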

A.1.7 Compile and Fit

After the model is defined, it can be compiled; only at this point is the computational graph effectively generated. Compiling uses the numerical libraries of the Keras backend, such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and prediction on your hardware, whether a CPU or a GPU, and whether a single device or multiple devices. You can run models on a CPU, but a GPU is advisable if you are dealing with large image data sets because it will speed up training by an order of magnitude. Compiling also requires the additional properties needed to train the network, that is, to find the best set of weights connecting the neurons: you must specify the loss function used to evaluate the network, the optimizer used to search through the weight space, and any optional metrics you would like to collect and report during training.


For classification, you typically use logarithmic loss, which for a binary classification problem is called binary_crossentropy in Keras. For optimization, the adam algorithm is commonly used.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Other common optimizers include Adadelta, SGD, and Adagrad. To train, or fit, the model on data, you call the fit() function on the model. The training process runs for a fixed number of iterations through the data set, called epochs, specified through the epochs argument. You can also set the number of instances that are evaluated before a weight update in the network is performed, called the batch size, using the batch_size argument.
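Putting the two steps together, a typical compile/fit call looks like the following sketch; the epoch count, batch size, and validation split are illustrative values, not prescriptions from the original text.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=100, batch_size=32, validation_split=0.2, verbose=1)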

A.2 The Deep and Wide Model

Wide and deep models jointly train a linear model and a deep neural network. The wide component is a generalized linear model over the raw and cross-product (crossed) features, while the deep component is a feed-forward neural network that uses embedding layers for the categorical features (see Figure A-1).

Figure A-1. Wide and deep neural network model
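Before walking through the full implementation, here is a minimal, self-contained sketch of the architecture using the Keras functional API; the feature sizes (n_wide_features, n_categories) and layer widths are illustrative placeholders, not part of the original code.

from keras.layers import Input, Dense, Embedding, Flatten, concatenate
from keras.models import Model

n_wide_features = 100   # assumed: number of one-hot + crossed wide features
n_categories = 50       # assumed: cardinality of one categorical feature

# wide part: sparse/crossed features go straight into a linear (logistic) model
wide_in = Input(shape=(n_wide_features,), name='wide_in')

# deep part: a categorical feature is embedded and passed through dense layers
cat_in = Input(shape=(1,), dtype='int64', name='cat_in')
deep = Flatten()(Embedding(n_categories, 8, input_length=1)(cat_in))
deep = Dense(64, activation='relu')(deep)

# joint output: concatenate both branches and train them together
out = Dense(1, activation='sigmoid')(concatenate([wide_in, deep]))
model = Model(inputs=[wide_in, cat_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])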

The following code, in Python 2.7, is a Keras implementation of the model originally presented in TensorFlow. To run it, you need to download the adult data set from http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data. It was provided by Javier Zaurin (https://github.com/jrzaurin/Wide-and-Deep-Keras).


First you will do the imports and define some functions to be used later.

# to run : python wide_and_deep.py --method method
# example: python wide_and_deep.py --method deep
import numpy as np
import pandas as pd
import argparse
from sklearn.preprocessing import StandardScaler
from copy import copy
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.layers import Input, concatenate, Embedding, Reshape, Merge, Flatten, merge, Lambda
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2, l1_l2

def cross_columns(x_cols):
    """simple helper to build the crossed columns in a pandas dataframe"""
    crossed_columns = dict()
    colnames = ['_'.join(x_c) for x_c in x_cols]
    for cname, x_c in zip(colnames, x_cols):
        crossed_columns[cname] = x_c
    return crossed_columns

def val2idx(DF_deep, cols):
    """helper to index categorical columns before embeddings."""
    DF_deep = pd.concat([df_train, df_test])
    val_types = dict()
    for c in cols:
        val_types[c] = DF_deep[c].unique()

    val_to_idx = dict()
    for k, v in val_types.iteritems():
        val_to_idx[k] = {o: i for i, o in enumerate(val_types[k])}

    for k, v in val_to_idx.iteritems():
        DF_deep[k] = DF_deep[k].apply(lambda x: v[x])

    unique_vals = dict()
    for c in cols:
        unique_vals[c] = DF_deep[c].nunique()

    return DF_deep, unique_vals

def embedding_input(name, n_in, n_out, reg):
    inp = Input(shape=(1,), dtype='int64', name=name)
    return inp, Embedding(n_in, n_out, input_length=1, embeddings_regularizer=l2(reg))(inp)

def continous_input(name):
    inp = Input(shape=(1,), dtype='float32', name=name)
    return inp, Reshape((1, 1))(inp)
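As a quick illustration of what cross_columns() produces, the crossed column names are simply the joined source column names (dictionary ordering may vary in Python 2.7):

x_cols = (['gender', 'age'], ['age', 'interest'])
print(cross_columns(x_cols))
# {'gender_age': ['gender', 'age'], 'age_interest': ['age', 'interest']}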

Then you define the wide model.

def wide():
    target = 'cr'
    wide_cols = ["gender", "xyz_campaign_id", "fb_campaign_id", "age", "interest"]
    x_cols = (['gender', 'age'], ['age', 'interest'])

    DF_wide = pd.concat([df_train, df_test])


    # my understanding of how to replicate what layers.crossed_column does;
    # one can read more here: https://www.tensorflow.org/tutorials/linear
    crossed_columns_d = cross_columns(x_cols)

    categorical_columns = list(DF_wide.select_dtypes(include=['object']).columns)
    wide_columns = wide_cols + crossed_columns_d.keys()

    for k, v in crossed_columns_d.iteritems():
        DF_wide[k] = DF_wide[v].apply(lambda x: '-'.join(x), axis=1)

    DF_wide = DF_wide[wide_columns + [target] + ['IS_TRAIN']]
    dummy_cols = [c for c in wide_columns
                  if c in categorical_columns + crossed_columns_d.keys()]
    DF_wide = pd.get_dummies(DF_wide, columns=[x for x in dummy_cols])

    train = DF_wide[DF_wide.IS_TRAIN == 1].drop('IS_TRAIN', axis=1)
    test = DF_wide[DF_wide.IS_TRAIN == 0].drop('IS_TRAIN', axis=1)

    # sanity check: make sure all columns are in the same order
    cols = ['cr'] + [c for c in train.columns if c != 'cr']
    train = train[cols]
    test = test[cols]

    X_train = train.values[:, 1:]
    Y_train = train.values[:, 0]
    X_test = test.values[:, 1:]
    Y_test = test.values[:, 0]


    # WIDE MODEL
    wide_inp = Input(shape=(X_train.shape[1],), dtype='float32', name='wide_inp')
    w = Dense(1, activation="sigmoid", name="wide_model")(wide_inp)
    wide = Model(wide_inp, w)
    wide.compile(Adam(0.01), loss='mse', metrics=['accuracy'])
    wide.fit(X_train, Y_train, nb_epoch=10, batch_size=64)
    results = wide.evaluate(X_test, Y_test)

    print " Results with wide model: %s" % results

Then you define the deep model.

def deep():
    DF_deep = pd.concat([df_train, df_test])
    target = 'cr'
    embedding_cols = ["gender", "xyz_campaign_id", "fb_campaign_id", "age", "interest"]
    deep_cols = embedding_cols + ['cpco', 'cpcoa']
    DF_deep, unique_vals = val2idx(DF_deep, embedding_cols)
    train = DF_deep[DF_deep.IS_TRAIN == 1].drop('IS_TRAIN', axis=1)
    test = DF_deep[DF_deep.IS_TRAIN == 0].drop('IS_TRAIN', axis=1)

    n_factors = 5
    gender, gd = embedding_input('gender_in', unique_vals['gender'], n_factors, 1e-3)
    xyz_campaign, xyz = embedding_input('xyz_campaign_id_in', unique_vals['xyz_campaign_id'], n_factors, 1e-3)


    fb_campaign_id, fb = embedding_input('fb_campaign_id_in', unique_vals['fb_campaign_id'], n_factors, 1e-3)
    age, ag = embedding_input('age_in', unique_vals['age'], n_factors, 1e-3)
    interest, it = embedding_input('interest_in', unique_vals['interest'], n_factors, 1e-3)

    # adding numerical columns to the deep model
    cpco, cp = continous_input('cpco_in')
    cpcoa, cpa = continous_input('cpcoa_in')

    X_train = [train[c] for c in deep_cols]
    Y_train = train[target]
    X_test = [test[c] for c in deep_cols]
    Y_test = test[target]

    # DEEP MODEL: inputs in the same order as in deep_cols
    d = merge([gd, xyz, fb, ag, it, cp, cpa], mode='concat')
    d = Flatten()(d)
    # layer to normalise the continuous columns with the embeddings
    d = BatchNormalization()(d)
    d = Dense(100, activation='relu', kernel_regularizer=l1_l2(l1=0.01, l2=0.01))(d)
    d = Dense(50, activation='relu', name='deep_inp')(d)
    d = Dense(1, activation="sigmoid")(d)
    deep = Model([gender, xyz_campaign, fb_campaign_id, age, interest, cpco, cpcoa], d)


    deep.compile(Adam(0.001), loss='mse', metrics=['accuracy'])
    deep.fit(X_train, Y_train, batch_size=64, nb_epoch=10)
    results = deep.evaluate(X_test, Y_test)

    print " Results with deep model: %s" % results

Then you compose the wide and deep model using some crossed (cross-product) columns.

def wide_deep():
    target = 'cr'
    wide_cols = ["gender", "xyz_campaign_id", "fb_campaign_id", "age", "interest"]
    x_cols = (['gender', 'xyz_campaign_id'], ['age', 'interest'])

    DF_wide = pd.concat([df_train, df_test])

    crossed_columns_d = cross_columns(x_cols)

    categorical_columns = list(DF_wide.select_dtypes(include=['object']).columns)
    wide_columns = wide_cols + crossed_columns_d.keys()

    for k, v in crossed_columns_d.iteritems():
        DF_wide[k] = DF_wide[v].apply(lambda x: '-'.join(x), axis=1)

    DF_wide = DF_wide[wide_columns + [target] + ['IS_TRAIN']]
    dummy_cols = [c for c in wide_columns
                  if c in categorical_columns + crossed_columns_d.keys()]
    DF_wide = pd.get_dummies(DF_wide, columns=[x for x in dummy_cols])


    train = DF_wide[DF_wide.IS_TRAIN == 1].drop('IS_TRAIN', axis=1)
    test = DF_wide[DF_wide.IS_TRAIN == 0].drop('IS_TRAIN', axis=1)

    # sanity check: make sure all columns are in the same order
    cols = ['cr'] + [c for c in train.columns if c != 'cr']
    train = train[cols]
    test = test[cols]

    X_train_wide = train.values[:, 1:]
    Y_train_wide = train.values[:, 0]
    X_test_wide = test.values[:, 1:]

    DF_deep = pd.concat([df_train, df_test])
    embedding_cols = ['gender', 'xyz_campaign_id', 'fb_campaign_id', 'age', 'interest']
    deep_cols = embedding_cols + ['cpco', 'cpcoa']
    DF_deep, unique_vals = val2idx(DF_deep, embedding_cols)
    train = DF_deep[DF_deep.IS_TRAIN == 1].drop('IS_TRAIN', axis=1)
    test = DF_deep[DF_deep.IS_TRAIN == 0].drop('IS_TRAIN', axis=1)

    n_factors = 5
    gender, gd = embedding_input('gender_in', unique_vals['gender'], n_factors, 1e-3)
    xyz_campaign, xyz = embedding_input('xyz_campaign_id_in', unique_vals['xyz_campaign_id'], n_factors, 1e-3)
    fb_campaign_id, fb = embedding_input('fb_campaign_id_in', unique_vals['fb_campaign_id'], n_factors, 1e-3)


    age, ag = embedding_input('age_in', unique_vals['age'], n_factors, 1e-3)
    interest, it = embedding_input('interest_in', unique_vals['interest'], n_factors, 1e-3)

    # adding numerical columns to the deep model
    cpco, cp = continous_input('cpco_in')
    cpcoa, cpa = continous_input('cpcoa_in')

    X_train_deep = [train[c] for c in deep_cols]
    Y_train_deep = train[target]
    X_test_deep = [test[c] for c in deep_cols]
    Y_test_deep = test[target]

    X_tr_wd = [X_train_wide] + X_train_deep
    Y_tr_wd = Y_train_deep  # wide or deep is the same here
    X_te_wd = [X_test_wide] + X_test_deep
    Y_te_wd = Y_test_deep   # wide or deep is the same here

    # WIDE
    wide_inp = Input(shape=(X_train_wide.shape[1],), dtype='float32', name='wide_inp')

    # DEEP
    deep_inp = merge([gd, xyz, ag, fb, it, cp, cpa], mode='concat')
    deep_inp = Flatten()(deep_inp)
    # layer to normalise the continuous columns with the embeddings
    deep_inp = BatchNormalization()(deep_inp)
    deep_inp = Dense(100, activation='relu', kernel_regularizer=l1_l2(l1=0.01, l2=0.01))(deep_inp)


    deep_inp = Dense(50, activation='relu', name='deep_inp')(deep_inp)

    # WIDE + DEEP
    wide_deep_inp = concatenate([wide_inp, deep_inp])
    wide_deep_out = Dense(1, activation='sigmoid', name='wide_deep_out')(wide_deep_inp)
    wide_deep = Model(inputs=[wide_inp, gender, xyz_campaign, fb_campaign_id,
                              age, interest, cpco, cpcoa],
                      outputs=wide_deep_out)
    wide_deep.compile(optimizer=Adam(lr=0.001), loss='mse', metrics=['accuracy'])
    wide_deep.fit(X_tr_wd, Y_tr_wd, nb_epoch=50, batch_size=80)
    # wide_deep.optimizer.lr = 0.001
    # wide_deep.fit(X_tr_wd, Y_tr_wd, nb_epoch=5, batch_size=64)
    results = wide_deep.evaluate(X_te_wd, Y_te_wd)

    print " Results with wide and deep model: %s" % results

The main module is finally assembled.

if __name__ == '__main__':

    ap = argparse.ArgumentParser()
    ap.add_argument("--method", type=str, default="wide_deep", help="fitting method")
    args = vars(ap.parse_args())
    method = args["method"]

    df_train = pd.read_csv("train.csv")
    df_test = pd.read_csv("test.csv")
    df_train['IS_TRAIN'] = 1
    df_test['IS_TRAIN'] = 0

    if method == 'wide':
        wide()
    elif method == 'deep':
        deep()
    else:
        wide_deep()

A.3 An FCN for Image Segmentation

This section provides the code for image segmentation using a fully convolutional network. You begin by doing some imports and defining some helper functions, as shown here:

import glob
import os

from PIL import Image
import numpy as np

from keras.layers import Input, Convolution2D, MaxPooling2D, UpSampling2D, Dropout
from keras.models import Model
from keras import backend as K
from keras.callbacks import ModelCheckpoint

smooth = 1.

# define a weighted binary cross-entropy function
def binary_crossentropy_2d_w(alpha):
    def loss(y_true, y_pred):
        bce = K.binary_crossentropy(y_pred, y_true)
        bce *= 1 + alpha * y_true
        bce /= alpha
        return K.mean(K.batch_flatten(bce), axis=-1)
    return loss

# define the dice score to assess predictions
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)
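As a quick sanity check, the dice score can be evaluated on small toy tensors (the values below are chosen arbitrarily for illustration):

import numpy as np

y_true = K.variable(np.array([[0., 1., 1., 0.]]))
y_pred = K.variable(np.array([[0., 1., 0., 0.]]))
print(K.eval(dice_coef(y_true, y_pred)))  # 0.75 with smooth=1 (2/3 without smoothing)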

Then you load the data and the respective masks. The transpose can be skipped if you use TensorFlow as the backend (because it assumes images are stored as height x width x channels). A low-resolution image is 640x480x3.

def load_data(dir, boundary=False):
    X = []
    y = []
    # load images
    for f in sorted(glob.glob(dir + '/image??.png')):
        img = np.array(Image.open(f).convert('RGB'))
        X.append(img)
    # load masks
    for i, f in enumerate(sorted(glob.glob(dir + '/image??_mask.txt'))):
        if boundary:
            a = get_boundary_mask(f)
            y.append(np.expand_dims(a, axis=0))
        else:
            # one mask value per line is assumed
            content = open(f).read().split('\n')[1:-1]
            a = np.array(content, 'i').reshape(X[i].shape[:2])
            a = np.clip(a, 0, 1).astype('uint8')
            y.append(np.expand_dims(a, axis=0))

    # stack data
    X = np.array(X) / 255.
    y = np.array(y)
    X = np.transpose(X, (0, 3, 1, 2))
    return X, y
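The transpose above produces Theano-style channels-first arrays. If you are unsure which ordering your installation expects, you can check it at run time; the call below is the Keras 2 API (older versions use K.image_dim_ordering()).

from keras import backend as K

# 'channels_first' (Theano-style) or 'channels_last' (TensorFlow-style)
print(K.image_data_format())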

Then you define the network used for training. You start with eight filters, and the number of filters doubles after each max-pooling step: 16, 32, and so on.

# define the network model
def net_2_outputs(input_shape):
    input_img = Input(input_shape, name='input')

    x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(input_img)
    x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
    x = Convolution2D(8, 3, 3, subsample=(1, 1), activation='relu', border_mode='same')(x)
    x = MaxPooling2D((2, 2), border_mode='same')(x)
    x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
    x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
    x = Convolution2D(16, 3, 3, subsample=(1, 1), activation='relu', border_mode='same')(x)
    x = MaxPooling2D((2, 2), border_mode='same')(x)
    x = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(x)
    x = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(x)
    x = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(x)

    # up
    x = UpSampling2D((2, 2))(x)
    x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
    output = Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same', name='output')(x)

    model = Model(input_img, output=[output])
    model.compile(optimizer='adam', loss={'output': binary_crossentropy_2d_w(5)})
    return model

Next, you train the model.

def train():
    X, y = load_data(DATA_DIR_TRAIN.replace('c_type', c_type), boundary=False)  # load the data

    print(X.shape, y.shape)  # make sure it's the right shape
    h = X.shape[2]
    w = X.shape[3]
    # generate batches for training and testing
    training_data = ShuffleBatchGenerator(input_data={'input': X}, output_data={'output': y})
    # apply some data augmentation
    training_data_aug = DataAugmentation(training_data, inplace_transfo=['mirror', 'transpose'])
    net = net_2_outputs((X.shape[1], h, w))
    net.summary()


    model = net
    model.fit(training_data_aug, 300, 1, callbacks=[ProgressBarCallback()])
    net.save('model.hdf5')

    # save predictions to disk
    res = model.predict(training_data, training_data.nb_elements)
    if not os.path.isdir('res'):
        os.makedirs('res')
    for i, img in enumerate(res[0]):
        # output file name pattern (assumed)
        Image.fromarray(np.squeeze(img) * 255).convert('RGB').save('res/out_%03d.png' % i)
    for i, img in enumerate(res[1]):
        Image.fromarray(np.squeeze(img) * 255).convert('RGB').save('res/out_b_%03d.png' % i)

if __name__ == '__main__':
    train()

A.3.1 Sequence to Sequence

Sequence-to-sequence (seq2seq) models convert a sequence from one domain (e.g., sentences in English) into a sequence in another domain (e.g., the same sentences translated into French), or convert past observations into a sequence of future observations (prediction). When both sequences have the same length, a simple Keras LSTM is enough. In the general case of arbitrary lengths, where the entire input sequence is required, an RNN layer acts as the encoder: it projects the input sequence into its own internal state (the context), and another RNN layer is trained as the decoder to predict the next elements of the target sequence. The decoder uses as its initial state the state vectors from the encoder. The decoder learns to generate targets[t+1...] given targets[...t], conditioned on the input sequence. The following example was created by F. Chollet and is available online at https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html:

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard 'encoder_outputs' and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using 'encoder_states' as the initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them for inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Run training

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate an empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of the target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).


    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find the stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence
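A minimal usage sketch, assuming encoder_input_data has already been built as in the full example from the blog post:

# decode the first training sequence
input_seq = encoder_input_data[0:1]
print(decode_sequence(input_seq))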

A.4 The Backpropagation Algorithm on a Multilayer Perceptron

In this section, we will consider a rather general neural network consisting of $L$ layers (not counting the input layer). Consider an arbitrary layer, say $\ell$, which has $N_\ell$ neurons, $X_1^{(\ell)}, X_2^{(\ell)}, \ldots, X_{N_\ell}^{(\ell)}$, each with a transfer function $f^{(\ell)}$. Notice that the transfer function may be different from layer to layer. As in the extended Delta rule, the transfer function may be given by any differentiable function, but it does not need to be linear. These neurons receive signals from the neurons in the preceding layer, $\ell-1$. For example, neuron $X_j^{(\ell)}$ receives a signal from $X_i^{(\ell-1)}$ with a weight factor of $w_{ij}^{(\ell)}$. Therefore, you have an $N_{\ell-1}$ by $N_\ell$ weight matrix, $W^{(\ell)}$, whose elements are given by $w_{ij}^{(\ell)}$, for $i=1,2,\ldots,N_{\ell-1}$ and $j=1,2,\ldots,N_\ell$. Neuron $X_j^{(\ell)}$ also has a bias given by $b_j^{(\ell)}$, and its activation is $a_j^{(\ell)}$.

To simplify the notation, we use $n_j^{(\ell)}$ to denote the net input into neuron $X_j^{(\ell)}$. It is given as follows:

$$n_j^{(\ell)} = \sum_{i=1}^{N_{\ell-1}} a_i^{(\ell-1)} w_{ij}^{(\ell)} + b_j^{(\ell)}, \qquad j = 1,2,\ldots,N_\ell.$$

Thus, the activation of neuron $X_j^{(\ell)}$ is as follows:

$$a_j^{(\ell)} = f^{(\ell)}\big(n_j^{(\ell)}\big) = f^{(\ell)}\!\left(\sum_{i=1}^{N_{\ell-1}} a_i^{(\ell-1)} w_{ij}^{(\ell)} + b_j^{(\ell)}\right).$$

You can consider the zeroth layer as the input layer. If an input vector $\mathbf{x}$ has $N$ components, then $N_0 = N$, and the neurons in the input layer have activations $a_i^{(0)} = x_i$, $i = 1,2,\ldots,N_0$.

Layer $L$ of the network is the output layer. Assuming that the output vector $\mathbf{y}$ has $M$ components, you must have $N_L = M$. These components are given by $y_j = a_j^{(L)}$, $j = 1,2,\ldots,M$. For any given input vector, the previous equations can be used to find the activation of each neuron for any given set of weights and biases. In particular, the network output vector $\mathbf{y}$ can be found. The remaining question is how to train the network to find a set of weights and biases for it to perform a certain task.

You will now consider training a rather general multilayer network for pattern association using the backpropagation (BP) algorithm. Training is carried out in a supervised way, so you can assume that a set of pattern pairs (or associations), $\mathbf{s}^{(q)}\!:\mathbf{t}^{(q)}$, $q = 1,2,\ldots,Q$, is given. The training vectors $\mathbf{s}^{(q)}$ have $N$ components, as shown here:

$$\mathbf{s}^{(q)} = \big[\, s_1^{(q)} \; s_2^{(q)} \cdots s_N^{(q)} \,\big].$$

Their targets, $\mathbf{t}^{(q)}$, have $M$ components, as shown here:

$$\mathbf{t}^{(q)} = \big[\, t_1^{(q)} \; t_2^{(q)} \cdots t_M^{(q)} \,\big].$$

Just like in the Delta rule, the training vectors are presented one at a time to the network during training. Suppose in time step $t$ of the training process, a training vector $\mathbf{s}^{(q)}$ for a particular $q$ is presented as input, $\mathbf{x}(t)$, to the network. The input signal can be propagated forward through the network using the equations in the previous section and the current set of weights and biases to obtain the corresponding network output, $\mathbf{y}(t)$. The weights and biases are then adjusted using the steepest descent algorithm to minimize the square of the error for this training vector:

$$E = \|\mathbf{y}(t) - \mathbf{t}(t)\|^2.$$

Here, $\mathbf{t}(t) = \mathbf{t}^{(q)}$ is the corresponding target vector for the chosen training vector $\mathbf{s}^{(q)}$. This squared error $E$ is a function of all the weights and biases of the entire network, since $\mathbf{y}(t)$ depends on them. You need to find the set of updating rules for them based on the steepest descent algorithm:

$$w_{ij}^{(\ell)}(t+1) = w_{ij}^{(\ell)}(t) - \alpha \,\frac{\partial E}{\partial w_{ij}^{(\ell)}(t)}$$

$$b_j^{(\ell)}(t+1) = b_j^{(\ell)}(t) - \alpha \,\frac{\partial E}{\partial b_j^{(\ell)}(t)}.$$

Here, $\alpha\,(>0)$ is the learning rate. To compute these partial derivatives, you need to understand how $E$ depends on the weights and biases. First, $E$ depends explicitly on the network output $\mathbf{y}(t)$ (the activations of the last layer, $\mathbf{a}^{(L)}$), which then depends on the net input into the $L$-th layer, $\mathbf{n}^{(L)}$. In turn, $\mathbf{n}^{(L)}$ is given by the activations of the preceding layer and the weights and biases of layer $L$. The explicit relation is as follows (for brevity, the dependence on step $t$ is omitted):

$$E = \|\mathbf{y}(t) - \mathbf{t}(t)\|^2 = \big\|\mathbf{a}^{(L)} - \mathbf{t}(t)\big\|^2 = \big\|f^{(L)}\big(\mathbf{n}^{(L)}\big) - \mathbf{t}(t)\big\|^2, \qquad n_j^{(L)} = \sum_{i=1}^{N_{L-1}} a_i^{(L-1)} w_{ij}^{(L)} + b_j^{(L)}.$$

It is then easy to compute the partial derivatives of $E$ with respect to the elements of $W^{(L)}$ and $\mathbf{b}^{(L)}$ using the chain rule for differentiation:

$$\frac{\partial E}{\partial w_{ij}^{(L)}} = \sum_{n=1}^{N_L} \frac{\partial E}{\partial n_n^{(L)}}\,\frac{\partial n_n^{(L)}}{\partial w_{ij}^{(L)}}.$$

Notice that the sum is needed in the previous equation for the correct application of the chain rule. You now define the sensitivity vector for a general layer $\ell$ to have the components

$$s_n^{(\ell)} = \frac{\partial E}{\partial n_n^{(\ell)}}, \qquad n = 1,2,\ldots,N_\ell.$$

This is called the sensitivity of neuron $X_n^{(\ell)}$ because it gives the change in the output error, $E$, per unit change in the net input the neuron receives. For layer $L$, it is easy to compute the sensitivity vector directly using the chain rule to obtain this:

$$s_n^{(L)} = -2\,\big(t_n(t) - a_n^{(L)}\big)\,\dot f^{(L)}\big(n_n^{(L)}\big), \qquad n = 1,2,\ldots,N_L.$$

Here, $\dot f$ denotes the derivative of the transfer function $f$. You also know the following:

$$\frac{\partial n_n^{(L)}}{\partial w_{ij}^{(L)}} = \frac{\partial}{\partial w_{ij}^{(L)}}\left(\sum_{m=1}^{N_{L-1}} a_m^{(L-1)} w_{mn}^{(L)} + b_n^{(L)}\right) = \delta_{nj}\, a_i^{(L-1)}.$$

Therefore, you have this:

$$\frac{\partial E}{\partial w_{ij}^{(L)}} = a_i^{(L-1)} s_j^{(L)}.$$

Similarly, you have this:

$$\frac{\partial E}{\partial b_j^{(L)}} = \sum_{n=1}^{N_L} \frac{\partial E}{\partial n_n^{(L)}}\,\frac{\partial n_n^{(L)}}{\partial b_j^{(L)}}.$$

In addition, since you have this:

$$\frac{\partial n_n^{(L)}}{\partial b_j^{(L)}} = \delta_{nj},$$

then you get the following:

$$\frac{\partial E}{\partial b_j^{(L)}} = s_j^{(L)}.$$

For a general layer, $\ell$, you can write this:

$$\frac{\partial E}{\partial w_{ij}^{(\ell)}} = \sum_{n=1}^{N_\ell} \frac{\partial E}{\partial n_n^{(\ell)}}\,\frac{\partial n_n^{(\ell)}}{\partial w_{ij}^{(\ell)}} = \sum_{n=1}^{N_\ell} s_n^{(\ell)}\,\frac{\partial n_n^{(\ell)}}{\partial w_{ij}^{(\ell)}}$$

$$\frac{\partial E}{\partial b_j^{(\ell)}} = \sum_{n=1}^{N_\ell} \frac{\partial E}{\partial n_n^{(\ell)}}\,\frac{\partial n_n^{(\ell)}}{\partial b_j^{(\ell)}} = \sum_{n=1}^{N_\ell} s_n^{(\ell)}\,\frac{\partial n_n^{(\ell)}}{\partial b_j^{(\ell)}}.$$

Since you have this:

$$n_n^{(\ell)} = \sum_{m=1}^{N_{\ell-1}} a_m^{(\ell-1)} w_{mn}^{(\ell)} + b_n^{(\ell)}, \qquad n = 1,2,\ldots,N_\ell,$$

then you have the following:

$$\frac{\partial n_n^{(\ell)}}{\partial w_{ij}^{(\ell)}} = \delta_{nj}\, a_i^{(\ell-1)}$$

$$\frac{\partial n_n^{(\ell)}}{\partial b_j^{(\ell)}} = \delta_{nj},$$

and finally the following:

$$\frac{\partial E}{\partial w_{ij}^{(\ell)}} = a_i^{(\ell-1)} s_j^{(\ell)}, \qquad \frac{\partial E}{\partial b_j^{(\ell)}} = s_j^{(\ell)}.$$

Therefore, the updating rules for the weights and biases are as follows (now putting back the dependency on the step index $t$):

$$w_{ij}^{(\ell)}(t+1) = w_{ij}^{(\ell)}(t) - \alpha\, a_i^{(\ell-1)}(t)\, s_j^{(\ell)}(t)$$

$$b_j^{(\ell)}(t+1) = b_j^{(\ell)}(t) - \alpha\, s_j^{(\ell)}(t).$$

To use these updating rules, you need to be able to compute the sensitivity vectors $\mathbf{s}^{(\ell)}$ for $\ell = 1,2,\ldots,L-1$. From their definition, you have this:

$$s_j^{(\ell)} = \frac{\partial E}{\partial n_j^{(\ell)}}, \qquad j = 1,2,\ldots,N_\ell.$$

You therefore need to know how $E$ depends on $n_j^{(\ell)}$. The key to computing these partial derivatives is to note that $n_j^{(\ell)}$ in turn depends on $n_i^{(\ell-1)}$ for $i = 1,2,\ldots,N_{\ell-1}$, because the net input for layer $\ell$ depends on the activation of the previous layer, $\ell-1$, which in turn depends on the net input for layer $\ell-1$. Specifically, you have this for $j = 1,2,\ldots,N_\ell$:

$$n_j^{(\ell)} = \sum_{i=1}^{N_{\ell-1}} a_i^{(\ell-1)} w_{ij}^{(\ell)} + b_j^{(\ell)} = \sum_{i=1}^{N_{\ell-1}} f^{(\ell-1)}\big(n_i^{(\ell-1)}\big)\, w_{ij}^{(\ell)} + b_j^{(\ell)}.$$

Therefore, you have the following for the sensitivity of layer $\ell-1$:

$$s_j^{(\ell-1)} = \frac{\partial E}{\partial n_j^{(\ell-1)}} = \sum_{i=1}^{N_\ell} \frac{\partial E}{\partial n_i^{(\ell)}}\,\frac{\partial n_i^{(\ell)}}{\partial n_j^{(\ell-1)}} = \sum_{i=1}^{N_\ell} s_i^{(\ell)}\,\frac{\partial}{\partial n_j^{(\ell-1)}}\left(\sum_{m=1}^{N_{\ell-1}} f^{(\ell-1)}\big(n_m^{(\ell-1)}\big)\, w_{mi}^{(\ell)} + b_i^{(\ell)}\right) = \sum_{i=1}^{N_\ell} s_i^{(\ell)}\, \dot f^{(\ell-1)}\big(n_j^{(\ell-1)}\big)\, w_{ji}^{(\ell)} = \dot f^{(\ell-1)}\big(n_j^{(\ell-1)}\big) \sum_{i=1}^{N_\ell} w_{ji}^{(\ell)}\, s_i^{(\ell)}.$$

Thus, the sensitivity of a neuron in layer $\ell-1$ depends on the sensitivities of all the neurons in layer $\ell$. This is a recursion relation for the sensitivities of the network, since the sensitivities of the last layer, $L$, are known. To find the activations or the net inputs for any given layer, you need to feed the input at the left of the network and proceed forward to the layer in question. However, to find the sensitivities for any given layer, you need to start from the last layer and use the recursion relation going backward to the given layer. This is why the training algorithm is called backpropagation.

To compute the updates for the weights and biases, you need to find the activations and sensitivities for all the layers. To obtain the sensitivities, you also need $\dot f^{(\ell)}\big(n_j^{(\ell)}\big)$. That means that in general you need to keep track of all the $n_j^{(\ell)}$ as well. In neural networks trained using the backpropagation algorithm, two functions are often used as transfer functions. One is the log-sigmoid function, shown here:

$$f_{\mathrm{logsig}}(x) = \frac{1}{1+e^{-x}}.$$

This is differentiable, and its value goes smoothly and monotonically between 0 and 1 for $x$ around 0. The other is the hyperbolic tangent sigmoid function, shown here:

$$f_{\mathrm{tansig}}(x) = \frac{1-e^{-x}}{1+e^{-x}} = \tanh(x/2).$$

This is also differentiable, but its value goes smoothly between -1 and 1 for $x$ around 0. It is easy to see that the first derivatives of these functions are given in terms of the same functions alone:

$$\dot f_{\mathrm{logsig}}(x) = f_{\mathrm{logsig}}(x)\,\big[1 - f_{\mathrm{logsig}}(x)\big]$$

$$\dot f_{\mathrm{tansig}}(x) = \tfrac{1}{2}\,\big[1 + f_{\mathrm{tansig}}(x)\big]\,\big[1 - f_{\mathrm{tansig}}(x)\big].$$

Since $f^{(\ell)}\big(n_j^{(\ell)}\big) = a_j^{(\ell)}$, when implementing the neural network on a computer there is actually no need to keep track of $n_j^{(\ell)}$ at all (thus saving memory).
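As a compact illustration of the updating rules derived above, the following NumPy sketch runs one forward and one backward pass through a network with a single hidden layer using the log-sigmoid transfer function; the layer sizes, learning rate, and data are arbitrary placeholders.

import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

N0, N1, N2 = 3, 4, 2           # input, hidden, and output layer sizes (illustrative)
rng = np.random.RandomState(0)
W1, b1 = rng.uniform(-0.5, 0.5, (N0, N1)), np.zeros(N1)
W2, b2 = rng.uniform(-0.5, 0.5, (N1, N2)), np.zeros(N2)
alpha = 0.1                    # learning rate

x = rng.rand(N0)               # one training vector s(q)
t = np.array([1.0, 0.0])       # its target t(q)

# forward pass: net inputs n and activations a for each layer
n1 = x.dot(W1) + b1
a1 = logsig(n1)
n2 = a1.dot(W2) + b2
a2 = logsig(n2)                # network output y

# backward pass: sensitivities s(l) = dE/dn(l), with E = ||y - t||^2
# and logsig derivative a * (1 - a)
s2 = -2.0 * (t - a2) * a2 * (1 - a2)   # output-layer sensitivity
s1 = (a1 * (1 - a1)) * W2.dot(s2)      # recursion for the hidden layer

# steepest-descent updates: w_ij <- w_ij - alpha * a_i^(l-1) * s_j^(l)
W2 -= alpha * np.outer(a1, s2)
b2 -= alpha * s2
W1 -= alpha * np.outer(x, s1)
b1 -= alpha * s1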

317 References

[AAB+15] Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Y. Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Y. Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, and Zhenyao Zhu. Deep speech 2: End- to-end in english and mandarin. CoRR, abs/1512.02595, 2015. [AAL+15] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: Visual Question Answering. In International Conference on (ICCV), 2015. [AG13] G. Hinton A. Graves, A. Mohamed. Speech recognition with deep recurrent neural networks. Arxiv, 2013. [AHS85] David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. A learning algorithm for boltzmann machines. Cognitive Science, 9(1):147–169, 1985.

© Armando Vieira, Bernardete Ribeiro 2018 319 A. Vieira and B. Ribeiro, Introduction to Deep Learning Business Applications for Developers, https://doi.org/10.1007/978-1-4842-3453-2 References

[AIG12] Krizhevsky A., Sutskever I., and Hinton G. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. Curran Associates, 25:1106–1114, 2012. http://papers.nips. cc/paper/4824-imagenet-classification-­with-deep- convolutional-neural-networks.pdf. [AM15] E. Asgari, M. . Mofrad. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PloS one, 10(11):e0141287, 2015. [AOS+16] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. CoRR, abs/1606.06565, 2016. [ARDK16] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Learning to compose neural networks for question answering. CoRR, abs/1601.01705, 2016. [AV03] N. P. Barradas A. Vieira. A training algorithm for classification of high-dimensional data. Neurocomputing, 50:461–472, 2003. [AV18] Attul Sehgal Armando Vieira. How banks can better serve their customers through artificial techniques. In Digital Markets Unleashed, page 311. Springer-Verlag, 2018. [BCB14] Dzmitry Bahdanau, Kyunghyun Cho, and . Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014. [BLPL06] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, pages 153–160, Cambridge, MA, USA, 2006. MIT Press.

320 References

[BUGD+13] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2787–2795. Curran Associates, Inc., 2013. [CBK09] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):15:1–15:58, 2009. [CCB15] KyungHyun Cho, Aaron C. Courville, and Yoshua Bengio. Describing multimedia content using attention-based encoder-decoder networks. CoRR, abs/1507.01053, 2015. [CHY+14] Charles F. Cadieu, Ha Hong, Daniel L. K. Yamins, Nicolas Pinto, Diego Ardila, Ethan A. Solomon, Najib J. Majaj, and James J. DiCarlo. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLOS Computational Biology, 10(12):1–18, 12 2014. [CLN+16] Y. Chen, Y. Li, R. Narayan et al. Gene expression inference with deep learning. Bioinformatics, 2016(btw074).

[DCH+16] Yan Duan, Xi Chen, Rein Houthooft, John Schulman and Pieter Abbeel. Benchmarking Deep for Continuous Control. CoRR, abs/1604.06778, 2016. [DDT+16] Nan Du, Hanjun Dai, Rakshit Trivedi, Utkarsh Upadhyay, Manuel Gomez-Rodriguez, and Le Song. Recurrent marked temporal point processes: Embedding event history to vector. In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and , KDD’16, pages 1555–1564, New York, NY, USA, 2016. ACM.

321 References

[DHG+14] Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389, 2014. [Doe16] Carl Doersch. Tutorial on variational . CoRR, 2016. [E12] Dumbill E. What is ? an introduction to the big data landscape. In Strata, 2012. Making Data Work. O’Reilly, Santa Clara, CA O’Reilly. [EBC+10] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-­ Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res., 11:625–660, March 2010. [EHW+16] S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, and Geoffrey E. Hinton. Attend, infer, repeat: Fast scene understanding with generative models. CoRR, abs/1603.08575, 2016. [Elm90] Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.

[FF15] Ralph Fehrer and Stefan Feuerriegel. Improving decision analytics with deep learning: The case of financial disclosures. 2015. https://arxiv.org/pdf/1508.01993v1.pdf. [FG16] Basura Fernando and Stephen Gould. Learning end-to-end video classification with rank-pooling. ICML, 2016. http:// jmlr.org/proceedings/papers/v48/fernando16.pdf. [GBC16] , Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. www.deeplearningbook.org.

322 References

[GBWB13] Xavier Glorot, Antoine Bordes, Jason Weston, and Yoshua Bengio. A semantic matching energy function for learning with multi-­relational data. CoRR, abs/1301.3485, 2013. [GEB15] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015. [GLO+16] Yanming Guo, Yu Liu, Ard Oerlemans, Songyang Lao, Song Wu, and Michael S. Lew. Deep learning for visual understanding. Neurocomput., 187(C):27–48, April 2016. [GMZ+15] Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei Xu. Are you talking to a machine? dataset and methods for multilingual image question answering. pages 2296–2304, 2015. [GPAM+14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014. [GR06] Hinton GE and Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 313(5786): 504–507, 2006. [GVS+16] Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, and Larry Heck. Contextual LSTM (CLSTM) models for large scale NLP tasks. CoRR, abs/1602.06291, 2016. [HDFN95] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The wake-­ sleep algorithm for unsupervised neural networks. Science, 268:1158–1161, 1995.

323 References

[Hin02] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Comput., 14(8):1771–1800, August 2002.

[HKG+15] Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, pages 1693–1701. MIT Press, Cambridge, MA, USA, 2015. [HOT06] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527– 1554, 2006. [HS97] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997. [HSL+16] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. CoRR, abs/1603.09382, 2016. [HZRS15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. [J15] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks, 61, 2015. [Jor90] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine. In Joachim Diederich, editor, Artificial Neural Networks, pages 112–127. IEEE Press, Piscataway, NJ, USA, 1990. [KFF17] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):664–676, 2017. 324 References

[KZS+15] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. Skip-­thought vectors. CoRR, abs/1506.06726, 2015. [LBD+89] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1(4):541– 551, December 1989. [LBP+16] B. Lee, J. Baek, S. Park et al. deepTarget: End-to-end Learning Framework for microRNA Target Prediction using Deep Recurrent Neural Networks. arXiv preprint arXiv, 1603.09123, 2016. [LCWJ15] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. Learning transferable features with deep adaptation networks. ICML, 2015. http://jmlr.org/proceedings/ papers/v37/long15.pdf. [LFDA16] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. Journal of Research, 17:1–40, 2016. https://arxiv. org/abs/1504.00702. [LM14] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. CoRR, abs/1405.4053, 2014. [LST15] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science 350.6266, pages 1332–1338, 2015. [M13] Grobelnik M. Big data tutorial. European Data Forum, 2013. www.slideshare.net/EUDataForum/edf2013-big-­datatutor ialmarkogrobelnik.

325 References

[Mac03] D.J.C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003. [MBM+16] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016. http://arxiv.org/abs/1602.01783. [MKS+15] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and . Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015. [MLS13] Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168, 2013. [MVPZ16] Polina Mamoshina, Armando Vieira, Evgeny Putin, and Alex Zhavoronkov. Applications of deep learning in biomedicine. Molecular Pharmaceutics, 13(5):1445–1454, 2016. [MXY+14] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. Explain images with multimodal recurrent neural networks. CoRR, abs/1410.1090, 2014. [MZMG15] Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, and Ross B. Girshick. Learning visual classifiers using human-centric annotations. CoRR, abs/1512.06974, 2015. [NSH15] Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han. Image question answering using convolutional neural network with dynamic parameter prediction. CoRR, abs/1511.05756, 2015.

326 References

[O’N03] Cathy O’Neil. Weapons of Math Destruction. Penguin, 2003. [PBvdP16] Xue Bin Peng, Glen Berseth, and Michiel van de Panne. Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Trans. Graph., 35(4):81:1–81:12, July 2016. [RG09] Salakhutdinov R and Hinton GE. Deep Boltzmann Machines. JMLR, 2009. [RMC15] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015. http:// arxiv.org/abs/1511.06434. [SA08] Saratha Sathasivam and Wan Ahmad Tajuddin Wan Abdullah. Logic learning in hopfield networks. CoRR, abs/0804.4075, 2008. [SGS15] Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. CoRR, abs/1505.00387, 2015. [SHM+16] David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman anc Dominik Grewe anc John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529:484–503, 2016. [SKM84] R. Snow, P. Kyllonen, and B. Marshalek. The topography of ability and learning correlations. In Advances in the Psychology of Human Intelligence, pages 47–103, June 1984.

327 References

[SMB10] Dominik Scherer, Andreas Müller, and Sven Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proceedings of the 20th International Conference on Artificial Neural Networks: Part III, ICANN’10, pages 92–101, Berlin, Heidelberg, 2010. Springer-Verlag. [SMGS15] Marijn F. Stollenga, Jonathan Masci, Faustino Gomez, and Juergen Schmidhuber. Deep networks with internal selective attention through feedback connections. NIPS, 2015. https://papers.nips.cc/paper/5276-deep-networks- with-internal-selective-attention-­through-­feedback- connections.pdf. [Smo86] Paul Smolensky. Chapter 6: Information processing in dynamical systems: Foundations of harmony theory. In David E. Rumelhart and James L. McLelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, volume 1, pages 194–281. MIT Press, 1986. [SSB+15] Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau. Hierarchical neural network generative models for movie dialogues. CoRR, abs/1507.04808, 2015. [SSN+15] S. K. Sønderby, C. K. Sønderby, H. Nielsen et al. Convolutional LSTM Networks for Subcellular Localization of Proteins. arXiv preprint arXiv, 1503.01919, 2015. [SSS+15] Basu Saikat, Ganguly Sangram, Mukhopadhyay Supratik, DiBiano Robert, Karki Manohar, and Nemani Ramakrishna. Deepsat: A learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’15, pages 37:1–37:10, New York, NY, USA, 2015. ACM.

328 References

[SsWF15] Sainbayar Sukhbaatar, arthur szlam, Jason Weston, and Rob Fergus. End-to-end memory networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2440–2448. Curran Associates, Inc., 2015. [SVL14] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, pages 3104–3112, Cambridge, MA, USA, 2014. MIT Press. [SZ16] Falong Shen and Gang Zeng. Weighted residuals for very deep networks. CoRR, abs/1605.08831, 2016. [ULVL16] Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor S. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. CoRR, abs/1603.03417, 2016. [vdOKV+16] Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional image generation with pixelCNN decoders. CoRR, abs/1606.05328, 2016.

[VKFU15] Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. Order-embeddings of images and language. CoRR, abs/1511.06361, 2015. [VLBM08] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre- Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 1096–1103, New York, NY, USA, 2008. ACM.

329 References

[VTBE14] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. CoRR, abs/1411.4555, 2014. [VXD+14] Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond J. Mooney, and Kate Saenko. Translating videos to natural language using deep recurrent neural networks. CoRR, abs/1412.4729, 2014. [Wes16] Jason Weston. Dialog-based language learning. CoRR, abs/1604.06045, 2016. [WS98] M. Wiering and J. Schmidhuber. HQ-learning. Adaptive Behavior, 6(2):219–246, 1998. [WWY15] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. Collaborative deep learning for recommender systems. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pages 1235–1244, New York, NY, USA, 2015. ACM. [WZFC14] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge by translating on hyperplanes. In Carla E. Brodley and Peter Stone, editors, AAAI, pages 1112– 1119. AAAI Press, 2014. [YAP13] Bengio Y, Courville A, and Vincent P. Representation learning: A review and new perspectives. pattern analysis and machine intelligence. IEEE Transactions, 35(8):1798–1828, 2013. [YZP15] Kaisheng Yao, Geoffrey Zweig, and Baolin Peng. Attention with intention for a neural network conversation model. CoRR, abs/1510.08565, 2015.

330 References

[ZCLZ16] Shuangfei Zhai, Yu Cheng, Weining Lu, and Zhongfei Zhang. Deep structured energy based models for anomaly detection. In Proceedings of the 33rd International Conference on International Conference on Machine Learning – Volume 48, ICML’16, pages 1100–1109. JMLR.org, 2016. [ZCSG16] Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. Video Summarization with Long Short-Term Memory, pages 766–782. Springer International Publishing, Cham, 2016. [ZKZ+15] Yukun Zhu, Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. CoRR, abs/1506.06724, 2015. [ZT15] J. Zhou, O. G. Troyanskaya. Predicting effects of noncoding variants with deep learning-based sequence model. Nature methods, 12(10):931-4, 2015.

331 Index

A , 209 unbalanced data sets, 210 Actor-critic algorithm unsupervised techniques (A3C), 147–148, 150, 190 clustering, 210 Adversarial auto-encoder density-based methods, 209 (AAE), 72, 228 kernel methods, 210 Aggressive regularization variational auto-encoder, 210 techniques, 82 Apache Institute, 30 AI assistants Apocalypse scenario, 17 Amy, 243 Applications definition, 241 anomaly detection, 208 ’s Smart Reply, 243 big data, 231 messaging bots, 242, 243 forecast, 216 text based, 242 machine learning, 207 voice based, 242 medicine and biomedical voice interaction, 241 crowdsourcing AIX, 191 symptoms, 221 Anomaly detection drug discovery, 228 conditional variational medical image auto-­encoder, 210 processing, 222 data analysis, 208 omics, 225–227 DSEBM, 210 security and prevention, 214 fake reviews, 213 user experience, 230–231 fraud prevention, 211–212 voice recognition, 230 generative adversarial (AI) networks, 210 challenges, 4 models, 208 history, 3 PCA, 208 research fields, 4 SAEs, 210 © Armando Vieira, Bernardete Ribeiro 2018 333 A. Vieira and B. Ribeiro, Introduction to Deep Learning Business Applications for Developers, https://doi.org/10.1007/978-1-4842-3453-2 Index

Artificial neuron networks (ANNs) Boltzmann machines backpropagation, 38, 42 DBM, 53–54 evolution, 38 DBNs, 50–52 history, 38 energy function, 46 milestones, 39–40 gradient descent, 46 MLP, 40–41 log likelihood function, 47 training and test MCMC, 47–48 performance, 287 probability of visible/hidden Auto-encoders, 55 units, 46 Automatic speech recognition RBM, 48–50 (ASR), 130, 131 thermal equilibrium, 46 Automatic video summarization Boltzmann machines (RBMs), 9 (AVS), 98 Bootstrapping technique, 186 Autonomous driving problems Brute-force algorithm, 185 path planning, 246 Business impact, DL technology sensing, 246 accuracy, 251–252 AI assistants, 241–243 AI-powered CRM system, 237 B autonomous vehicles, 246 Bag of words (BOW), 115 competitive advantage, 247–249 Bayesian , 27 computer vision, 240 Bidirectional long-short memory customer-centric view, 248 (BLSTM) networks, 102 data centers, 247 Bid optimization algorithm, 174 data-intensive activities, 240 Big data, 231 data science, 249–251 BLEU score, 123, 126 intelligent software, 245 Board game legal AlphaGo, 187 automating tasks, 243 bootstrapping technique, 186 conversational bot, 244 global state, 185 current approach, Go, board configurations, 186 limitations, 243 Monte Carlo (MC) mobile applications, 247 algorithms, 186 multimodal learning, 240 piece-centric features, 186 neural nets, 239





337 Index G types, 65 VAEs, 65, 67–69 Games and news GitHub, 196 applications, 189–190 Google cloud vision, 108 Doom, 188 Gradient boosting trees, 174 Dota and Starcraft II, 188 Gradient descent, 16 GANs, see Generative adversarial Graph convolutional network networks (GANs) (GCN), 122 Gated recurrent unit (GRU), 92 General artificial intelligence (GAI), 3 H Generative adversarial networks Hand-eye coordination, 151 (GANs), 5 Hashing technique, 92 AAE, 72 Hidden Markov models contextual CNN (HMMs), 195 auto-encoder, 190 Highway networks, 86 creativity, 35 Hybrid Reward Architecture for cumulative number of Reinforcement papers, 71 Learning, 189 data augmentation and data Hyperparameter optimization, 6 generation, 72 Hypotheses, 114 discriminator network, 69 PixelCNN architecture, 98 text-to-image I, J synthesis, 73 Image captioning, 89–90 training, 70 Image segmentation Wasserstein distance, 70 dilated , 88 Generative models, 98 edge detection, 87 chatbots, 155, 157 FCN, 87 description, 64 threshold, 86 GANs, 69–72 Inferior temporal (IT), 77 gravity, 64 Isotherm, 120 latent variables, 65 Item2Vec, 180

338 Index K L Kernel/filter, 80 Local receptive fields, 79 Keras framework, 22 Libratus system, 33 backpropagation, multilayer Linear models, 173 perceptron, 310–317 Long shor-term memory callbacks, 292 (LSTM), 4, 124, 188 compile and fit, 292–293 forgetting memory gate, 60 core layers, 289–290 long-term dependency, 61 hyperparameters and memory cell optimizers, 287 forget gates, 62 image segmentation, FCN, input gate, 62 303–310 input node, 62 installing in Linux, 288 internal state, 62 loss functions, 291 output gate, 62 models, 288 products for voice, training and testing, 291 image/automatic wide and deep translation, 63–64 models, 293–303 stateless and stateful Knowledge graph (KG) models, 63 applications, 120 variants, 63 challenging tasks, 121 continuous vector space, 118 DBpedia and Freebase, 118 M definition, 117 Machine learning, 6 edges, 118 definition, 207 entities and relational platforms, 29 facts, 117 products components, 207 NTNs, 119 Malicious code detection RCNET, IQ test, 120 feature extraction methods, 214 analysis, 118 machine learning TransE model, 119 methods, 214 VAE, 122 Manifold, 16






