Keras’s Phylanx Backend

Bita Hasheminezhad, STE||AR GROUP

Outline

– Available Platforms
– What's Special about Keras
– Keras Backends
– Inference Example 1: Multi-Class Classification
– Inference Example 2: Sentiment Analysis
– Keras in Future
– Conclusion

Deep Learning Platforms

– Spark (Apache)
– Caffe (Berkeley AI Research)
– DistBelief (Google)
– Caffe2 (Facebook)
– TensorFlow (Google)
– PyTorch (Facebook)
– CNTK (Microsoft)
– SINGA (National University of Singapore)
– Project Adam (Microsoft)
– Chainer (Preferred Networks)
– MXNet (Apache)
– CoreML (Apple)
– Theano (Université de Montréal)

Deep Learning Platforms

– Spark (Apache)
– Caffe -> Caffe2 -> PyTorch (Facebook)
– DistBelief -> TensorFlow (Google)
– SINGA (National University of Singapore)
– CNTK (Microsoft)
– Chainer (Preferred Networks)
– Project Adam (Microsoft)
– MXNet (Apache)
– CoreML (Apple)
– Theano (Université de Montréal)

Of these, TensorFlow, CNTK, Theano, and MXNet support Keras.

What is Keras?

– Keras is a high-level neural networks API, written in Python and capable of running on top of a deferred execution backend.

– User friendly
– Modular
– Easily extensible
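Switching backends requires no model changes: Keras reads the KERAS_BACKEND environment variable (or ~/.keras/keras.json) at import time. A minimal sketch; the backend name "phylanx" is an assumption, taken from the "Using Phylanx backend." banner in the later examples:

    import os
    # Select the backend before keras is imported; "phylanx" assumes the
    # Phylanx backend module is installed and registered under that name.
    os.environ["KERAS_BACKEND"] = "phylanx"
    import keras   # prints "Using Phylanx backend."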

Fig 1. Number of publications during the last decade having the name of the DL platform in their full text [1]

[1] https://app.dimensions.ai/discover/publication

Deep Learning Platforms

Imperative or Eager Style
– Caffe -> Caffe2 -> PyTorch (Facebook)

Deferred Style
– TensorFlow (Google)
– CNTK (Microsoft)
– Theano (Université de Montréal)
– MXNet (Apache)
– CoreML (Apple)

– Deferred Execution has two distinct phases: the first phase defines the program as a symbolic graph; the second phase executes an optimized version of the program on the set of available devices. [2]
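For illustration (not from the slides), the two phases look like this in the TensorFlow 1.x API:

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API

    # Phase 1: define the program as a symbolic graph; nothing runs yet.
    x = tf.placeholder(tf.float32, shape=(None, 784))
    w = tf.Variable(tf.zeros((784, 10)))
    logits = tf.matmul(x, w)

    # Phase 2: execute an optimized version of the graph on the available devices.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(logits, feed_dict={x: np.random.rand(32, 784)})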

[2] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Kudlur, M. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265-283).

Keras different backends

Table 1. Investigating parallelism in the deep learning platforms supported by Keras

Platform   | Data Parallelism                                      | Model Parallelism
TensorFlow | Synchronous or asynchronous through parameter servers | Supported using greedy heuristics
CNTK       | Bounded asynchronous through a parameter server model | –
Theano     | –                                                     | Not on multiple nodes
MXNet      | Synchronous or asynchronous through parameter servers | Not on multiple nodes

– “When gradient nodes are automatically added to the graph, the user has less control, and the heuristics may break down.” [2]


The solution to the problem

Problem: On a single node, training ResNet-50 on the ImageNet data set on an M40 GPU takes 14 days! [3]

Solution: A high-performance Keras backend which
– Is deferred style; it can optimize the expression graph
– Is distributed; it can run on multiple nodes
– Uses asynchronous computations; it avoids the straggler problem

Let’s use HPX!

[3] Zhang, Z., Yin, L., Peng, Y., & Li, D. (2018, December). A Quick Survey on Large Scale Distributed Deep Learning Systems. In 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS) (pp. 1052-1056). IEEE.

HPX as a backend for Keras

Phylanx (Python frontend, C++ backend) bridges Keras (Python) and HPX (C++).

– Using hints from the user and the optimization step, the expression graph is passed to the HPX runtime, which schedules work and infers the data layout on each compute locality. [4]
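For a taste of the frontend, here is a minimal sketch built from the decorator pattern the inference examples below use; treat the exact API surface as an assumption:

    from phylanx import Phylanx
    import numpy as np

    # The decorator lifts the NumPy-style function into a Phylanx
    # expression tree that the HPX runtime schedules and executes.
    @Phylanx
    def dot_sum(a, b):
        return np.sum(np.dot(a, b))

    print(dot_sum(np.random.rand(64, 64), np.random.rand(64, 64)))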

[4] http://phylanx.stellar-group.org/

How to implement a Keras Backend

[Slide: a table of the Keras backend functions grouped by category, each marked "up to 4D", "late binding", or "not yet implemented". Categories and sample entries: Keras related (epsilon, set_epsilon, floatx, image_data_format, get_uid, ...); basic math (dot, batch_dot, transpose, sum, prod, cumsum, argmax, ...); activations and losses (relu, elu, tanh, softmax, categorical_crossentropy, binary_crossentropy, ...); inference and training (variable, placeholder, gradients, stop_gradient, update, learning_phase, ...); convolutional (conv1d, conv2d, conv3d, pool2d, pool3d, ...); batch related (batch_get_value, batch_normalization, in_train_phase, ...); recurrent (rnn, ctc_decode, ctc_batch_cost, ...).]
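A Keras backend is ultimately a Python module exporting these functions. As an illustrative sketch only, a few of the simplest entries, with NumPy standing in for the corresponding Phylanx primitives:

    import numpy as np

    _EPSILON = 1e-7   # fuzz factor used in numeric expressions

    def epsilon():
        # Keras queries this for numerically safe divisions and logs
        return _EPSILON

    def set_epsilon(e):
        global _EPSILON
        _EPSILON = e

    def ones(shape, dtype=None, name=None):
        # Instantiates an all-ones variable; the real backend would
        # return a Phylanx tensor handle instead of a NumPy array.
        return np.ones(shape, dtype=dtype or "float32")

    def dot(x, y):
        return np.dot(x, y)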
Inference Example 1: Multi-Class Classification

    from keras import backend as K
    from keras.datasets import mnist
    from keras.utils import to_categorical
    import numpy as np
    import pandas as pd

    (_, y_train), (x_test, y_test) = mnist.load_data()
    num_classes = len(np.unique(y_train))

    # convert class vectors to binary class matrices
    print("y_train shape:", y_train.shape)
    y_train = to_categorical(y_train, num_classes)
    print("y_train shape, after one_hot encoding:", y_train.shape)

    # class probabilities predicted by a pre-trained model
    df = pd.read_csv('class_pred.csv')
    class_pred = df.values
    print("class_predict shape:", class_pred.shape)
    print("A sample of class_predict:", class_pred[0])

    labels_pred = K.argmax(class_pred, axis=1)
    print("Predicted labels:", K.get_value(labels_pred))
    print("What we have on y_test", K.eval(y_test))

    corrects = K.equal(labels_pred, y_test)
    corrects = K.cast(corrects, 'int64')
    print("Correct labels:", K.get_value(corrects))
    number_of_corrects = K.get_value(K.sum(corrects))
    print("Number of corrects predictions: %d" % number_of_corrects)

    corrects = K.expand_dims(corrects, axis=0)
    num_images = K.int_shape(corrects)[1]
    print("Accuracy: %.2f%%" % ((number_of_corrects * 100) / num_images))

    # Misclassified
    incorrects = K.not_equal(corrects, 1)
    incorrects = K.eval(incorrects)
    labels_error = (lambda x: x[0] * x[1])([K.eval(labels_pred), incorrects])
    labels_true = (lambda x: x[0] * x[1])([K.eval(y_test), incorrects])
    labels_true_slice = K.slice(K.squeeze(K.variable(labels_true), 0), [0], [500])
    labels_error_slice = K.slice(K.flatten(K.variable(labels_error)), [0], [500])
    for i, j in zip(K.get_value(labels_true_slice), K.get_value(labels_error_slice)):
        if i != 0:
            print("Label", i, "is misclassified as", j)

Output:

    Using Phylanx backend.
    y_train shape: (60000,)
    y_train shape, after one_hot encoding: (60000, 10)
    class_predict shape: (10000, 10)
    A sample of class_predict: [3.27987540e-37 1.93442800e-25 5.78854500e-25 1.94946260e-21
     3.15305600e-31 1.03375155e-32 0.00000000e+00 1.00000000e+00
     4.98417950e-32 3.93246830e-21]
    Predicted labels: [7 2 1 ... 4 5 6]
    What we have on y_test [7 2 1 ... 4 5 6]
    Correct labels: [1 1 1 ... 1 1 1]
    Number of corrects predictions: 9837
    Accuracy: 98.37%
    Label 4 is misclassified as 2
    Label 2 is misclassified as 7
    Label 5 is misclassified as 3
    Label 3 is misclassified as 7
    Label 6 is misclassified as 0
    Label 9 is misclassified as 3
    Label 8 is misclassified as 2
    Label 2 is misclassified as 7
    Label 8 is misclassified as 4

Inference Example 2: Sentiment Analysis

    from keras import backend as K
    from keras.datasets import imdb
    from phylanx import Phylanx
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # NumPy functions wrapped as lazily evaluated Phylanx primitives
    @Phylanx
    def unique_eager(x):
        return np.unique(x)
    unique = Phylanx.lazy(unique_eager)

    @Phylanx
    def argsort_eager(x):
        return np.argsort(x)
    argsort = Phylanx.lazy(argsort_eager)

    @Phylanx
    def where_eager(x):
        return np.where(x)
    where = Phylanx.lazy(where_eager)

    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)
    classes = unique(K.variable(y_test))
    print("classes:", K.get_value(classes))

    # probabilities predicted by a pre-trained model
    df = pd.read_csv('labels_pred.csv')
    labels_pred = K.squeeze(df.values, 1)
    print("Predicted labels:", K.get_value(labels_pred))

    corrects = K.less(K.abs(labels_pred - K.variable(y_test)), .5)
    print("Accuracy:", K.get_value(K.sum(K.cast(corrects, "int32"))) * 100 /
          K.int_shape(corrects)[0])

    largest_index = y_test.size - 1

    # sort scores and corresponding truth values
    indices = argsort(labels_pred)
    desc_score_indices = K.eval(K.reverse(indices, 0))
    print("desc_score_indices", desc_score_indices)

    y_true = K.equal(K.variable(y_test), 1.)
    y_true = K.gather(y_true, desc_score_indices)
    y_true = K.cast(y_true, "int64")
    print("y_true", K.get_value(y_true))
    y_score = K.gather(labels_pred, desc_score_indices)
    print("y_score", K.get_value(y_score))

    diff = np.diff(K.eval(y_score))
    distinct_value_indices = where(K.not_equal(diff, 0))
    distinct_value_indices = K.get_value(distinct_value_indices)[0]
    print("distinct_value_indices", distinct_value_indices)

    threshold_idxs = K.eval(K.concatenate([K.variable(distinct_value_indices),
                                           K.variable(np.array([largest_index]))], 0))

    # accumulate the true positives with decreasing threshold
    tps = K.get_value(K.gather(K.cumsum(y_true), threshold_idxs))
    print("True Positives:", tps)
    fps = 1 + threshold_idxs - tps
    print("False Positives:", fps)
    thresholds = K.get_value(K.gather(y_score, threshold_idxs))
    print("Decreasing Threshold:", thresholds)

    plot_roc_curve(tps, fps, thresholds)

Output:

    Using Phylanx backend.
    classes: [0 1]
    Predicted labels: [9.2614290e-03 9.9999920e-01 9.9997926e-01 ... 6.2763690e-05
     3.3009052e-03 6.0482204e-01]
    Accuracy: 86.704
    desc_score_indices [12420 1594 2351 ... 11280 13389 18853]
    y_true [1 1 1 ... 0 0 0]
    y_score [1. 1. 1. ... 0. 0. 0.]
    distinct_value_indices [ 484 593 981 ... 23598 23794 23973]
    True Positives: [ 483 591 975 ... 12493 12495 12500]
    False Positives: [ 2 3 7 ... 11302 11479 12500]
    Decreasing Threshold: [1.0000000e+00 9.9999994e-01 9.9999990e-01 ... 5.9604645e-08
     2.9802322e-08 0.0000000e+00]

(From the slide's confusion-matrix diagram: TPR = TP/P, FPR = FP/N.)
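The slide calls plot_roc_curve but does not define it. A plausible sketch, using the slide's TPR = TP/P and FPR = FP/N definitions; the name and signature come from the call site above, everything else is assumed:

    import matplotlib.pyplot as plt

    def plot_roc_curve(tps, fps, thresholds):
        # The last entries are the totals: tps[-1] = P, fps[-1] = N.
        tpr = tps / tps[-1]   # TPR = TP/P
        fpr = fps / fps[-1]   # FPR = FP/N
        # thresholds is kept to match the call site; the curve itself
        # only needs the rates.
        plt.plot(fpr, tpr)
        plt.xlabel("False Positive Rate")
        plt.ylabel("True Positive Rate")
        plt.title("ROC curve")
        plt.show()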

Keras in Future

[5] https://github.com/keras-team/keras/releases

TensorFlow Eager (2.0)

Deferred Style
– TensorFlow 1.0
– CNTK
– Theano
– MXNet
– CoreML

Imperative or Eager Style
– PyTorch
– TensorFlow 2.0

[6] https://www.tensorflow.org/guide/effective_tf2

Performance of TF Eager

Fig 6. Examples per second training ResNet-50 on a GPU
Fig 7. Examples per second training L2HMC on a CPU

– “We expect most real-world models to fall somewhere between these two, and to be able to recover performance by staging as required.” [7]
– “TensorFlow Eager is an evolving technology and closing the gap between imperative and staged performance is being worked on.” [7]
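"Staging" here means tracing a Python function into a graph with tf.function; a minimal sketch of the two styles (not from the slides):

    import tensorflow as tf  # TensorFlow 2.x

    # Eager: runs op by op, immediately; flexible but slower per call.
    def step_eager(x, w):
        return tf.reduce_sum(tf.matmul(x, w))

    # Staged: traced once into a graph, then run as optimized graph code.
    @tf.function
    def step_staged(x, w):
        return tf.reduce_sum(tf.matmul(x, w))

    x = tf.random.normal((128, 256))
    w = tf.random.normal((256, 10))
    print(step_eager(x, w).numpy(), step_staged(x, w).numpy())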

[7] Agrawal, A., Modi, A. N., Passos, A., Lavoie, A., Agarwal, A., Shankar, A., ... & Cai, S. (2019). TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning. arXiv preprint arXiv:1903.01855.

Where we are now

– We have made good progress on the Phylanx backend for Keras:
  – Many of the needed primitives are implemented in Phylanx [8]
  – BlazeTensor has acceptable support for 3D and 4D arrays [9]
– We need higher dimensionalities, since DL platforms usually add batch and channel dimensions on top of the data dimensions (see the sketch after this list).
– “As one part of the development of TensorFlow, our team has extended the open source Eigen library with support for arbitrary dimensionality tensor operations.” [10]

– The majority of the Keras backend tests pass [11]
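To illustrate the dimensionality point referenced above: a batch of images is already 4D before any model-internal dimensions appear. A small sketch with the standard Keras backend API (shapes illustrative):

    from keras import backend as K
    import numpy as np

    # 32 RGB images of 28x28: (batch, rows, cols, channels) is a 4D tensor.
    x = K.variable(np.random.rand(32, 28, 28, 3))
    k = K.variable(np.random.rand(3, 3, 3, 16))   # (rows, cols, in_ch, out_ch)
    y = K.conv2d(x, k, padding="same", data_format="channels_last")
    print(K.int_shape(y))  # (32, 28, 28, 16)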

[8] https://github.com/STEllAR-GROUP/phylanx
[9] https://github.com/STEllAR-GROUP/blaze_tensor
[10] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
[11] https://github.com/STEllAR-GROUP/keras

Thank you for your attention