Deep Learning with Keras : : CHEAT SHEET

Intro

Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. It supports multiple back-ends, including TensorFlow, CNTK and Theano. TensorFlow is a lower-level mathematical library for building deep neural network architectures. The keras R package makes it easy to use Keras and TensorFlow in R.
https://keras.rstudio.com
https://www.manning.com/books/deep-learning-with-r

INSTALLATION

The keras R package uses the Python keras library. You can install all the prerequisites directly from R: https://keras.rstudio.com/reference/install_keras.html

library(keras)
install_keras()   # see ?install_keras for GPU instructions

This installs the required libraries in an Anaconda environment or virtual environment 'r-tensorflow'.

Define → Compile → Fit → Evaluate → Predict
  • Define: Sequential model, Multi-GPU model
  • Compile: Optimiser, Loss, Metrics
  • Fit: Batch size, Epochs, Validation split
  • Evaluate: Evaluate, Plot
  • Predict: classes, probability

THE "HELLO, WORLD!" OF DEEP LEARNING: TRAINING AN IMAGE RECOGNIZER ON MNIST DATA

# input layer: use MNIST images
mnist <- dataset_mnist()
x_train <- mnist$train$x; y_train <- mnist$train$y
x_test <- mnist$test$x; y_test <- mnist$test$y

# reshape and rescale
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255; x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

# defining the model and layers
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

# compile (define loss and optimizer)
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

# train (fit)
model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)

# evaluate and predict
model %>% evaluate(x_test, y_test)
model %>% predict_classes(x_test)

Working with keras models

DEFINE A MODEL
keras_model()  Keras Model
keras_model_sequential()  Keras Model composed of a linear stack of layers
multi_gpu_model()  Replicates a model on different GPUs

COMPILE A MODEL
compile(object, optimizer, loss, metrics = NULL)  Configure a Keras model for training

FIT A MODEL
fit(object, x = NULL, y = NULL, batch_size = NULL, epochs = 10, verbose = 1, callbacks = NULL, ...)  Train a Keras model for a fixed number of epochs (iterations)
fit_generator()  Fits the model on data yielded batch-by-batch by a generator
train_on_batch(); test_on_batch()  Single gradient update or model evaluation over one batch of samples

EVALUATE A MODEL
evaluate(object, x = NULL, y = NULL, batch_size = NULL)  Evaluate a Keras model
evaluate_generator()  Evaluates the model on a data generator

PREDICT
predict()  Generate predictions from a Keras model
predict_proba(); predict_classes()  Generate probability or class predictions for the input samples
predict_on_batch()  Returns predictions for a single batch of samples
predict_generator()  Generates predictions for the input samples from a data generator

OTHER MODEL OPERATIONS
summary()  Print a summary of a Keras model
export_savedmodel()  Export a saved model
get_layer()  Retrieves a layer based on either its name (unique) or index
pop_layer()  Remove the last layer in a model
save_model_hdf5(); load_model_hdf5()  Save/load models using HDF5 files
serialize_model(); unserialize_model()  Serialize a model to an R object
clone_model()  Clone a model instance
freeze_weights(); unfreeze_weights()  Freeze and unfreeze weights
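A fitted model can also be saved to disk and reloaded later. Below is a minimal sketch using the save_model_hdf5()/load_model_hdf5() and serialize_model()/unserialize_model() helpers listed under "Other model operations" above; it assumes library(keras) is loaded and reuses the model and x_test/y_test objects from the MNIST example (the file name "mnist_mlp.h5" is only an illustrative choice).

# save the trained model to an HDF5 file and restore it
save_model_hdf5(model, "mnist_mlp.h5")
restored <- load_model_hdf5("mnist_mlp.h5")

# the restored model behaves like the original
restored %>% evaluate(x_test, y_test)

# alternatively, serialize to an R object (e.g. for saveRDS())
bundle <- serialize_model(model)
model_copy <- unserialize_model(bundle)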
CORE LAYERS
layer_input()  Input layer
layer_dense()  Add a densely-connected NN layer to an output
layer_activation()  Apply an activation function to an output
layer_dropout()  Applies Dropout to the input
layer_reshape()  Reshapes an output to a certain shape
layer_permute()  Permute the dimensions of an input according to a given pattern
layer_repeat_vector()  Repeats the input n times
layer_lambda(object, f)  Wraps arbitrary expression as a layer
layer_activity_regularization()  Layer that applies an update to the cost function based on input activity
layer_masking()  Masks a sequence by using a mask value to skip timesteps
layer_flatten()  Flattens an input

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at keras.rstudio.com • keras 2.1.2 • Updated: 2017-12

More layers

CONVOLUTIONAL LAYERS
layer_conv_1d()  1D, e.g. temporal convolution
layer_conv_2d()  2D, e.g. spatial convolution over images
layer_conv_2d_transpose()  Transposed 2D (deconvolution)
layer_conv_3d()  3D, e.g. spatial convolution over volumes
layer_conv_3d_transpose()  Transposed 3D (deconvolution)
layer_conv_lstm_2d()  Convolutional LSTM
layer_separable_conv_2d()  Depthwise separable 2D
layer_upsampling_1d(); layer_upsampling_2d(); layer_upsampling_3d()  Upsampling layer
layer_zero_padding_1d(); layer_zero_padding_2d(); layer_zero_padding_3d()  Zero-padding layer
layer_cropping_1d(); layer_cropping_2d(); layer_cropping_3d()  Cropping layer

POOLING LAYERS
layer_max_pooling_1d(); layer_max_pooling_2d(); layer_max_pooling_3d()  Maximum pooling for 1D to 3D
layer_average_pooling_1d(); layer_average_pooling_2d(); layer_average_pooling_3d()  Average pooling for 1D to 3D
layer_global_max_pooling_1d(); layer_global_max_pooling_2d(); layer_global_max_pooling_3d()  Global maximum pooling
layer_global_average_pooling_1d(); layer_global_average_pooling_2d(); layer_global_average_pooling_3d()  Global average pooling
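To illustrate how the convolutional and pooling layers combine with the core layers, here is a minimal sketch of a small image classifier; the filter counts, kernel sizes, and the 28x28x1 input shape are arbitrary illustrative choices, and library(keras) is assumed to be loaded.

# a small convnet: stacked convolution + max-pooling, flattened into dense layers
conv_model <- keras_model_sequential()
conv_model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

conv_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

Note that to train this on the MNIST data from the first example, the images would be kept as 28x28x1 arrays (array_reshape(x_train, c(nrow(x_train), 28, 28, 1))) rather than flattened to length-784 vectors.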
ACTIVATION LAYERS
layer_activation(object, activation)  Apply an activation function to an output
layer_activation_leaky_relu()  Leaky version of a rectified linear unit
layer_activation_parametric_relu()  Parametric rectified linear unit
layer_activation_thresholded_relu()  Thresholded rectified linear unit
layer_activation_elu()  Exponential linear unit

DROPOUT LAYERS
layer_dropout()  Applies dropout to the input
layer_spatial_dropout_1d(); layer_spatial_dropout_2d(); layer_spatial_dropout_3d()  Spatial 1D to 3D version of dropout

RECURRENT LAYERS
layer_simple_rnn()  Fully-connected RNN where the output is to be fed back to input
layer_gru()  Gated recurrent unit - Cho et al.
layer_cudnn_gru()  Fast GRU implementation backed by CuDNN
layer_lstm()  Long Short-Term Memory unit - Hochreiter 1997
layer_cudnn_lstm()  Fast LSTM implementation backed by CuDNN

LOCALLY CONNECTED LAYERS
layer_locally_connected_1d(); layer_locally_connected_2d()  Similar to convolution, but weights are not shared, i.e. different filters for each patch

Preprocessing

SEQUENCE PREPROCESSING
pad_sequences()  Pads each sequence to the same length (length of the longest sequence)
skipgrams()  Generates skipgram word pairs
make_sampling_table()  Generates word rank-based probabilistic sampling table

TEXT PREPROCESSING
text_tokenizer()  Text tokenization utility
fit_text_tokenizer()  Update tokenizer internal vocabulary
save_text_tokenizer(); load_text_tokenizer()  Save a text tokenizer to an external file
texts_to_sequences(); texts_to_sequences_generator()  Transforms each text in texts to a sequence of integers
texts_to_matrix(); sequences_to_matrix()  Convert a list of sequences into a matrix
text_one_hot()  One-hot encode text to word indices
text_hashing_trick()  Converts a text to a sequence of indexes in a fixed-size hashing space
text_to_word_sequence()  Convert text to a sequence of words (or tokens)

IMAGE PREPROCESSING
image_load()  Loads an image into PIL format
flow_images_from_data(); flow_images_from_directory()  Generates batches of augmented/normalized data from images and labels, or a directory
image_data_generator()  Generate minibatches of image data with real-time data augmentation
fit_image_data_generator()  Fit image data generator internal statistics to some sample data
generator_next()  Retrieve the next item from a generator
image_to_array(); image_array_resize(); image_array_save()  3D array representation of images
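As a sketch of how the image preprocessing utilities fit together, the snippet below builds an augmenting generator and feeds it to fit_generator(); the directory name "images/train", target size, and augmentation settings are hypothetical, and a compiled model with a matching input shape (e.g. c(150, 150, 3)) is assumed.

# real-time augmentation: rescale, rotate and flip images on the fly
datagen <- image_data_generator(rescale = 1/255, rotation_range = 20,
                                horizontal_flip = TRUE)

# one sub-directory per class under "images/train"
train_flow <- flow_images_from_directory(
  "images/train",
  generator = datagen,
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "categorical"
)

# train batch-by-batch from the generator
model %>% fit_generator(train_flow, steps_per_epoch = 100, epochs = 10)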
Pre-trained models

Keras applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.

application_xception(); xception_preprocess_input()  Xception v1 model
application_inception_v3(); inception_v3_preprocess_input()  Inception v3 model, with weights pre-trained on ImageNet
application_inception_resnet_v2(); inception_resnet_v2_preprocess_input()  Inception-ResNet v2 model, with weights trained on ImageNet
application_vgg16(); application_vgg19()  VGG16 and VGG19 models
application_resnet50()  ResNet50 model
application_mobilenet(); mobilenet_preprocess_input(); mobilenet_decode_predictions(); mobilenet_load_model_hdf5()  MobileNet model architecture

imagenet_preprocess_input(); imagenet_decode_predictions()  Preprocesses a tensor encoding a batch of images for ImageNet, and decodes predictions. ImageNet is a large database of images with labels, extensively used for deep learning.

Callbacks

A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training.

callback_early_stopping()  Stop training when a monitored quantity has stopped improving
callback_learning_rate_scheduler()  Learning rate scheduler
callback_tensorboard()  TensorBoard basic visualizations
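A minimal sketch of attaching callbacks to fit(): early stopping on the validation loss plus TensorBoard logging. The patience value and log directory are illustrative choices, and the compiled model and MNIST training data from the first example are assumed.

# stop when val_loss stops improving and write TensorBoard logs
callbacks_list <- list(
  callback_early_stopping(monitor = "val_loss", patience = 3),
  callback_tensorboard(log_dir = "logs/mnist")
)

model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2,
  callbacks = callbacks_list
)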
Recommended publications
  • Neural Networks (AI) (WBAI028-05) Lecture Notes
    Lecture notes by Herbert Jaeger (V 1.5, May 16, 2021, revision of Section 8) for the BSc programme in Artificial Intelligence, Rijksuniversiteit Groningen, Bernoulli Institute. The notes cover a fast rehearsal of machine learning basics (training data, training objectives, overfitting, model flexibility, risk estimation), feedforward networks (the perceptron, multi-layer perceptrons, a glimpse at deep learning), dynamical systems (attractors, bifurcation, chaos), recurrent neural networks in deep learning (supervised training, backpropagation through time, LSTM networks), and Hopfield networks.
  • Intro to TensorFlow 2.0 (MBL, August 2019)
    Slides by Josh Gordon (@random_forests) introducing TensorFlow 2.0: gradient descent, dense layers, loss, softmax, and convolution, with exercises on Fashion MNIST (dense layers) and CIFAR-10 (convolutional layers), walkthroughs of Deep Dream, style transfer, and time-series forecasting, and pointers to the tensorflow.org/beta tutorials and the TensorFlow JavaScript, Swift, and Lite APIs.
  • PredRNN: Recurrent Neural Networks for Predictive Learning Using Spatiotemporal LSTMs
    Yunbo Wang, Mingsheng Long, Jianmin Wang, Zhifeng Gao, and Philip S. Yu (Tsinghua University) present PredRNN, a predictive recurrent neural network for spatiotemporal sequence prediction. Memory states are no longer constrained inside each LSTM unit; instead they zigzag in two directions, across stacked RNN layers vertically and through all RNN states horizontally, using a new Spatiotemporal LSTM (ST-LSTM) unit that extracts and memorizes spatial and temporal representations simultaneously. PredRNN achieves state-of-the-art prediction performance on three video prediction datasets.
  • Ways to Use Machine Learning Approaches for Software Development
    MSc thesis by Nicklas Jonsson (VT 2018, 30 hp; supervisor Eddie Wadbro, external supervisor C4 Contexture). The thesis surveys cloud services, frameworks, and libraries available for applying machine learning to software development and presents three implementation structures that can be used with these technologies for the problem of image classification.
  • MXNet R Reference Manual (PDF)
    Reference manual for the 'mxnet' R package (version 2.0.0, Apache License 2.0) by Tianqi Chen, Qiang Kou, Tong He, and Anirudh Acharya. MXNet is a deep learning framework designed for both efficiency and flexibility, allowing you to mix the flavours of deep learning programs together to maximize efficiency and productivity; the manual documents the package's NDArray, executor, symbol, and callback functions.
  • OSTSC: Over Sampling for Time Series Classification in R (Journal of Statistical Software)
    Matthew Dixon, Diego Klabjan, and Lan Wei (arXiv:1711.09545) describe the OSTSC package, an oversampling approach for classifying univariate but multinomial time series data in R. The article overviews the methodology, provides a tutorial with test cases and two medium-sized imbalanced time series datasets, and compares a TensorFlow LSTM (RNN) classifier with and without oversampling; OSTSC raises the AUC of the LSTM from 0.543 to 0.784 on a high-frequency trading dataset of 30,000 time series observations.
  • Keras2c: a Library for Converting Keras Neural Networks to Real-Time Compatible C
    Rory Conlin, Keith Erickson, Joseph Abbate, and Egemen Kolemen (Princeton University and Princeton Plasma Physics Laboratory) present Keras2c, a simple library for converting Keras/TensorFlow neural network models into real-time compatible C code. It supports a wide range of Keras layers and model types (multidimensional convolutions, recurrent layers, multi-input/output models, shared layers), re-implements the predictive forward pass in roughly 1500 lines of pure C relying only on standard library functions considered safe for real-time use, and is currently in use on the plasma control system at the DIII-D National Fusion Facility.
  • Verification of RNN-Based Neural Agent-Environment Systems
    Michael E. Akintunde, Andreea Kevorchian, Alessio Lomuscio, and Edoardo Pirovano (Imperial College London), AAAI-19. The paper introduces agent-environment systems in which the agent is stateful and executes a ReLU recurrent neural network, studies their verification problem via equivalences between recurrent and feed-forward networks on bounded execution traces, and gives a sound and complete procedure for verifying properties specified in a simplified version of LTL on bounded executions, together with an implementation and experimental results.
  • Anurag Soni's Blog
    Résumé of Anurag Soni, MS in Business Analytics and Information Management (Purdue University) with prior analyst/product-manager experience, listing projects built with TensorFlow, Keras, Hadoop, Docker, and Google Cloud, including demand forecasting, an automated object-detection pipeline, and hate-speech recognition with NLP.
  • Deep Learning Architectures for Sequence Processing
    Chapter 9 of Daniel Jurafsky and James H. Martin, Speech and Language Processing (draft of September 21, 2021). The chapter motivates deep learning architectures for sequence processing: language is an inherently temporal phenomenon, whereas feedforward networks applied to language modeling look only at a fixed-size window of words slid over the input, making independent predictions along the way.
  • TensorFlow, Theano, Keras, Torch, Caffe
    Slides by Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, and Konstantin Shmelkov comparing deep learning frameworks: TensorFlow (Google Brain, 2015), Theano (University of Montréal, 2009), Keras (François Chollet, 2015), Torch (Facebook AI Research, Twitter, Google DeepMind), and Caffe (Berkeley Vision and Learning Center, 2013). Each framework is introduced, then they are compared on code and models, community and documentation, performance, model deployment, and extra features, ending with guidance on which framework to choose when.
  • Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges
    Triet H. M. Le, Hao Chen, and Muhammad Ali Babar (The University of Adelaide) provide a comprehensive review of deep learning methods for source code modeling and generation: they formulate common program learning tasks under an encoder-decoder framework, introduce recent DL mechanisms suited to such problems, and discuss state-of-the-art practices, their challenges, and recommendations for practitioners and researchers.