Choosing a Deep Learning Library

There are a lot of them

JesseBrizzi{.com,@gmail.com,@curalate.com}

Who am I/What do I do?

● Research Engineer
  ○ Focus in and
  ○ CS background
● Work on Image Intelligence Team @Curalate
  ○ E-Commerce SaaS
  ○ Platform that enables brands to find image-based social media content to repurpose for e-commerce.
  ○ Image Intelligence Team owns the entire pipeline, from researching new ML applications through training and development to getting them into production.
  ○ Intelligent Product Tagging - technology that can analyze an image and use machine learning to identify specific products depicted within that image.

What’s a Neural Net?

● FCN - Fully Connected Network
  ○ Multilayer, fundamental neural net where each neuron is connected to all neurons in the previous layer of the network.
● CNN - Convolutional Neural Network
  ○ Neural net that uses convolutional layers, heavily used in Computer Vision applications.
● RNN - Recurrent Neural Network
  ○ Neural net that feeds its output back into itself to process the next input, heavily used in Natural Language Processing applications.
● LSTM - Long Short-Term Memory Recurrent Neural Net
  ○ Fancy RNNs that add control over what output is passed to the next input.
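To make the difference between a fully connected layer and a recurrent step concrete, here is a minimal sketch in plain Python. The function names (`dense`, `rnn_step`, `run_rnn`), the `tanh` activation, and the default weights are illustrative assumptions, not taken from any of the libraries discussed in this deck.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One recurrent step: the previous output h_prev is fed back in."""
    return math.tanh(w_in * x + w_rec * h_prev + bias)


def run_rnn(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Process a sequence one element at a time, carrying state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states
```

In `run_rnn([1.0, 0.0])` the second state is non-zero even though the second input is zero, because the first output is carried forward. That carried state is what a plain fully connected layer lacks.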

Important Factors

● Academia vs Industry
  ○ Who is the target audience?
● Community support
  ○ Pretrained models?
  ○ Research paper repos?
  ○ How googleable are bugs and issues?
● Development speed/barriers to entry
  ○ Abstractions of low-level concepts
  ○ Documentation quality
  ○ Supported programming languages
  ○ The ability to scale

Important Factors

● Codebase Quality
  ○ Is the code actively maintained?
● Performance
  ○ Benchmarks (oldish) https://arxiv.org/pdf/1608.07249.pdf
  ○ Performance does not scale very well on CPUs. 16-core CPUs are only slightly better than 4- or 8-core CPUs.
  ○ GPUs perform much better than many-core CPUs.
  ○ Scalability across multiple GPUs
  ○ Performance is also affected by the design of the configuration/implementation paradigm.

Important Factors

● Train to Production pipeline
  ○ Support for a fast-to-prototype language (e.g. Python) and deployment in your production language (Java/Scala, C++, JS, whatever).
  ○ Training locally, if you have the hardware, vs training on pre-prepared, simplified cloud services.
  ○ Ability to run on different platforms, ranging from mobile phones to massive server farms.
  ○ Ability to transfer your work to other libraries.

Imperative vs Symbolic paradigms

● Dynamic Computation Graphing (Imperative Programming)
  ○ Graphs are built at run time, which lets you use standard language statements.
  ○ The system generates the graph structure at run time.
  ○ Useful when the graph structure needs to change at run time.
  ○ Makes debugging easy.
● Imperative programs tend to be more flexible
  ○ It’s easier to use native language features.
  ○ The graph can follow your program’s logical control flow.
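The define-by-run idea above can be sketched with a toy scalar autodiff class in plain Python. This is an illustration under stated assumptions, not how any particular library implements it: `Value` and `forward` are hypothetical names, and only addition and multiplication are supported.

```python
class Value:
    """Scalar that records the operations applied to it as they run."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fns = grad_fns  # one local-derivative function per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically order the graph recorded during the forward pass.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, accumulating gradients.
        for node in reversed(order):
            for parent, fn in zip(node._parents, node._grad_fns):
                parent.grad += fn(node.grad)


def forward(x, steps):
    # Ordinary Python control flow decides the graph's shape on each call.
    y = x
    for _ in range(steps):
        y = y * x if y.data < 100 else y + x
    return y
```

Calling `forward(Value(2.0), 2)` builds a different graph than `forward(Value(2.0), 3)`, and the loop and the `if` are plain Python. That is the flexibility the imperative paradigm buys, and also why it is easy to step through in a debugger.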

Imperative vs Symbolic paradigms

● Symbolic programs tend to be more efficient
  ○ Both in terms of memory and speed.
  ○ Can safely reuse memory for in-place computation.
  ○ Can also apply operation-folding optimizations.
● Static Computation Graphing (Symbolic Paradigm)
  ○ Define the computation graph once, execute the graph many times.
  ○ Can optimize the graph at the start.
  ○ Good for fixed-size nets (feed-forward, CNN).
● Easier to manage in terms of loading and resources
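The "define once, optimize up front, execute many times" pattern can be sketched in a few lines of plain Python. The tuple-based graph format and the helper names (`const`, `var`, `op`, `fold_constants`, `run`) are hypothetical, and constant folding stands in here as one simple example of a graph-level optimization; real libraries apply far more elaborate passes.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def const(value):
    return ("const", value)


def var(name):
    return ("var", name)


def op(kind, left, right):
    return (kind, left, right)


def fold_constants(node):
    """Optimize once, up front: pre-compute subtrees made only of constants."""
    if node[0] in ("const", "var"):
        return node
    kind, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if left[0] == "const" and right[0] == "const":
        return const(OPS[kind](left[1], right[1]))
    return (kind, left, right)


def run(node, feed):
    """Execute the fixed graph with a new set of input values."""
    if node[0] == "const":
        return node[1]
    if node[0] == "var":
        return feed[node[1]]
    kind, left, right = node
    return OPS[kind](run(left, feed), run(right, feed))


# Define the graph once: y = x * (2 + 3)
graph = op("mul", var("x"), op("add", const(2), const(3)))
optimized = fold_constants(graph)  # ("mul", ("var", "x"), ("const", 5))

# ... then execute it many times with different inputs.
outputs = [run(optimized, {"x": x}) for x in (1, 2, 3)]
```

Because the whole graph is known before any data flows through it, the optimizer can rewrite it safely. The cost is that the graph's structure cannot depend on the input values.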

Libraries That People Should Know About

Caffe

UC Berkeley
Watches: 2,241 | Stars: 27,296 | Forks: 16,454 | Avg Issue Resolution: 3 Days | Open issues: 13%
Symbolic Paradigm | Research Citations (2014): 10,159 | Model zoo

● IMO the first mainstream production-ready lib.
  ○ High-performance and well-tested C++ codebase.
● One of the first, and largest, model zoos.
● Large community of open source research projects.
● Able to train a net from your data without writing any code.
● Good for feedforward networks, image processing, and fine-tuning pretrained nets.
● Main advantage was being first to market.
● Can convert models to almost any other relevant lib.

Caffe

● Has bad design choices that are inherited from its original use case: conventional CNN applications.
● Not good for recurrent networks.
● Does not support automatic differentiation.
● Very verbose in layer and network definitions.
  ○ The graph is treated as a collection of layers, as opposed to nodes of single tensor operations.

Keras

Watches: 1,982 | Stars: 38,796 | Forks: 14,799 | Avg Issue Resolution: 23 Days | Open issues: 24%
Symbolic Paradigm | Model zoo

● A library that sits on top of other DL libs and provides a single, easy-to-use, high-level interface.
● Very modular, minimal, readable, object-oriented code.
● Great for beginners, with great documentation.
● Lacks in optimizations.
● Supported backends
  ○ TensorFlow, Theano, CNTK, MXNet
● Can export your trained models into the backend’s format.
● Fork included in TensorFlow’s Python library.
● Not as customizable.

TensorFlow

Watches: 8,606 | Stars: 121,864 | Forks: 72,545 | Avg Issue Resolution: 8 Days | Open issues: 16%
Symbolic/Dynamic Paradigm | Research Citations (2016): 6233 | Model zoo | CNN Example Code (Keras R) | CNN Example Code (Keras Py) | CNN Example Code

● The current most popular option.
  ○ Largest active community.
  ○ More open source projects and models.
● Google’s attempt to build a single deep learning framework for everything deep learning related.
  ○ Built with massive distributed computing in mind (powers G-apps).
  ○ Has mobile capabilities in the form of TensorFlow Mobile and TensorFlow Lite.
● TensorBoard is amazing for debugging and training.
● TensorFlow Serving for prod deployments (python)
● A lot of documentation (official and 3rd party)

TensorFlow

Google

● Deep Google Cloud integration.
● Pretty low level (Keras and Sonnet help solve this).
● Most things outside of the core C/Python library are “experimental”.
  ○ Everything outside of the Python API is not covered by their API stability promises.
● Biggest issue with the library is performance.
  ○ TensorFlow is just slower and more of a resource hog when compared to the other libraries.
  ○ Other libs can perform twice as fast on typical deep net tasks.
  ○ Avoid for performant RNN or LSTM networks.
  ○ Worst at scaling efficiency.

Torch/PyTorch

Torch (Deepmind, NYU, IDIAP)
Watches: 665 | Stars: 8,218 | Forks: 2,340 | Avg Issue Resolution: 69 Days | Open issues: 34%
Symbolic Paradigm | Research Citations: 1,246 | Model zoo

PyTorch
Watches: 1,197 | Stars: 25,450 | Forks: 6,044 | Avg Issue Resolution: 6 Days | Open issues: 24%
Symbolic/Dynamic Paradigm | Research Citations: 879 | Model zoo | CNN Example Code

● Torch was one of the original academic-focused libs.
● Many maintainers went to work at Facebook and created PyTorch.
● They use the same underlying C lib.
  ○ Provide similar performance.
● They differ in
  ○ Interface (Lua vs Python)
  ○ Auto diff capabilities
  ○ Paradigms

PyTorch

Facebook
Watches: 1,197 | Stars: 25,450 | Forks: 6,044 | Avg Issue Resolution: 6 Days | Open issues: 24%
Symbolic/Dynamic Paradigm | Research Citations: 879 | Model zoo | CNN Example Code

● PyTorch was made with the goal of fixing or modernizing Torch.
● Hybrid frontend for switching between paradigms.
● PyTorch also has its own visualization dashboard called Visdom.
● Probably should avoid if you want to deploy into production.
  ○ Facebook maintains a separate lib targeted at developers, Caffe2.
  ○ Changes are being made to make PyTorch production ready.
  ○ Caffe2 recently merged into PyTorch.
● Researchers tend to prefer PyTorch over TensorFlow.
  ○ Makes prototyping easy.

MXNet

Apache, Amazon
Watches: 1,180 | Stars: 16,450 | Forks: 5,889 | Avg Issue Resolution: 40 Days | Open issues: 13%
Symbolic/Dynamic Paradigm | Research Citations: 712 | Model zoo | CNN Example Python Code | CNN Example Code (Gluon)

● Newer and growing option.
● Largest officially supported API selection.
  ○ High compatibility and consistency.
● Direct competitor to TensorFlow across all applications.
  ○ It can run on everything from a web browser or a mobile phone to a massive distributed server farm.
  ○ Amazon has found that you can get up to an 85% scaling efficiency with MXNet.
● Has its own serving framework and deep integration with AWS.
● Also has its own TensorBoard forks.
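For readers unfamiliar with the term, scaling efficiency is commonly computed as the measured multi-device throughput divided by the ideal linear speedup. The sketch below assumes that common definition; the function name and the throughput numbers are hypothetical and are not Amazon's measurements.

```python
def scaling_efficiency(single_device_throughput, multi_device_throughput, num_devices):
    """Fraction of ideal linear speedup actually achieved."""
    if num_devices < 1 or single_device_throughput <= 0:
        raise ValueError("need at least one device and a positive baseline")
    ideal = single_device_throughput * num_devices
    return multi_device_throughput / ideal


# Hypothetical numbers: 100 images/s on 1 GPU, 680 images/s on 8 GPUs.
efficiency = scaling_efficiency(100.0, 680.0, 8)
```

With these made-up numbers, 8 GPUs deliver 680 of an ideal 800 images per second, so `efficiency` comes out at 0.85.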

MXNet Gluon

● Collaboration between AWS and .
● Provides a clear, concise, and simple API for deep learning.
  ○ Full set of plug-and-play neural network building blocks.
    ■ Predefined layers, optimizers, and initializers
  ○ Built-in model zoo.
● Hybridization is awesome
  ○ Hybrid Symbolic/Dynamic graph functionality.
  ○ Offers benefits of both.
  ○ Can make Gluon 3x faster than PyTorch.
● Great documentation for absolute beginners.

MXNet

● The non-Python APIs are lacking in certain aspects.
  ○ The documentation can be weak.
  ○ Stability issues at full production scale.
● Community is growing, but is still small.
  ○ Never the first library used for open source projects.

CNTK

Microsoft
Watches: 1,388 | Stars: 15,850 | Forks: 4,217 | Avg Issue Resolution: 28 Days | Open issues: 15%
Symbolic/Dynamic Paradigm | Research Citations: 140 | Model zoo | CNN Example Code | CNN Example Code (Keras)

● Microsoft Cognitive Toolkit was originally created by MSR Speech researchers.
  ○ Now it has expanded to all types of deep learning applications.
● Used in Skype, Xbox, Cortana, anything “Azure”.
● Focus on NLP with unbeatable RNN/LSTM performance.
● Supports distributed training like TensorFlow.
● Only library with first-class support for the Windows ecosystem.
  ○ No support for OSX
  ○ Simple Azure deployment
  ○ .NET language support

CNTK

● Average model zoo size/quality.
● Good documentation, consistent with other Microsoft products.
● Unconventional open source license history.
● Small community.
● Used the least in research.

ONNX

● https://onnx.ai/
● Open Neural Network Exchange Format
● Created in collaboration with AWS, Facebook and Microsoft
● Library and format for converting trained neural net models between libraries
● Provides a standardized ONNX model format.

Performance Comparisons Summary

● Benchmarks (oldish, 2017) https://arxiv.org/pdf/1608.07249.pdf
  ○ Compares CNTK, Torch, Caffe, MXNet, TensorFlow
  ○ CPU to multi-GPU performance on synthetic and real data across various deep learning architectures (CNN, FCN, RNN, LSTM...).
● Single GPU
  ○ Caffe, CNTK and Torch perform better than MXNet and TensorFlow on FCNs.
  ○ MXNet is outstanding on CNNs, especially larger networks, while Caffe and CNTK also achieve good performance on smaller CNNs.
  ○ On RNNs and LSTMs, CNTK obtains excellent time efficiency, up to 5-10x better than the rest.

Performance Comparisons Summary

● Multiple GPUs
  ○ MXNet and Torch scale the best and TensorFlow scales the worst.
  ○ CNTK scales better on FCNs specifically.
● Library-specific optimizations
  ○ CNTK allows trading off GPU memory for better computing efficiency.
  ○ MXNet can enable model auto-tuning using the cuDNN library.
● Overall, the performance of TensorFlow is lacking compared to the other tools.

Other Libraries to take note of...

Theano

● University of Montreal
● Research Citations - 290
● Development has ended, may it rest in peace ⚰
● Makes you do a lot of things from scratch, which leads to more verbose code.
● Single GPU support
● Numerous open-source deep learning libraries have been built on top of Theano, including Keras, Lasagne and Blocks.
● CNN Example Code (Keras) or CNN Example Code (Lasagne)
● No real reason to use over TensorFlow unless you are working with old code.

Caffe 2

● Facebook
● CNN Example Code
● Merged into the PyTorch codebase.
● Caffe2 targets supporting production applications with a focus on mobile.
● Caffe2 is built to excel at large-scale deployments.
  ○ Caffe2 is built to utilize both multiple GPUs on a single host and multiple hosts with GPUs.
● Caffe2 improves on Caffe in a series of directions:
  ○ first-class support for large-scale distributed training
  ○ mobile deployment
  ○ new hardware support (in addition to CPU and CUDA)
  ○ flexibility for future directions such as quantized computation
  ○ stress tested by the vast scale of Facebook applications

Fast.ai

● fastai

Watches: 555 | Stars: 12,306 | Forks: 4,479 | Median Issue Resolution: 8 hours | Open issues: 1%

● The library is based on research into deep learning best practices.
● Built on top of PyTorch
● Free, online, yearly updated courses in deep learning
  ○ Can even take it in person in SF
● Quickest at integrating new research examples
● Great for beginners getting into research.

CoreML

● Apple
● Closed source
● Not a full DL library (you cannot use it to train models at the moment); mainly focused on deploying pretrained models to iOS and OSX devices.
  ○ If you need to train your own model, you will need to use one of the above libraries.
  ○ Model converters available for Keras, Caffe, Scikit-learn, libSVM, XGBoost, MXNet, and TensorFlow

Deep Learning Toolbox

● https://www.mathworks.com/products/deep-learning.html
● A MATLAB toolbox implementing CNNs and LSTMs.
● GPU support and cloud GPU on AWS with MATLAB Distributed Computing Server
● Create, edit, visualize, and analyze deep learning networks with interactive apps.
● Visualize network topologies, training progress, and activations of the learned features in a deep learning network.
● Import models from Caffe/TensorFlow-Keras/ONNX
● Not open source
  ○ $500 annual license
  ○ $1250 perpetual license

Deeplearning4j (DL4J)

● Skymind

Watches: 835 | Stars: 10,431 | Forks: 4,602 | Median Issue Resolution: 6 days | Open issues: 20%

● Keras support (Python API)
● Written with Java and the JVM in mind
● Focus on enterprise scale
● Great documentation
● DL4J takes advantage of the latest distributed computing frameworks including Hadoop and to accelerate training. On multi-GPUs, it is equal to Caffe in performance.
● Can import models from TensorFlow

Choosing a Deep Learning Library

● Preferred Networks
● Research Citations (2015) - 207

Watches: 328 | Stars: 4,626 | Forks: 1,228 | Median Issue Resolution: 44 days | Open issues: 11%

● CNN Example Code
● Dynamic computation graph
● Used by IBM, Intel
● Japanese and English community

Darknet

● https://github.com/pjreddie/darknet

Watches: 786 | Stars: 11,980 | Forks: 6,770 | Median Issue Resolution: 26 days | Open issues: 76%

● Very small open source effort with a laid-back dev group.
  ○ Emojis and jokes everywhere.
  ○ Seems more of an exercise by the developers.
● Not useful for production environments.
● Maintainer wrote my favorite research paper.

Sonnet

Watches: 475 | Stars: 7,362 | Forks: 1,011 | Median Issue Resolution: 14 days | Open issues: 14%

● Google DeepMind
  ○ One of the biggest names in industry research
  ○ AlphaGo, AlphaStar
● Built on TensorFlow; makes NN construction and training easy and extensible.

Knet.jl

● https://github.com/denizyuret/Knet.jl

Watches: 75 | Stars: 833 | Forks: 149 | Median Issue Resolution: 9 days | Open issues: 17%

● Knet is the Koç University deep learning framework, implemented in Julia.
● Supports GPU operation, automatic differentiation, and dynamic computational graphs.
● Model code can use the full power and expressivity of Julia.
● CNN Example Code

Paddle

● Baidu

Watches: 649 | Stars: 8,224 | Forks: 2,232 | Median Issue Resolution: 14 days | Open issues: 18%

● PArallel Distributed Deep LEarning
● Chinese documentation with an English translation.
● Originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
● Really only use it if you are in the Chinese market/ecosystem.

ConvNetJS

● Stanford

Watches: 645 | Stars: 9,563 | Forks: 1,891 | Median Issue Resolution: 59 days | Open issues: 69%

● Train neural networks entirely in your browser.
● Start training a net now!
● Great for visualizing the full network and training process.
● Mainly used for demonstrating and teaching deep learning on the web
  ○ See Stanford’s CS231n

Neon

● Intel

Watches: 366 | Stars: 3,730 | Forks: 830 | Median Issue Resolution: 25 days | Open issues: 17%

● Written with Intel Nervana MKL accelerated hardware in mind (Xeon and Phi processors)
● Intel's reference deep learning framework committed to best performance on all hardware.
● One of the fastest libraries
● One of the first half-precision floating point enabled libraries.

DyNet

● Carnegie Mellon University

Watches: 200 | Stars: 2,688 | Forks: 626 | Median Issue Resolution: 7 days | Open issues: 12%

● Dynamic computation graph
● Small user community

TLDR

● Choose TensorFlow or MXNet-Gluon for industry/production environments.
  ○ TensorFlow if you prioritize community support and documentation, MXNet if you need performance.
● PyTorch if you are doing research or developing new models/layers.
● Keras if you are new and want to get started quickly.
● Fast.ai + PyTorch if you are here to learn.
● CNTK if you ❤ Windows/Visual Studio/.NET or want to do high-performance NLP.
● CoreML for deploying things to Apple devices.
● Deeplearning4j if you really like to keep things in the JVM.
