Interoperating Deep Learning Models with ONNX.Jl

Interoperating Deep Learning models with ONNX.jl Ayush Shridhar1, Phil Tomson2, and Mike Innes3 1International Institute of Information Technology, Bhubaneswar, India 2Unaffiliated 3Julia Computing, Inc ABSTRACT network, with complete support for control flow, recursion, clo- sures and data structures. Implementing models in Flux.jl is as sim- Flux [17] is a machine learning framework, written using the nu- ple as writing regular Julia code. Implementing models is as simple merical computing language Julia[4]. The framework makes writ- as writing the formulae for those, and Zygote.jl will compute the ing layers as simple as writing mathematical formulae, and it’s ad- derivatives seamlessly. vanced AD, Zygote [11] , applies automatic differentiation (AD) to Flux.jl also provides support for other hardware options using ex- calculate derivatives and train the model. It makes heavy use of Ju- ternal packages such as CuArrays.jl and CLArrays.jl. CuArrays lia’s language and compiler features to carry out code analysis and is written completely in Julia, making implementing GPU kernels make optimisations. For example, Julia’s GPU compilation support very simple. Making a model run on GPU can be done in a hassle- [3] can be used to JIT-compile custom GPU kernels for model lay- free manner: It is as simple as calling a few functions to trans- ers [19]. Flux also supports a number of a hardware options, from fer data to GPU. Flux.jl also has support for running models on CPUs, GPUs and even TPUs via XLA.jl, that compiles Julia code Google’s Tensor Processing Unit (TPU). TPUs help in very fast to XLA: an advanced compiler for linear algebra that is capable of linear algebra computation. Running Flux models on TPUs is pos- greatly optimizing speed and memory usage in large deep learning sible through XLA.jl that compiles Julia code to XLA. models. The FluxML ecosystem provides a number of supporting packages ONNX.jl is an Open Neural Network Exchange backend for the that provide additional functionalities , some of them being (apart Flux.jl deep learning framework. ONNX.jl supports directly im- from the aforementioned Flux.jl, Zygote.jl and XLA.jl); porting high quality ONNX standard models into Flux, thus saving time and reducing the need for additional computation resources. —ONNX.jl : Open Neural Network eXchange backend for Flux.jl This paper aims at introducing ONNX.jl and explaining how it fits —Metalhead.jl [12]: Simple plug and play pretrained Flux.jl com- into the bigger picture: How we can use the Julia Language, specif- puter vision models. ically Flux.jl and ONNX.jl as a starting for high quality transfer —Torch.jl [7]: This package aims at exposing Torch.tensor types in learning of large deep learning models. Julia. —IRTools.jl [13] : Provides an IR format that is easy to manipulate. —FluxJS.jl [18] : Runs Flux models in the browser, via tensor- Keywords flow.js Julia, Machine Learning, Deep Learning, Transfer Learning —model-zoo [15]: Collection of implementation of various Flux deep learning models. 1. Introduction 2. Open Neural Network eXchange (ONNX) The Julia language was introduced to solve the two language problem: In simple words, languages that are simple to write (high- Open Neural Network Exchange (ONNX) [2] is an open ecosys- level) are very slow but those which are difficult to use (low-level) tem that empowers AI developers to choose the right tools as are way faster. This is because most of the high-level languages their project evolves. ONNX provides an open source format for weren’t written to process a large amount of data. Thus engineers, AI models, both deep learning and traditional machine learning. researchers and developers have a hard time developing a lot of ONNX defines the computation graph for a deep learning model high performance languages. At the moment, the common proto- along with various operators used in the model. It provides a set col is to write the core of the software in a low-level language of specifications to convert a model to a basic ONNX format, and (C/C++/Fortran) and wrap it in a high-level language (Python). another set of specifications to get the model back from this ONNX This results in optimized performance and ease of use. The Julia form. At a high level, ONNX is designed to allow framework inter- language aims to make best of both worlds. It provides a high level operability. There are many excellent machine learning libraries in syntax but manages to perform as fast as C (sometimes even faster). various languages : PyTorch [2] , TensorFlow [1] , MXNet [5], Flux.jl is a library for implementing machine learning models, writ- and Caffe [20] are just a few that have become very popular in ten completely in the Julia programming language. At the heart of recent years, but there are many others as well. Machine learning Flux.jl lies Zygote.jl: A source-to-source automatic differentiation models can be converted to a serialized ONNX format which can (AD) library that makes complete use of the Julia language com- then be run on a number devices. ONNX Runtime is an inference piler to generate backward pass during training phase of a neural engine written in C++ framework used to deploy ONNX format 1 Proceedings of JuliaCon 1(1), 2019 models into production. It works on diverse hardware and support 3.1 ProtoBuf.jl both deep learning as well as traditional machine learning models. ProtoBuf.jl [26] is a Julia implementation of Protocol Buffers. It provides a way to read and write data to and from Julia types from 2.1 Where does ONNX come in? I/O streams. What ProtoBuf.jl gives from onnx.proto3 is the Julian ONNX is a format for representing deep learning models, which definition of various data structures that in themselves have all the can be further run on numerous devices without worrying much required attributes to load any ONNX serialized model. As an ex- about the implementation. This helps researchers, developers and ample, as simple message defined in onnx.proto3 as: engineer to focus on the problem in hand without worrying much about the peripherals, such as the framework to use, the ability to message NodeProto{ run a model trained using this particular framework on specialized § repeated string input=1;// namespace Value ¤ hardware. ONNX is is usable anywhere from small mobile devices repeated string output=2;// namespace Value to large server farms, across chipsets and vendors, and with exten- // An optional identifier for this node ina graph. sive runtimes and tools support. ONNX reduces the friction of mov- // This fieldMAY be absent in ths version of the ing trained AI models among your favorite tools and frameworks //IR. and platforms. A simple example of how ONNX is ideal for ML string name=3;// namespace Node is the case when large deep learning models need to be deployed. // The symbolic identifier of the Operator to Consider the simple case of deploying a Deep Learning model to // execute. an iOS application. This particular model can be implemented in string op_type=4;// namespace Operator any framework : TensorFlow, PyTorch, MXNet just to name a few. string domain=7;// namespace Domain However, iOS applications expect to use CoreML inside the appli- // Additional named attributes. cation. Up until now, developers have been porting large models repeated AttributeProto attribute=5; to different frameworks, which is a waste of time and energy, bet- ter spent somewhere else. This is also retraining the entire model string doc_string=6; from scratch, which isn’t efficient. This makes the entire process } cumbersome and impractical. ONNX exists to solve this very problem : By connecting the common dots from different frameworks, ¦(Code snippet taken from onnx/onnx.proto3) ¥ ONNX makes it possible to express a model of type A to type B, Results in the corresponding Julia definition of the model as : thus saving time and the need to train the model again. mutable struct NodeProto<: ProtoType 3. ONNX backend in Julia § input::Vector{AbstractString} ¤ output::Vector{AbstractString} ONNX.jl is an ONNX backend written in Julia for the Flux.jl ma- name::AbstractString chine learning framework. It can be used for high-quality inference op_type::AbstractString domain::AbstractString of pretrained ONNX format machine learning models. At the heart at tri bu te::Vector{AttributeProto} of it, ONNX.jl solves a compiler problem by dealing with inter- d o c _ s t r i n g::AbstractString mediate code representations to generate readable graphs. While No deP ro to(; kwargs...)= doing this, ONNX operators are mapped to corresponding Flux lay- (o=new(); fillunset(o); isempty(kwargs)|| ProtoBuf._protobuild(o, kwargs);o) ers, thus tracing out the model’s computation graph at the end. This end graph can then be travelled to generate the Julia code for the model. The first step towards reading any ONNX model in Julia is to have access to all the data structures in the model, that essentially hold ¦Since ONNX tries to inherit properties from diverse frameworks, ¥ all the required information to load the model. This information ONNX serialized models can be large and complicated. While there includes everything that is used to completely define the model: are a number of complex generated data structures, three of those Hyper-parameters and parameters in the model. So in the case of a are essential towards understanding how data is stored internally: simple Convolutional Neural Network, this may contain information such as the number of layers, number of units in each layer, —ModelProto: A very high-level struct that holds all the informa- strides, padding, kernel size, number of filters as hyper-parameters tion.

Interoperating Deep Learning Models with ONNX.Jl

Intro to Tensorflow 2.0 MBL, August 2019

Ways to Use Machine Learning Approaches for Software Development

Document Databases, JSON, Mongodb 17

Smart Grid Serialization Comparison

Keras2c: a Library for Converting Keras Neural Networks to Real-Time Compatible C

Spindle Documentation Release 2.0.0

Automated Elastic Pipelining for Distributed Training of Transformers

Rdf Repository Replacing Relational Database

Fault Tolerance and Re-Training Analysis on Neural Networks

Tensorflow, Theano, Keras, Torch, Caffe Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov Introduction

In-Datacenter Performance Analysis of a Tensor Processing Unit

Abstractions for Programming Graphics Processors in High-Level Programming Languages