ONNX: the Open Neural Network Exchange Format
Wed, 04/25/2018 – 09:19
An open-source battle is being waged for the soul of artificial
intelligence. It is being fought by industry titans, universities and
communities of machine-learning researchers world-wide. This article
chronicles one small skirmish in that fight: a standardized file format
for neural networks. At stake is the open exchange of data among a
multitude of tools instead of competing monolithic frameworks.
The good news is that the battleground is Free and Open. None of the
big players are pushing closed-source solutions. Whether it is Keras and
TensorFlow backed by Google, Apache MXNet endorsed by Amazon, or Caffe2
or PyTorch supported by Facebook, all solutions are open-source software.
Unfortunately, while these projects are open, they are not
interoperable. Each framework constitutes a complete stack that
until recently could not interface in any way with any other framework.
A new industry-backed standard, the Open Neural Network Exchange format,
could change that.
Now, imagine a world where you can train a neural network in Keras,
run the trained model through the NNVM optimizing compiler and
deploy it to production on MXNet. And imagine that is just one of
countless combinations of interoperable deep learning tools, including
visualizations, performance profilers and optimizers. Researchers and
DevOps no longer need to compromise on a single toolchain that provides
a mediocre modeling environment and so-so deployment performance.
What is required is a standardized format that can express any machine-learning model and store trained parameters and weights, readable and
writable by a suite of independently developed software.
Enter the Open Neural Network Exchange
To understand the drastic need for interoperability with a standard like
ONNX, we first must understand the ridiculous requirements we have for
existing monolithic frameworks.
A casual user of a deep learning framework may think of it as a language
for specifying a neural network. For example, I want 100 input neurons,
three fully connected layers each with 50 ReLU outputs, and a softmax on
the output. My framework of choice has a domain language to specify this
(like Caffe) or bindings to a language like Python with a clear API.
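As a rough illustration, the network just described can be written down as plain data. The sketch below uses only the Python standard library. The layer dictionaries and the count_parameters helper are hypothetical names invented for this example, not the API of Caffe or any other framework mentioned here.

```python
# Hypothetical, framework-free description of the network from the text:
# 100 inputs, three fully connected layers of 50 ReLU units, softmax output.
INPUT_SIZE = 100

LAYERS = [
    {"type": "dense", "units": 50, "activation": "relu"},
    {"type": "dense", "units": 50, "activation": "relu"},
    {"type": "dense", "units": 50, "activation": "relu"},
    {"type": "softmax"},
]


def count_parameters(input_size, layers):
    """Count the trainable weights and biases implied by the layer list."""
    total = 0
    fan_in = input_size
    for layer in layers:
        if layer["type"] == "dense":
            # One weight per (input, unit) pair plus one bias per unit.
            total += fan_in * layer["units"] + layer["units"]
            fan_in = layer["units"]
        # The softmax layer has no trainable parameters.
    return total


if __name__ == "__main__":
    print(count_parameters(INPUT_SIZE, LAYERS))
```

A description like this records what the network is, but nothing about how its arithmetic should be executed on real hardware.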
However, the specification of the network architecture is only the tip of
the iceberg. Once a network structure is defined, the framework still
has a great deal of complex work to do to make it run on your CPU or GPU.
Python, obviously, doesn’t run on a GPU. To make your network definition
run on a GPU, it needs to be compiled into code for the CUDA (NVIDIA) or
OpenCL (AMD and Intel) APIs or processed in an efficient way if running
on a CPU. This compilation is complex, which is why most frameworks don’t
support both NVIDIA and AMD GPU back ends.
The job is still not complete though. Your framework also has to balance
resource allocation and parallelism for the hardware you are using.
Are you running on a Titan X card with more than 3,000 compute cores, or a GTX
1060 with far fewer than half as many? Does your card have 16GB of RAM
or only 4? All of this affects how the computations must be optimized.
And still it gets worse. Do you have a cluster of 50 multi-GPU machines
on which to train your network? Your framework needs to handle that too.
Network protocols, efficient allocation, parameter sharing—how much
can you ask of a single framework?
Now you say you want to deploy to production? You wish to
scale your cluster automatically? You want a solid language with secure APIs?
When you add it all up, it seems absolutely insane to ask one monolithic
project to handle all of those requirements. You cannot expect the
authors who write the perfect network definition language to be the same
authors who integrate deployment systems in Kubernetes or write optimal
GPU code.
The goal of ONNX is to break up the monolithic frameworks. Let an
ecosystem of contributors develop each of these components, glued together
by a common specification format.
The Ecosystem (and Politics)
Interoperability is a healthy sign of an open ecosystem. Unfortunately,
until recently, it did not exist for deep learning. Every framework
had its own format for storing computation graphs and trained models.
Late last year that started to change. The Open Neural Network Exchange
format initiative was launched by Facebook, Amazon and Microsoft,
with support from AMD, ARM, IBM, Intel, Huawei, NVIDIA and Qualcomm.
Let me rephrase that: everyone but Google. The format has been
included in most well-known frameworks except Google’s TensorFlow
(for which a third-party converter exists).
This seems to be the classic scenario where the clear market leader,
Google, has little interest in upending its dominance for the sake
of openness. The smaller players are banding together to counter the
market leader.
Google is committed to its own TensorFlow model and weight file format,
SavedModel, which shares much of the functionality of ONNX. Google is
building its own ecosystem around that format, including TensorFlow
Serving, Estimator and Tensor2Tensor, to name a few.
The ONNX Solution
Building a single file format that can express all of the capabilities of
all the deep learning frameworks is no trivial feat. How do you describe
convolutions or recurrent networks with memory? Attention mechanisms?
Dropout layers? What about embeddings and nearest neighbor algorithms
found in fastText or StarSpace?
ONNX cribs a note from TensorFlow and declares everything is a
graph of tensor operations. That statement alone is not sufficient,
however. Dozens, perhaps hundreds, of operations must be supported,
not all of which will be supported by all other tools and frameworks.
Some frameworks may also implement an operation differently from their
peers.
There has been considerable debate in the ONNX community about the level
at which tensor operations should be modeled. Should ONNX be a mathematical
toolbox that can support arbitrary equations with primitives such as
sine and multiplication, or should it support higher-level constructs
like integrated GRU units or Layer Normalization as single monolithic
operations?
As it stands, ONNX currently defines about 100 operations. They range
in complexity from arithmetic addition to a complete Long Short-Term
Memory implementation. Not all tools support all operations, so just
because you can generate an ONNX file of your model does not mean it
will run anywhere.
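To make that caveat concrete, here is a minimal sketch of a model as a list of operations, checked against the operations a tool claims to support. The operation names echo ONNX operator names, but the node layout is heavily simplified, and GRAPH, BACKENDS, unsupported_ops and can_run are all hypothetical names made up for this example. This is not the ONNX file format or API.

```python
# Hypothetical in-memory stand-in for a graph of tensor operations.
# Each node names an operation type, its inputs and its single output.
GRAPH = [
    {"op": "MatMul", "inputs": ["x", "w"], "output": "h0"},
    {"op": "Add", "inputs": ["h0", "b"], "output": "h1"},
    {"op": "LSTM", "inputs": ["h1"], "output": "h2"},
    {"op": "Softmax", "inputs": ["h2"], "output": "y"},
]

# Made-up backends and the operation types each one claims to implement.
BACKENDS = {
    "backend_a": {"MatMul", "Add", "Relu", "Softmax", "LSTM"},
    "backend_b": {"MatMul", "Add", "Relu", "Softmax"},
}


def unsupported_ops(graph, supported):
    """Return the sorted operation types in the graph that a backend lacks."""
    return sorted({node["op"] for node in graph} - supported)


def can_run(graph, supported):
    """A backend can run the graph only if no operation is missing."""
    return not unsupported_ops(graph, supported)


if __name__ == "__main__":
    for name, supported in BACKENDS.items():
        print(name, can_run(GRAPH, supported), unsupported_ops(GRAPH, supported))
```

In this toy setup the same file loads on backend_a but is rejected by backend_b, which lacks the LSTM operation.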
Generation of an ONNX model file can also be awkward in some frameworks
because it relies on a rigid definition of the order of operations in a
graph structure. For example, PyTorch boasts a very Pythonic imperative
experience when defining models. You can use Python logic to lay out
your model’s flow, but you do not define a rigid graph structure as in
other frameworks like TensorFlow. So there is no graph of operations
to save; you actually have to run the model and trace the operations.
The trace of operations is saved to the ONNX file.
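The idea of tracing can be sketched in a few lines of plain Python. This is a toy tracer, not PyTorch's actual machinery. The Traced class, the model function and trace_model are hypothetical names, and the "model" is just scalar arithmetic with one Python branch.

```python
class Traced:
    """A number that appends every operation applied to it to a shared list."""

    def __init__(self, value, trace):
        self.value = value
        self.trace = trace  # list of (op_name, constant) tuples

    def __add__(self, constant):
        self.trace.append(("Add", constant))
        return Traced(self.value + constant, self.trace)

    def __mul__(self, constant):
        self.trace.append(("Mul", constant))
        return Traced(self.value * constant, self.trace)


def model(x):
    """Imperative model: ordinary Python control flow picks the path."""
    if x.value > 0:
        return x * 2 + 1
    return x * -1


def trace_model(fn, example_input):
    """Run fn once on an example input and return the recorded operations."""
    trace = []
    fn(Traced(example_input, trace))
    return trace


if __name__ == "__main__":
    print(trace_model(model, 3.0))
    print(trace_model(model, -3.0))
```

Each trace holds only the operations on the path the example input took. The Python `if` itself is never recorded, so a graph saved from a positive example would apply the same multiply and add to every later input. That is the price of recovering a fixed graph from imperative code.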
It is early days for deep learning interoperability. Most users still
pick a framework and stick with it. And an increasing number of users
are going with TensorFlow. Google throws many resources and real-world
production experience at it—it is hard to resist.
All frameworks are strong in some areas and weak in others. Every new
framework must re-implement the full “stack” of functionality. Break up
the stack, and you can play to the strengths of individual tools. That will
lead to a healthier ecosystem.
ONNX is a step in the right direction.
Note: the ONNX GitHub page is at https://github.com/onnx/onnx.
Braddock Gaskill is a research scientist with eBay Inc. He contributed
to this article in his personal capacity. The views expressed are his
own and do not necessarily represent the views of eBay Inc.
About the Author
Braddock Gaskill has 25 years of experience in AI and algorithmic
software development. He also co-founded the Internet-in-a-Box open-source project and developed the libre Humane Wikipedia Reader for
getting content to students in the developing world.
Source: Linux Journal