
The Importance of Microsoft’s Deep Learning “Rosetta Stone”

Deep learning (DL) developers usually do their work within particular modeling frameworks, such as TensorFlow, MXNet, and Caffe2.

As enterprises proliferate DL initiatives, the range of modeling frameworks in use is likely to expand. That trend may crimp developer productivity if launching into a new DL project requires cross-training on a different framework. Recognizing this, more DL vendors are offering framework-agnostic development platforms. This, in turn, is accelerating the development of industry-standard DL abstraction layers.

As discussed in this Wikibon Premium research note several months ago, the principal abstraction layers and aligned industry initiatives include the following:

  • Higher-level DL APIs: Keras, which runs on top of the TensorFlow, CNTK, Theano, and DeepLearning4J back-ends, provides a high-level Python API for fast prototyping and programming of DL models (see the sketch after this list). In addition, the recently announced Gluon framework, developed by AWS and Microsoft, defines a Python API for simplified DL programming on MXNet and potentially any DL back-end.
  • Shared DL-model representations: The Keras API enables cross-tool sharing of computational graphs defined in that framework. Open Neural Network Exchange (ONNX) provides a shared model representation that will be supported in CNTK, Caffe2, and PyTorch (see the export sketch after this list).
  • Cross-platform DL model compilers: Already, we see several industry initiatives that enable a DL model created in one front-end tool to be automatically compiled for efficient execution on heterogeneous back-end platforms and chipsets. AWS has announced NNVM Compiler, Intel has NGraph, Google has XLA, and NVIDIA has TensorRT 3.
  • Heterogeneous DL-microservice decouplers: As DL becomes yet another category of functions deployed as distributed microservices, we’ll see more tool-agnostic initiatives that enable decoupled multi-node interoperability of apps built in heterogeneous DL tools. In that regard, IBM Research’s Distributed Deep Learning (DDL) software library provides an API that enables TensorFlow, Caffe, Torch, and Chainer developers to scale out DL model execution across PowerAI.
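To make the first bullet concrete, here is a minimal, hypothetical sketch of the kind of back-end-agnostic model definition the Keras API enables; the layer sizes and optimizer are illustrative assumptions, not code from any of the initiatives above.

```python
from keras.models import Sequential
from keras.layers import Dense

# The same Keras code runs unchanged on whichever supported back-end
# (e.g., TensorFlow, CNTK, or Theano) the library is configured to use.
model = Sequential([
    Dense(128, activation="relu", input_shape=(784,)),
    Dense(64, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Because the model is expressed entirely through the high-level API, switching back-ends is a configuration change rather than a rewrite, which is precisely the developer-productivity benefit these abstraction layers target.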

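The shared-representation idea can likewise be sketched under similarly hedged assumptions: below, a toy PyTorch model is exported to the ONNX format, which other ONNX-supporting frameworks can then import. The network and file name are placeholders for illustration only.

```python
import torch
import torch.nn as nn

# Toy network standing in for a real model; the layer sizes are illustrative.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# A dummy input is used to trace the computational graph and fix its input shape.
dummy_input = torch.randn(1, 16)

# Writes an ONNX protobuf that ONNX-supporting back-ends can load.
torch.onnx.export(model, dummy_input, "model.onnx")
```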
Last week, Microsoft announced another important initiative that will drive the creation of open DL modeling frameworks. As discussed in this Microsoft Research blog, the company has open-sourced a GitHub repo that it describes as a “Rosetta Stone of deep-learning frameworks,” facilitating cross-framework benchmarking of GPU-optimized DL models. GPU-accelerated DL is a core requirement for this new era of intelligent apps, as discussed in this Cube panel.

The repo includes optimized modeling code that is accessible through up-to-date high-level APIs (Keras and Gluon) supported in various frameworks. For alternative multi-GPU configurations, it publishes benchmarks for performance comparisons of these models: specifically, training-time results for CNN models performing ResNet-50 image recognition on the CIFAR-10 dataset and for RNN models performing sentiment analysis on IMDB movie reviews. These benchmarks compare training-time performance for these DL models across frameworks (TensorFlow, MXNet, CNTK, Caffe2, PyTorch, Chainer, Theano, and Knet) and across languages (R and Julia).
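As a hedged illustration of what such a training-time benchmark boils down to, the sketch below times one epoch of a deliberately small Keras CNN on CIFAR-10; the architecture and hyperparameters are simplified assumptions, not the repo’s actual benchmark code.

```python
import time

from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

# Load and normalize CIFAR-10; the metric of interest is wall-clock training time.
(x_train, y_train), _ = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
y_train = to_categorical(y_train, 10)

# A small CNN so the example runs quickly; not the repo's benchmark architecture.
model = Sequential([
    Conv2D(50, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

start = time.time()
model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=0)
print("Training time for one epoch: %.1f seconds" % (time.time() - start))
```

Repeating the same measurement with equivalent models in other frameworks, on the same GPU configuration, is essentially what the repo standardizes.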

Just as important as the models and benchmarks that Microsoft publishes is its invitation to any data scientist to spin up an Azure Deep Learning Virtual Machine and contribute their own benchmarks for any DL task, framework, API, language, and GPU configuration they wish.

It’s good that the “Rosetta Stone” authors have pointed out their benchmarks’ limitations. The benchmarks are not intended to characterize the overall performance of any specific DL framework, because they gloss over critical factors such as the performance advantages of particular hardware platforms and data loaders, as well as the speedups that may accrue from distributing various training functions. Likewise, the authors did not address the performance advantages or disadvantages that may stem from using a tool such as MMdnn to directly convert a particular model from one framework to another.

From all of this, it’s clear that a key abstraction layer of the DL ecosystem needs to focus on higher-level performance benchmarking during upfront modeling. This will help ensure that a DL model generated on the front end is automatically optimized for the back end to which it’s ultimately deployed.
