Unlocking DL Performance with nGraph

The rapid growth of deep learning (DL) in large-scale, real-world applications has produced a sharp increase in the demand for high-performance training and inference solutions. This demand is reflected in the growing investment in DL performance by major hardware manufacturers, including a proliferation of application-specific accelerators.

Of course, performance isn’t driven by hardware alone. In the software realm, new deep learning compilers are helping to maximize the performance and scalability of deep learning systems while increasing the productivity of AI developers.

At this week’s Artificial Intelligence Conference, presented by O’Reilly and Intel and held in New York City, we’ll lead a session that provides a comprehensive overview of nGraph, an open source deep learning compiler, library and runtime suite. The session will take a deep dive into the design of nGraph, including the compiler’s intermediate representation, optimization pipelines, runtime interface, and framework integration. We’ll also discuss the rationale for the development of nGraph and show the benefits it offers to developers creating DL applications for the enterprise, as well as to AI service providers and hardware vendors that want to deliver flexible, performant DL technologies to their users and customers.

nGraph supports a range of deep learning frameworks (TensorFlow*, PyTorch*, MXNet*, etc.) and hardware back-ends (CPUs, GPUs, specialized accelerators) and delivers dramatic performance improvements across these platforms. Figure 1 shows benchmark results in which nGraph delivers as much as a 45x increase in normalized inference throughput by leveraging MKL-DNN on Intel® Xeon® Scalable processors. [1]

Figure 1. Normalized inference throughput (images per second at a batch size of 1) across several models, comparing stock Apache* MXNet* (not based on MKL-DNN; blue) with nGraph-compiled MXNet (orange) leveraging MKL-DNN on the same Intel® Xeon® Scalable processor-based backend. [1]

Beyond Kernel Libraries

Increasing levels of deep learning performance are crucial to keeping pace with the rapid expansion in model and data set sizes. The state-of-the-art software approach to DL acceleration has been to integrate high-performance kernel libraries such as Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) into deep learning frameworks.

Kernel libraries offer strong runtime performance on specific hardware targets through highly optimized kernels and operator-level optimizations. But an approach built on kernel libraries, in which an interpreter orchestrates the invocation of per-operation compute kernels, can be hard-pressed to handle the rising complexity of today’s changing industry requirements. Here are three major reasons:

  1. Kernel libraries do not support graph-level optimization. Although each operation may be individually optimal, the graph as a whole may not be, which can lead to inefficiencies such as executing duplicate operations.
  2. With the growing diversity in DL hardware, the number of distinct kernels that must be written is becoming untenable. For optimal performance, each kernel must be modified for the targeted chip design, data types, operations, and parameters. This creates a large burden that must be revisited each time the infrastructure is upgraded or the DL solution is applied to a different workload.
  3. Framework integration of kernel libraries does not scale. Each deep learning framework must be integrated individually with a given kernel library, and each integration is unique to the framework and its set of deep learning operators, its view on memory layout, its feature set, and so forth.
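
To make the first point concrete, here is a minimal sketch of one graph-level optimization, common-subexpression elimination, on a toy graph. The Node class and the eliminate_duplicates function are hypothetical names invented for this illustration; they are not part of nGraph or of any kernel library, and a real compiler pass would handle far more cases.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """Toy graph node (hypothetical; not an nGraph type)."""
    op: str                   # e.g. "input", "add", "mul"
    inputs: Tuple[int, ...]   # indices of producer nodes in the graph list
    name: str = ""            # only meaningful for "input" nodes


def eliminate_duplicates(graph: List[Node]) -> Tuple[List[Node], Dict[int, int]]:
    """Merge nodes that apply the same op to the same inputs.

    Assumes the graph is in topological order (producers before consumers).
    Returns the smaller graph and a map from old node index to new index.
    """
    seen: Dict[Tuple[str, Tuple[int, ...], str], int] = {}
    remap: Dict[int, int] = {}
    optimized: List[Node] = []
    for old_idx, node in enumerate(graph):
        # Rewrite inputs first so duplicates of duplicates are also caught.
        new_inputs = tuple(remap[i] for i in node.inputs)
        key = (node.op, new_inputs, node.name)
        if key not in seen:
            seen[key] = len(optimized)
            optimized.append(Node(node.op, new_inputs, node.name))
        remap[old_idx] = seen[key]
    return optimized, remap


# y = (a + b) * (a + b), built naively so that a + b is computed twice.
graph = [
    Node("input", (), "a"),   # 0
    Node("input", (), "b"),   # 1
    Node("add", (0, 1)),      # 2
    Node("add", (0, 1)),      # 3: duplicate of node 2
    Node("mul", (2, 3)),      # 4
]
optimized, remap = eliminate_duplicates(graph)
print(len(graph), "nodes before,", len(optimized), "nodes after")
```

An interpreter that calls one kernel per node would run the add kernel twice on the original graph, however fast that kernel is. Only a pass that sees the whole graph can notice that the two additions are the same computation.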

nGraph uses high performance kernel libraries such as MKL-DNN to enable the best performance of the underlying hardware. In addition, nGraph provides graph-level optimizations that can be shared across multiple frameworks and target hardware platforms. nGraph also offers a universal way of interacting with deep learning frameworks through an intermediate representation of the deep learning graph.
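
The value of a shared intermediate representation can be sketched with a toy example. Everything below is hypothetical (the tuple-based IR, the framework and backend names, and the make_bridge and make_backend helpers) and is not nGraph's actual IR or API. It only illustrates why routing every framework through one representation needs one bridge per framework plus one backend per device, rather than one integration per pair.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Toy IR: a list of (ir_op, lhs, rhs). An operand is either an input name
# or "%k", which refers to the result of the k-th instruction.
IRProgram = List[Tuple[str, str, str]]


def make_bridge(op_names: Dict[str, str]) -> Callable[[list], IRProgram]:
    """Build a bridge that renames one framework's ops into IR ops."""
    def bridge(framework_ops: list) -> IRProgram:
        return [(op_names[op], lhs, rhs) for op, lhs, rhs in framework_ops]
    return bridge


def make_backend(kernels: Dict[str, Callable[[float, float], float]]):
    """Build a backend that executes an IR program with its own kernels."""
    def run(program: IRProgram, inputs: Dict[str, float]) -> float:
        results: List[float] = []

        def value(ref: str) -> float:
            return results[int(ref[1:])] if ref.startswith("%") else inputs[ref]

        for op, lhs, rhs in program:
            results.append(kernels[op](value(lhs), value(rhs)))
        return results[-1]
    return run


# Two hypothetical frameworks with different operator vocabularies.
bridges = {
    "framework_a": make_bridge({"Add": "add", "Mul": "mul"}),
    "framework_b": make_bridge({"elemwise_add": "add", "elemwise_mul": "mul"}),
}
# Two hypothetical backends, each with its own kernel implementations.
backends = {
    "backend_x": make_backend({"add": operator.add, "mul": operator.mul}),
    "backend_y": make_backend({"add": lambda a, b: a + b,
                               "mul": lambda a, b: a * b}),
}

# The same computation, (a + b) * a, as each framework might express it.
programs = {
    "framework_a": [("Add", "a", "b"), ("Mul", "%0", "a")],
    "framework_b": [("elemwise_add", "a", "b"), ("elemwise_mul", "%0", "a")],
}

inputs = {"a": 2.0, "b": 3.0}
outputs = {
    (fw, be): run(bridges[fw](programs[fw]), inputs)
    for fw in bridges
    for be, run in backends.items()
}
print(outputs)
```

With two frameworks and two backends the component count is the same either way (four), but the gap grows quickly: five frameworks and four backends would need nine components with a shared IR versus twenty direct integrations. Graph-level passes written against the IR are also shared by every pairing.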

By combining intermediate graph representations with the open source tensor compiler PlaidML* and the open source Open Neural Network Exchange* (ONNX*), nGraph delivers performance portability across a wide range of DL frameworks and a variety of CPU, GPU, and other accelerator processor architectures. (See Figure 2.) Developers and solution providers can enhance productivity by using a single API to produce efficient, highly performant DL solutions. This approach also preserves the flexibility to keep pace with changing workload requirements and new frameworks and platforms. Hardware platforms that don’t have a kernel library, or don’t have the exact kernel they need, can use PlaidML to help simplify the work of creating new kernels. The PlaidML framework generates efficient kernels automatically from polyhedral tensor expressions, transforming graph-level operations requested by nGraph into optimized device-specific implementations.
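
The idea of falling back to generated kernels when a library lacks one can be sketched as a simple dispatch rule. This is a toy illustration under stated assumptions: LIBRARY_KERNELS, ELEMENTWISE_EXPRESSIONS, generate_kernel, and select_kernel are hypothetical names, and wrapping a scalar expression in a Python loop only stands in for real kernel generation. It does not reflect how PlaidML or nGraph are implemented.

```python
from typing import Callable, Dict, List, Tuple

Kernel = Callable[[List[float], List[float]], List[float]]

# Hypothetical hand-tuned library: it covers only some operations.
LIBRARY_KERNELS: Dict[str, Kernel] = {
    "add": lambda x, y: [a + b for a, b in zip(x, y)],
}

# Hypothetical per-element expressions that a generator could compile from.
ELEMENTWISE_EXPRESSIONS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "maximum": lambda a, b: a if a > b else b,
}


def generate_kernel(op: str) -> Kernel:
    """Stand-in for a kernel generator: wrap a scalar expression in a loop."""
    expr = ELEMENTWISE_EXPRESSIONS[op]
    return lambda x, y: [expr(a, b) for a, b in zip(x, y)]


def select_kernel(op: str) -> Tuple[str, Kernel]:
    """Prefer a library kernel; otherwise generate one from the expression."""
    if op in LIBRARY_KERNELS:
        return "library", LIBRARY_KERNELS[op]
    return "generated", generate_kernel(op)


for op in ("add", "maximum"):
    source, kernel = select_kernel(op)
    print(op, source, kernel([1.0, 5.0], [3.0, 2.0]))
```

The design point is that the graph compiler requests an operation by its meaning, and the backend decides where the implementation comes from, so a new device does not need a complete hand-written kernel library before it can run a model.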

Figure 2. nGraph ecosystem. TensorFlow* and MXNet* are supported directly through nGraph and PlaidML. PyTorch*, Chainer*, and other frameworks are supported indirectly through ONNX*.

Go Further

We encourage you to attend our session today at 11:05 a.m. EDT and stop by our booth to learn more. Whether you attend the AI Conference or not, we hope you’ll visit the GitHub repository, download nGraph, try it out, and add your contributions. The current beta version of nGraph is open source and works out of the box with TensorFlow*, MXNet*, and ONNX*, with support for PaddlePaddle* planned, so please clone the repo and get started! Watch for the Gold Release 1.0 version of nGraph coming this year, and follow us @IntelAI for the latest happenings at #TheAIConf in NYC.

Additional Resources:

Notices and Disclaimers: