Purpose-Built Inference Acceleration Takes AI Everywhere

Inference is where the transformative power of AI meets real-world applications. It’s how a smart speaker recognizes many different human voices, how software can examine medical images and determine which indicate cancer and which don’t, how retailers can determine which shelves are out of stock or which products customers put back, and how specially trained cameras detect poachers to protect animals. Training is the “learning” part of AI, and inference is the “doing.” It’s how AI makes decisions and demonstrates value.

Bringing about an “AI everywhere” future means accelerating inference wherever it’s deployed, from smart devices and personal computers to data centers and public clouds. In an era where raw data is growing exponentially, the ability to turn that data into valuable knowledge, no matter where it lives, requires a variety of hardware, including new purpose-built AI chips.

For example, many AI inference applications run well on general-purpose CPUs, especially 2nd Generation Intel® Xeon® Scalable processors with new inference acceleration instructions built in. However, high-volume or specialized applications will benefit greatly from purpose-built accelerators designed specifically for inference. The Intel® Nervana™ Neural Network Processor for Inference (Intel® Nervana™ NNP-I) is architected for the most demanding and intense inference workloads. Designed from the ground up, this dedicated accelerator offers a high degree of programmability without compromising performance-to-power efficiency and is built to satisfy the needs of enterprise-scale AI deployments.

Digging into the architecture itself reveals why the Intel Nervana NNP-I is so efficient, and why it’s capable of tackling AI inference at scale.

Inside the Intel Nervana NNP-I

The Intel Nervana NNP-I is a massively parallel chip that keeps data close to the processing points and maximizes data reuse. At the heart of the chip are 12 all-new inference compute engines (ICE), each optimized for throughput and latency. Pre- and post-processing operations are fused, ensuring that flexibility and efficiency are worked into the very structure of the accelerator. Data doesn’t move more than it has to.

In addition, two Intel® architecture (IA) compute cores provide extreme programmability, including features like Advanced Vector Extensions (AVX) and Vector Neural Network Instructions (VNNI). Likewise, up to 24 MB of last-level cache (LLC) is shared between the ICE and the IA cores, and is optimized to keep data as close to the compute engines as possible.
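To illustrate the kind of operation VNNI accelerates, the following is a minimal NumPy sketch of the INT8 multiply-and-accumulate pattern used in quantized inference. The function name, shapes, and data are illustrative assumptions for this sketch, not part of any Intel Nervana NNP-I API; VNNI performs this pattern in hardware with wide 32-bit accumulators.

import numpy as np

def quantized_matmul(a_u8: np.ndarray, b_s8: np.ndarray) -> np.ndarray:
    # Illustrative INT8 matrix multiply with 32-bit accumulation.
    # VNNI-class instructions fuse the 8-bit multiplies and the
    # accumulation into int32 registers; here it is spelled out
    # with plain NumPy for clarity.
    acc = a_u8.astype(np.int32) @ b_s8.astype(np.int32)
    return acc  # int32 accumulators, later rescaled to the output type

# Hypothetical data: unsigned 8-bit activations times signed 8-bit weights.
activations = np.random.randint(0, 255, size=(4, 64), dtype=np.uint8)
weights = np.random.randint(-128, 127, size=(64, 16), dtype=np.int8)
print(quantized_matmul(activations, weights).shape)  # (4, 16)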

Figure 1: The Intel® Nervana™ Neural Network Processor for Inference (Intel® Nervana™ NNP-I) architecture diagram

Inference needs are specific and often extraordinarily specialized. The Intel Nervana NNP-I’s Tensilica Vision P6 DSP vector processing engine is programmable and allows for a full bi-directional pipeline with the deep learning compute grid.

The comprehensive Intel Nervana NNP-I software stack heavily leverages open source components with deep learning framework integration and support for ONNX and nGraph Library compilers, plus the Intel® Distribution of OpenVINO™ toolkit, and C++. AI practitioners have the flexibility to optimize, reconfigure, and implement on the Intel Nervana NNP-I in the ways most suited for their inference needs, entering the stack at the kernel, compiler, or framework level.
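As a concrete illustration of entering the stack at the toolkit level, below is a minimal sketch using the Python Inference Engine API from the Intel Distribution of OpenVINO toolkit (exact calls vary by toolkit release). The model file names and the "CPU" device target are placeholder assumptions for this sketch; the device plugin for the Intel Nervana NNP-I itself is not shown.

import numpy as np
from openvino.inference_engine import IECore

# Load an optimized model (IR files produced by the Model Optimizer).
# "model.xml" / "model.bin" are placeholder file names.
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")

# The target device here is an assumption; "CPU" is used for illustration.
exec_net = ie.load_network(network=net, device_name="CPU")

# Run inference on a dummy input matching the network's first input shape.
input_name = next(iter(net.input_info))
shape = net.input_info[input_name].input_data.shape
result = exec_net.infer({input_name: np.zeros(shape, dtype=np.float32)})
print(list(result.keys()))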

Programmability and Performance

Efficient deep learning is about solving data delivery problems, and the Intel Nervana NNP-I’s programmability, flexibility, and performance capabilities are all built to move data rapidly. The purpose-built architecture is able to deliver greater performance with less power, giving the Intel Nervana NNP-I an important edge in overall cost of operation and ownership. Better results need not coincide with larger power bills, hotter server rooms, or strained technological infrastructure.

AI Everywhere Means Inference Everywhere

As AI continues to proliferate in real-world deployments, inference performance will become more critical to delivering enterprise insights and results. The Intel Nervana NNP-I is built to accelerate those transformative applications, with an eye on ease of use and power consumption. For more information, join us at the AI Hardware Summit in Mountain View, CA from September 17-18 and visit the Intel Nervana NNP product page for the latest updates.

Notices and Disclaimers