Inference is where the transformative power of AI meets real-world applications. It’s how a smart speaker recognizes many different human voices, and how software can examine medical images and determine which indicate cancer and which don’t. Inference is how retailers can determine which shelves are out of stock or which products customers put back, and how specially trained cameras spot poachers to protect animals. Training is the “learning” part of AI; inference is the “doing.” It’s how AI makes decisions and demonstrates value.
Bringing about an “AI everywhere” future means accelerating inference wherever it’s deployed, from smart devices and personal computers to data centers and public clouds. In an era where raw data is growing exponentially, the ability to turn it into valuable knowledge no matter where it lives requires a variety of hardware, including new purpose-built AI chips.
For example, many AI inference applications run well on general-purpose CPUs, especially 2nd Generation Intel® Xeon® Scalable processors with new inference acceleration instructions built in. However, high-volume or specialized applications will benefit greatly from purpose-built accelerators designed specifically for inference. The Intel® Nervana™ Neural Network Processor for Inference (Intel® Nervana™ NNP-I) is architected for the most demanding inference workloads. Designed from the ground up, this dedicated accelerator offers a high degree of programmability without compromising performance-per-watt efficiency and is built to satisfy the needs of enterprise-scale AI deployments.
Digging into the architecture itself reveals why the Intel Nervana NNP-I is so efficient, and why it’s capable of tackling AI inference at scale.
The Intel Nervana NNP-I is a massively parallel chip that keeps data close to the processing points and maximizes data reuse. At the heart of the chip are 12 all-new inference compute engines (ICE), each optimized for throughput and latency. Pre- and post-processing operations are fused, ensuring that flexibility and efficiency are worked into the very structure of the accelerator. Data doesn’t move more than it has to.
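Intel hasn’t published the NNP-I’s internals at this level of detail, but the data-reuse principle described above can be illustrated with a generic technique: a blocked (tiled) matrix multiply, where each tile is loaded once and reused many times while it sits in fast local memory. This is a hypothetical sketch in plain Python, not NNP-I code, and the tile size `T` is simply a stand-in for on-chip buffer capacity.

```python
# Illustrative only: blocked (tiled) matrix multiply, a generic technique for
# maximizing data reuse in fast local memory. This is NOT NNP-I code; the
# tile size T is a hypothetical stand-in for on-chip buffer capacity.

def matmul_blocked(A, B, n, T=2):
    """Multiply two n x n matrices (lists of lists), processing T x T tiles."""
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, T):
        for j0 in range(0, n, T):
            for k0 in range(0, n, T):
                # Each tile of A and B fetched here is reused across the
                # whole inner loop nest before being evicted, so the ratio
                # of arithmetic to data movement grows with T.
                for i in range(i0, min(i0 + T, n)):
                    for k in range(k0, min(k0 + T, n)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + T, n)):
                            C[i][j] += a * B[k][j]
    return C

print(matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))
# [[19, 22], [43, 50]]
```

The same answer comes out regardless of tile size; what changes is how often each operand crosses the memory hierarchy, which is exactly the cost a locality-focused accelerator is designed to minimize.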
In addition, two Intel® architecture (IA) compute cores provide extensive programmability, including features like Advanced Vector Extensions (AVX) and Vector Neural Network Instructions (VNNI). Likewise, up to 24 MB of last-level cache (LLC) is shared between the ICE and CPU cores, and is optimized to keep data as close to the compute engines as possible.
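To make the VNNI mention concrete: the instruction set’s core operation (VPDPBUSD) fuses four unsigned-8-bit × signed-8-bit multiplies and their summation into a 32-bit accumulator in a single instruction, collapsing what would otherwise be a multi-instruction sequence. The scalar Python model below shows what one accumulator lane computes; it is a semantic sketch, not vectorized code.

```python
# Illustrative only: a scalar model of one 32-bit lane of the VNNI
# VPDPBUSD operation: four u8 x s8 products summed into an int32
# accumulator. Each product fits in a signed 16-bit intermediate
# (255 * -128 = -32640 at worst), and the final add wraps modulo 2^32
# in the non-saturating form modeled here.

def vnni_dpbusd_lane(acc, u8x4, s8x4):
    """acc: int32 accumulator; u8x4: four values in 0..255;
    s8x4: four values in -128..127. Returns the updated accumulator."""
    total = sum(u * s for u, s in zip(u8x4, s8x4))
    result = (acc + total) & 0xFFFFFFFF  # wrap to 32 bits
    return result - 0x100000000 if result >= 0x80000000 else result

print(vnni_dpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8]))
# 10 + (5 - 12 + 21 - 32) = -8
```

Dot products of this shape dominate the inner loops of quantized neural-network inference, which is why folding them into one instruction pays off so directly.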
Inference needs are often highly specialized. The Intel Nervana NNP-I’s Tensilica Vision P6 DSP vector processing engine is programmable and allows for a full bi-directional pipeline with the deep learning compute grid.
The comprehensive Intel Nervana NNP-I software stack heavily leverages open source components with deep learning framework integration and support for ONNX and nGraph Library compilers, plus the Intel® Distribution of OpenVINO™ toolkit, and C++. AI practitioners have the flexibility to optimize, reconfigure, and implement on the Intel Nervana NNP-I in the ways most suited for their inference needs, entering the stack at the kernel, compiler, or framework level.
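A step that stacks like this typically perform before dispatching work to a low-precision accelerator is quantization: mapping floating-point weights and activations to 8-bit integers. The sketch below shows the simplest symmetric max-abs scheme in stdlib Python; it is a generic illustration of the technique, not the NNP-I toolchain’s actual quantizer.

```python
# Illustrative only: symmetric int8 quantization, the kind of low-precision
# transform an inference compiler applies before running a model on int8
# hardware. The max-abs scale choice here is the simplest possible scheme.

def quantize_int8(values):
    """Map floats to int8 codes with a single scale; returns (codes, scale)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale of 0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]

vals = [0.5, -1.0, 0.25, 0.75]
codes, scale = quantize_int8(vals)
print(codes)  # [64, -127, 32, 95], with scale = 1/127
```

Each dequantized value lands within half a quantization step of the original, which is why int8 inference can preserve accuracy while quadrupling arithmetic density relative to float32.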
Efficient deep learning is about solving data delivery problems, and the Intel Nervana NNP-I’s programmability, flexibility, and performance capabilities are all built to move data rapidly. The purpose-built architecture is able to deliver greater performance with less power, giving the Intel Nervana NNP-I an important edge in overall cost of operation and ownership. Better results need not coincide with larger power bills, hotter server rooms, or strained capabilities on technological infrastructure.
As AI continues to proliferate in real-world deployments, inference performance will become more critical to delivering enterprise insights and results. The Intel Nervana NNP-I is built to accelerate those transformative applications, with an eye on ease-of-use and power consumption. For more information, join us at the AI Hardware Summit in Mountain View, CA from September 17-18 and visit the Intel Nervana NNP product page for the latest updates.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.
Intel, the Intel logo, and Nervana are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © Intel Corporation.