It’s an exciting day for Intel and the AI community we work to enable. Today, we’re proud to provide significant updates on the Intel AI portfolio: the first demonstrations of the Intel® Nervana™ Neural Network Processor for Training (NNP-T) and the Intel® Nervana™ Neural Network Processor for Inference (NNP-I). We will also be demonstrating for the first time enhanced integrated AI acceleration with bfloat16 on the next-generation Intel® Xeon® Scalable processor with Intel® Deep Learning Boost (Intel® DL Boost), codenamed Cooper Lake. Finally, we are announcing the future Intel® Movidius™ Vision Processing Unit (VPU), codenamed Keem Bay. This unique combination of hardware will enable the industry to embrace much larger and more complex AI algorithms, expanding what can be achieved with AI, whether in the cloud and data center, on an edge server, or in an IoT device.
I am especially proud to present additional architectural details and live, working demos of real-world AI solutions that harness the power of Intel Nervana NNP-T and Intel Nervana NNP-I – products that many of my colleagues and I joined Intel to create. Our pre-production hardware running pre-alpha software is already performing well, and we expect the forthcoming production platforms to perform even better.
Intel has a unique position and perspective on AI, with a comprehensive edge-to-cloud product portfolio that makes a wide breadth of AI solutions possible: from smart IoT edge devices to classic enterprise machine learning to next-generation deep learning for true AI super-users. This last group is developing the next generation of models that will move us from basic intelligence to algorithms capable of using reasoning and context to make decisions and scale knowledge.
This next wave of AI requires huge increases in data and model complexity, with some models approaching trillions of parameters. Training these cutting-edge algorithms is driving demand for AI compute to double about every 3.5 months, a pace that cannot be met efficiently with today’s architectures. These AI breakthroughs require new architectures that are specifically designed for high-speed, mass-scale AI compute.
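To put that doubling rate in perspective, here is a quick back-of-envelope check (a sketch that simply assumes the 3.5-month doubling holds steadily):

```python
# Demand that doubles every 3.5 months compounds quickly.
months = 24
doublings = months / 3.5          # ~6.9 doublings in two years
print(2 ** doublings)             # ~116x; six full doublings (21 months) is already 64x
```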
Developed for the AI processing needs of leading-edge AI customers like Baidu, the Intel Nervana NNP-T’s purpose-built deep learning architecture carefully balances compute, memory, and interconnect to achieve near-linear scaling – up to 95% scaling efficiency with ResNet-50 and BERT as measured on 32 cards – to train even the most complex models at high efficiency. As a highly energy-efficient compute platform for training real-world deep learning applications, Intel Nervana NNP-T ensures no loss in communications bandwidth when moving from an 8-card in-chassis system to a 32-card cross-chassis system, sustaining the same data rate on 8 or 32 cards for large (128 MB) message sizes and scaling well beyond 32 cards.
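To make that scaling number concrete, here is a minimal sketch of how scaling efficiency is conventionally computed (illustrative numbers only, not Intel’s measurement methodology):

```python
def scaling_efficiency(throughput_n, throughput_1, n_cards):
    """Fraction of ideal linear speedup achieved when scaling to n_cards."""
    return throughput_n / (n_cards * throughput_1)

# 95% efficiency on 32 cards means the cluster delivers roughly
# 30.4x the throughput of a single card instead of the ideal 32x.
print(scaling_efficiency(throughput_n=30.4, throughput_1=1.0, n_cards=32))  # ~0.95
```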
For deep learning training models with large weight sizes (>500 MB), such as BERT-large and Transformer-LT, Intel Nervana NNP-T systems with a simplified, glueless, peer-to-peer scaling fabric are projected to sustain full bandwidth while scaling from a few cards to thousands of cards.
New AI services launch every day, driving demand for fast, efficient inference compute across a wide variety of use environments, energy constraints, and latency considerations. To serve these customers, the Intel Nervana NNP-I is designed for intense, near-real-time, high-volume, low-latency inference, with power and budget efficiency in flexible form factors. It is a performant, highly programmable accelerator platform specifically designed for ultra-efficient multi-modal inferencing. Intel Nervana NNP-I will be supported by the OpenVINO™ Toolkit, incorporates a full software stack including popular deep learning frameworks, and offers a comprehensive set of reliability, availability, and serviceability (RAS) features to facilitate deployment into existing data centers.
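As an illustration of what that software support looks like in practice, here is a minimal sketch using the 2019-era OpenVINO Python API; the model paths are placeholders, and the device name shown is the stock CPU plugin – how an NNP-I device would be named is an assumption, since accelerator plugins slot in through the same device_name argument:

```python
import numpy as np
from openvino.inference_engine import IECore, IENetwork

# Load a model in OpenVINO IR format (the .xml/.bin paths are placeholders).
ie = IECore()
net = IENetwork(model="model.xml", weights="model.bin")

# "CPU" is the stock plugin; an accelerator would be selected the same way
# by passing its plugin's device name here.
exec_net = ie.load_network(network=net, device_name="CPU")

# Run one inference on dummy data shaped like the network's input.
input_blob = next(iter(net.inputs))
input_shape = net.inputs[input_blob].shape
result = exec_net.infer(inputs={input_blob: np.zeros(input_shape, dtype=np.float32)})
```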
We recently reported very positive MLPerf results for two pre-production Intel Nervana NNP-I processors running pre-alpha software, and we expect even greater capabilities from Intel Nervana NNP-I as we further mature the AI software stack; we will update results on production hardware in the future.
As more organizations integrate AI capabilities into more facets of their operations, the Intel Xeon Scalable processor – which already powers most of the world’s inference today – will be called on to process increasingly complex algorithms. To empower our customers to deliver more impactful AI applications, Intel introduced Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions in 2017 with the first-generation Intel Xeon Scalable processor. With the 2nd Generation Intel Xeon Scalable processor, we introduced Intel DL Boost’s Vector Neural Network Instructions (VNNI), which fuse three instructions into one to enable efficient INT8 deep learning inference.
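To make that fusion concrete, here is a minimal NumPy sketch of the arithmetic a single VNNI dot-product step performs (an illustration of the semantics, not Intel’s implementation): four unsigned 8-bit activations are multiplied by four signed 8-bit weights, and the products are summed into a 32-bit accumulator – work that previously took three separate instructions:

```python
import numpy as np

def vnni_dot_step(acc, a_u8, b_s8):
    """Emulate one VNNI-style dot-product step: four u8 x s8 products
    summed and added to an int32 accumulator in a single instruction."""
    prods = a_u8.astype(np.int32) * b_s8.astype(np.int32)
    return acc + int(prods.sum())

# Example: one 4-element INT8 partial dot product.
activations = np.array([12, 200, 7, 255], dtype=np.uint8)   # unsigned activations
weights = np.array([-3, 5, 127, -128], dtype=np.int8)       # signed weights
print(vnni_dot_step(0, activations, weights))                # -30787
```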
At the AI Summit, we demonstrated how we’re improving on this foundation in our next-generation Intel Xeon Scalable processors with bfloat16, a new numerics format supported by Intel DL Boost. bfloat16 delivers accuracy comparable to the more common FP32 format on deep learning workloads, and its halved memory footprint can lead to significantly higher throughput for deep learning training and inference across a range of workloads.
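For illustration, bfloat16 is simply the top 16 bits of an IEEE float32: the sign bit and the full 8-bit exponent are kept, so dynamic range matches FP32, while the mantissa is cut to 7 bits. The sketch below shows the conversion by truncation (real hardware typically rounds to nearest-even):

```python
import numpy as np

def fp32_to_bf16_bits(x):
    """Truncate float32 to bfloat16: keep sign + 8-bit exponent + top 7 mantissa bits."""
    return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b):
    """Widen bfloat16 bits back to float32 by zero-filling the dropped mantissa bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159, 1.0e20, -2.5e-30], dtype=np.float32)
print(bf16_bits_to_fp32(fp32_to_bf16_bits(x)))  # FP32's full range, ~2-3 decimal digits
```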
Optimizations like these at the lowest level of silicon, which are unique to Intel Xeon Scalable processors, will help our customers continue to tackle even more computationally heavy problems with ease.
Compute at the network edge requires efficiency and scalability across a broad range of applications, and AI inference at the edge brings even tighter energy constraints – as low as just a few watts. To best support future edge AI use cases, we’re excited to announce the future Intel Movidius VPU (codenamed Keem Bay), releasing in the first half of 2020. Keem Bay builds on the success of our popular Intel Movidius Myriad™ X VPU while adding groundbreaking and unique architectural features that provide a leap ahead in both efficiency and raw throughput.
Early performance testing indicates that Keem Bay will offer more than 4x the raw inference throughput of NVIDIA’s similar-class TX2 SoC at 1/3 less power, and nearly equivalent raw throughput to NVIDIA’s next-higher-class SoC, the NVIDIA Xavier, at 1/5th the power. This is due in part to Keem Bay’s mere 72mm² die size versus NVIDIA Xavier’s 350mm², highlighting the efficiency that this new product’s architecture delivers. Keem Bay will also be supported by Intel’s OpenVINO Toolkit at launch and will be incorporated into Intel’s newly announced Dev Cloud for the Edge, which launches today and allows you to test your algorithms on any Intel hardware solution – try before you buy.
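Taken together, those ratios imply roughly a 6x performance-per-watt advantage over TX2 and roughly 5x over Xavier; the back-of-envelope below uses only the relative figures quoted above (no absolute throughput or power numbers are assumed):

```python
# Ratios quoted in the text; no absolute figures assumed.
throughput_vs_tx2 = 4.0          # ">4x the raw inference throughput" of TX2
power_vs_tx2 = 2.0 / 3.0         # "1/3 less power" -> ~2/3 of TX2's draw
print(throughput_vs_tx2 / power_vs_tx2)        # ~6x throughput per watt vs. TX2

throughput_vs_xavier = 1.0       # "nearly equivalent raw throughput" to Xavier
power_vs_xavier = 1.0 / 5.0      # "at 1/5th the power"
print(throughput_vs_xavier / power_vs_xavier)  # ~5x throughput per watt vs. Xavier
```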
Intel’s solution portfolio uniquely integrates the compute architectures that analysts predict will be required to realize the full promise of AI – CPUs, FPGAs, and ASICs like those we’re announcing today – all enabled by an open software ecosystem. We’re now realizing $3.5 billion per year in AI revenue, and we’re just getting started.
However, it will take a village. Ushering in the 64X compute increase that Intel estimates the AI community will demand in just two years – roughly six doublings at the 3.5-month pace noted above – can’t be done with compute alone. Only Intel is equipped to look at the full picture of compute, memory, storage, interconnect, packaging, and software to maximize efficiency and programmability, and to ensure the critical ability to scale up, distributing deep learning across thousands of nodes to, in turn, scale the knowledge revolution.
Measurements based on Intel internal testing using pre-production hardware/software as of November 2019. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. For more information regarding NNP performance and benchmarks, please see www.intel.ai/benchmarks.
Measurements based on Intel internal testing and benchmarking using pre-production hardware/software as of November 2019; projections based on Intel internal system modeling. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. For more information regarding NNP performance and benchmarks, please see www.intel.ai/benchmarks.
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. NNP-T performance projections are based on measured data from pre-production NNP-T1000 silicon, using 22 TPCs at a 900MHz core clock and a 2GHz HBM clock; the host is an Intel® Xeon® Gold 6130T CPU @ 2.10GHz with 64 GB of system memory.
Configurations: DL inference performance on the ResNet-50 benchmark measured using INT8, batch size = 1, employing the Keem Bay VPU’s native optimizations. ResNet-50 performance shown reflects low-level optimizations for maximum performance capability, measured as of 31-Oct-2019 with pre-production silicon and tools. Measurement uses a single ResNet-50 network as a standalone workload. The indicated maximum performance benchmark is expected to change, and customer results may vary with forthcoming tools releases. Power efficiency (inferences/sec/W) measured as of 31-Oct-2019 for the Keem Bay SKU 3400VE. All performance and power efficiency measurements may be updated with further changes to software tools. Competitor performance shown is advertised peak performance for ResNet-50 (using INT8, batch size = 1); power efficiency calculated as peak performance divided by power. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. See the Reference Section for more configuration details.
Measurements based on Intel internal testing and benchmarking; projections based on Intel internal system modeling.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel, the Intel logo, Xeon, Nervana, Movidius, Myriad, and OpenVINO are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
Other names and brands may be claimed as the property of others.
© Intel Corporation