What a difference a few years make! We have seen tremendous shifts in the field of deep learning as it moves from model training to inference deployment within main lines of business across industries. Customers are asking us different questions now: How do I scale in the real world, quickly and cost-effectively, to stay competitive? How do I run AI applications in products with strict latency budgets that demand lightning-fast inference results? The answer has three key pieces: get more from the architecture you already know, accelerate with purpose, and use software to simplify the environment. Before we look at these key elements of real-world deployments, let’s examine how we got here.
As deep learning (DL) has matured, CPU solutions have turned a page: hardware and software optimizations now deliver many-fold performance boosts for AI workloads. Today, even older CPUs can deliver inference performance many times better than once thought possible, let alone new generations of AI-enhanced CPUs.
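To make that software angle concrete, here is a minimal sketch of measuring CPU inference throughput in PyTorch, whose CPU backend routes convolutions and matrix multiplies through oneDNN (formerly MKL-DNN), the same class of optimized primitives behind these gains. The model, batch size, and iteration counts are illustrative assumptions, not a benchmark recipe.

```python
import os
import time
import torch
import torchvision.models as models

# Illustrative stand-in for a production model (assumption).
model = models.resnet50(weights=None).eval()
x = torch.randn(8, 3, 224, 224)  # batch of 8 images

torch.set_num_threads(os.cpu_count())  # use all logical cores

with torch.inference_mode():
    for _ in range(3):               # warm-up iterations
        model(x)
    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    elapsed = time.perf_counter() - start

print(f"{8 * iters / elapsed:.1f} images/sec on CPU")
```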
Now that we’ve examined these shifts, let’s revisit the question: How do I scale performant AI applications efficiently, while on a budget? The answer is clear and manageable:

- Get more from the architecture you already know.
- Accelerate with purpose.
- Use software to simplify the environment.
Many companies, representing a diverse range of applications, markets, data, and audiences, are using this three-part approach to deploy real-world AI today. Some are long-time users of Intel Xeon processors. Others are taking advantage of the new 2nd Gen Intel Xeon Scalable processors. With their hardware and software optimizations targeting AI workloads, these CPUs deliver up to 14X the inference throughput of the previous generation.¹
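A key contributor to that generational gain is INT8 inference accelerated by Intel DL Boost (AVX-512 VNNI). As a hedged illustration of what moving to INT8 looks like at the framework level, the sketch below applies PyTorch’s dynamic quantization to the linear layers of a toy model; the model and layer choice are assumptions for demonstration, and a real deployment would calibrate and validate accuracy on production data.

```python
import torch
import torch.nn as nn

# Toy stand-in model (assumption); substitute your production network.
model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)
).eval()

# Rewrite the chosen module types to compute with INT8 weights. On CPUs
# with AVX-512 VNNI (Intel DL Boost), the underlying kernels can use the
# fused INT8 multiply-accumulate instructions.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.inference_mode():
    fp32_out = model(x)
    int8_out = quantized(x)

# INT8 trades a little numerical accuracy for speed; always check the
# deviation on real data before deploying.
print((fp32_out - int8_out).abs().max())
```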
Customers across a diverse set of industries are seeing great success deploying AI on Intel. These collaborations, and many others, succeeded because the companies and academic institutions involved were able to meet performance demands and extend their existing solutions with AI capabilities, while minimizing the cost of change. Another recurring theme was the flexibility to adapt quickly to new usages and opportunities.
Because Intel® Xeon® Scalable processors are already relied upon for so many other enterprise workloads, leveraging them for AI comes at minimal extra cost. Yet as AI matures, the path to the future calls for decisions about when further acceleration is needed for intensive, continuous, high-volume tensor compute. Custom ASICs work hand-in-hand with Xeon-based infrastructure to offload and accelerate the intensive, tensor-based deep learning parts of the application, leaving the rest to benefit from the host CPU.
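The offload pattern itself is straightforward to express. The sketch below is a generic illustration under assumed conditions, not any vendor’s specific API: preprocessing and business logic stay on the host CPU, while the tensor-heavy forward pass is dispatched to whatever accelerator device the framework exposes ("cuda" here is simply a placeholder for an attached accelerator).

```python
import torch
import torch.nn as nn

# Placeholder device selection (assumption): use any attached
# accelerator the framework supports, falling back to the host CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 4))
model = model.to(device).eval()

def preprocess(raw):
    # Host-CPU work: parsing, feature extraction, normalization.
    return torch.tensor(raw, dtype=torch.float32).unsqueeze(0)

def predict(raw):
    x = preprocess(raw)              # runs on the host CPU
    with torch.inference_mode():
        y = model(x.to(device))      # tensor compute on the accelerator
    return y.cpu()                   # back to the host for the rest of the app

print(predict([0.0] * 256))
```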
Customers like Facebook, whose deep learning demands grow more intensive and sustained, are looking to augment their current CPU-based inference with this new class of accelerators, which offer massive concurrency across large numbers of compute elements (spatial architectures), fast data access, high-speed memory close to the compute, high-speed interconnects, and multi-node scaled solutions.
For this reason, Facebook has been a close collaborator with us on the Intel® Nervana™ Neural Network Processor for Inference (NNP-I 1000, codenamed Spring Hill), which enters production later this year. As a leading community platform that unites nearly half the world, Facebook depends on driving, and helping to build, substantial advancements in AI, including this new generation of power-optimized, highly tuned AI inference chips, which we expect to be a leap forward in inference application acceleration, delivering industry-leading performance per watt on real production workloads. The Intel Nervana NNP-I 1000 will be fully integrated with Facebook’s Glow compiler to help keep their software environment simple and highly optimized.
Many companies have just started, or are about to start, their deep learning inference deployment at scale. They are in a variety of industries but share a similar overarching, multi-faceted question: how do we add high-impact DL functionality to line-of-business applications, at the required performance, with managed cost and change, while maintaining flexibility for future needs? Here, I offer a basic strategy for deployment in the data center and cloud, following the three key pieces above (a sketch of one way to put this into practice follows the list):

- Get more from the architecture you already know: deploy inference on the Intel Xeon infrastructure your applications already run on, with AI-optimized software.
- Accelerate with purpose: when DL demands become intensive, continuous, and high-volume, offload the tensor compute to purpose-built accelerators and let the host CPU run the rest of the application.
- Use software to simplify the environment: standardize on optimized frameworks and compilers so one software stack stays consistent across targets.
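As one hedged example of the first and third points, the sketch below runs a model through ONNX Runtime on CPU; the model path and the choice of ONNX Runtime are placeholder assumptions, standing in for whatever optimized, portable inference stack a team chooses.

```python
import numpy as np
import onnxruntime as ort

# Placeholder model path (assumption): substitute your exported model.
session = ort.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)

input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# The same session code runs unchanged if a different execution
# provider is selected later, which keeps the software environment
# simple as deployment targets evolve.
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```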
The AI landscape is shifting, constantly and quickly. What you couldn’t do three years ago, you can do now. It’s an exciting time to witness the impact of enterprise-scale inference deployments and advancements in both hardware and software, from devices to data centers. I can’t wait to see what the next three years bring!
¹ Up to 14X AI performance improvement with Intel® Deep Learning Boost (Intel DL Boost) compared to Intel® Xeon® Platinum 8180 processor (July 2017). Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores, HT on, Turbo on, total memory 384 GB (12 slots / 32 GB / 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB. Deep learning framework: Intel® Optimization for Caffe* version 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a), ICC version 18.0.1, MKL-DNN version v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, DummyData, 4 instances/2 sockets, datatype: INT8. Vs. tested by Intel as of July 11th, 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, Turbo disabled, scaling governor set to "performance" via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS* Linux release 7.3.1611 (Core), Linux kernel* 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: environment variables KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (https://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with "caffe time --forward_only" command, training measured with "caffe time" command. For "ConvNet" topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel® C++ Compiler ver. 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe run with "numactl -l".
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © Intel Corporation