2nd Generation Intel® Xeon® Scalable CPUs Outperform NVIDIA GPUs on NCF Deep Learning Inference

Recommendation systems are some of the most complex and prevalent commercial AI applications deployed by internet companies today. One of the biggest challenges in these systems is collaborative filtering – making predictions about the interests of one person based on the tastes and preferences of similar users. A novel model called Neural Collaborative Filtering (NCF) leverages deep learning to learn user-item interactions for better recommendation performance, and the MLPerf organization uses NCF as a key benchmark.

Through hardware advances, software tool development, and framework optimizations, Intel has achieved tremendous deep learning performance improvements on CPUs in recent years. Thanks to the Intel Deep Learning Boost (Intel DL Boost) feature in 2nd generation Intel Xeon Scalable processors, we demonstrated leadership NCF inference performance of 64.54 million requests per second at under 1.22 milliseconds (msec) of latency on a dual-socket Intel Xeon Platinum 9282 processor-based system, outperforming the NCF results published by NVIDIA on January 16th, 2020. [1]

Model | Platform | Performance | Precision | Dataset
NCF | Intel Xeon Platinum 9282 CPU | Throughput: 64.54 million requests/sec; Latency: 1.22 msec | INT8 | MovieLens 20 Million
NCF | NVIDIA V100 Tensor Core GPU | Throughput: 61.94 million requests/sec; Latency: 20 msec | Mixed | MovieLens 20 Million
NCF | NVIDIA T4 Tensor Core GPU | Throughput: 55.34 million requests/sec; Latency: 1.8 msec | INT8 | Synthetic

Figure 1: 2nd Gen Intel Xeon Scalable Processor performance on NCF model compared to NVIDIA GPUs. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Refer to http://software.intel.com/en-us/articles/optimization-notice for more information regarding performance and optimization choices in Intel software products.

Instructions to Reproduce:

Step 1: Install Intel Math Kernel Library (Intel MKL) through the YUM or APT repository.
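
For reference, the APT flow looked roughly like the following at the time of writing (the YUM flow is analogous, using the yum.repos.intel.com repository); the repository URL, GPG key, and versioned package name follow Intel's installation guide and may have changed since:

# register Intel's APT repository and its signing key
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
echo "deb https://apt.repos.intel.com/mkl all main" | sudo tee /etc/apt/sources.list.d/intel-mkl.list
# install Intel MKL; the exact versioned package name may differ on your system
sudo apt-get update && sudo apt-get install intel-mkl-2020.0-088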

Step 2: Build MXNet with Intel MKL and activate the runtime environment.

Intel Deep Neural Network Library (Intel DNNL), formerly known as Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), is enabled by default. See the article Install MXNet with Intel MKL-DNN for details.

  
# fetch the MXNet sources and pin the commit used for these results
git clone https://github.com/apache/incubator-mxnet
cd ./incubator-mxnet && git checkout dfa3d07
git submodule update --init --recursive
# build against Intel MKL (assumes MKL is installed under /opt/intel)
make -j USE_BLAS=mkl USE_INTEL_PATH=/opt/intel
# set up the Intel runtime environment and expose the MXNet Python bindings
source /opt/intel/bin/compilervars.sh intel64
export PYTHONPATH=/workspace/incubator-mxnet/python/
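
Optionally, you can confirm that the build picked up Intel MKL-DNN by querying MXNet's runtime feature list. This quick check is not part of the original instructions and assumes MXNet 1.5 or later, where the mxnet.runtime module is available:

# should print True when MKL-DNN support is compiled in
python -c "import mxnet as mx; print(mx.runtime.Features().is_enabled('MKLDNN'))"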
  

Step 3: Launch NCF (see the example's README for details). You can run a quick benchmark with the following commands:

  
# go to the NCF example directory
cd /workspace/incubator-mxnet/example/neural_collaborative_filtering/
# install the required python libraries
pip install numpy pandas scipy tqdm
# prepare the ml-20m dataset
python convert.py
# download pre-trained models
# optimize the pre-trained model graph (produces the neumf-opt prefix)
python model_optimizer.py
# calibrate on the ml-20m dataset to produce the quantized (INT8) model
python ncf.py --prefix=./model/ml-20m/neumf-opt --calibration
# benchmark the quantized model
bash benchmark.sh -p model/ml-20m/neumf-opt-quantized
  

Summary

As shown above, Intel Xeon Scalable processors are highly effective for NCF model inference. Next, we will extend this acceleration to broader recommender system models such as DLRM, and demonstrate training efficiency with mixed precision combining single precision (float32) and bfloat16, using new extensions to Intel DL Boost in next-generation Intel Xeon Scalable processors, due out later this year.

Notices and Disclaimers