TensorFlow* Containers Optimized for Intel
It has never been easier to see the power of Intel® Xeon® Scalable processors for deep learning. Three recent developments make it faster than ever to get up and running with optimized inference workloads on Intel platforms:
- Amazon Web Services launched three instances — c5.12xlarge, c5.24xlarge, and c5.metal — based on 2nd generation Intel® Xeon® Scalable processors with Intel® Deep Learning Boost (Intel® DL Boost). This new feature delivers significant performance gains on inference workloads thanks to a special instruction set, Vector Neural Network Instructions (VNNI), that accelerates low-precision inference.
- Google released Deep Learning Containers for CPUs optimized with Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), an open source, performance-enhancing library for accelerating deep learning frameworks on Intel® architecture.
- Intel released the Model Zoo for Intel Architecture, now at version 1.4, an open-source collection of machine learning inference applications that helps users get optimized performance on Intel platforms.
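To build intuition for what VNNI accelerates, here is a minimal sketch (not Intel's implementation) of symmetric INT8 quantization: float32 values are mapped onto 8-bit integer codes with a shared scale factor, multiplied cheaply in the integer domain, then rescaled back to floats.

```python
def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] with one symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid a zero scale
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

activations = [0.5, -1.2, 3.3, 0.01, -2.7]
codes, scale = quantize_int8(activations)
restored = dequantize(codes, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(a - r) <= scale / 2 for a, r in zip(activations, restored))
```

The accuracy loss from this rounding is small for most models, while the 8-bit integer multiply-accumulates are what VNNI executes in a single instruction.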
This blog walks through an example showcasing this powerful combination of tools. All you need to replicate it is an AWS account and about 10 minutes.
ResNet50 INT8 Inference on 2nd Gen Intel Xeon Scalable Processors
There are two sections in this example.
- We will outline the steps needed to set up a 2nd Gen Intel Xeon Scalable processor-based AWS instance and verify that VNNI is enabled.
- We will show how to run ResNet50 inference by implementing the Model Zoo’s pretrained INT8 model and Google’s Deep Learning container to perform low-precision inference accelerated by Intel MKL-DNN and VNNI.
Set Up an AWS Instance
- Log in to the AWS console and launch an instance.
- Choose your desired AMI. In this demo, we will launch “Ubuntu Server 18.04”.
Alternatively, you can use a Deep Learning AMI (DL AMI) that comes with CPU-optimized frameworks preinstalled.
- Choose either c5.12xlarge or c5.24xlarge, then click Review and Launch, followed by Launch on the next screen.
- To establish a secure SSH connection to your launched instance, AWS recommends connecting with a key pair. Refer to the AWS user guide on how to create and use key pairs.
- Launch and view the instance.
- On the AWS console, you can find your instance running. Proceed to connect to it.
- The Connect button opens a dialog box with instructions to connect via SSH. Open your Linux bash prompt and SSH into the running AWS instance. Make sure to use the same key pair used at the time of launch in Step 5. Alternatively, you can connect to the AWS instance through PuTTY: enter the hostname or IP address, go to Connection → SSH → Auth in the side pane, enter the location of the private key file, and click Open.
- Once you have successfully connected to the AWS instance via SSH or PuTTY, verify that VNNI is enabled on your instance (the CPU flags should include avx512_vnni):
lscpu | grep vnni
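If you prefer to script this check, a tiny helper (hypothetical, purely for illustration) can scan a flags string like the one lscpu or /proc/cpuinfo prints:

```python
# Hypothetical helper: check a CPU flags string (as reported by lscpu or
# /proc/cpuinfo) for the avx512_vnni feature bit.
def has_vnni(flags_line):
    return "avx512_vnni" in flags_line.split()

# Abbreviated flags excerpt; a real 2nd Gen Xeon Scalable lists many more.
flags = "fpu sse4_2 avx2 avx512f avx512dq avx512_vnni"
print(has_vnni(flags))  # True on 2nd Gen Xeon Scalable-based C5 instances
```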
Run ResNet50 INT8 Inference
- Install prerequisites: Python 3 and Docker.
sudo apt update
sudo apt install -y python3 docker.io
- Clone the intelai/models repository.
git clone https://github.com/IntelAI/models.git
- Download the pre-trained quantized model.
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/resnet50_int8_pretrained_model.pb
- Navigate to the benchmarks directory in your local clone of the intelai/models repo.
- The launch_benchmark.py script in this directory is used for running models in an optimized TensorFlow Docker container. It has arguments to specify which model, framework, mode, precision, and Docker image to run. See the ResNet50 INT8 model’s README for more options.
Note: If you experience user-level permission issues on docker.sock, grant your user read and write privileges and rerun the Python command below:
sudo setfacl -m user:$USER:rw /var/run/docker.sock
To run maximum-throughput ResNet50 INT8 inference in TensorFlow:
python launch_benchmark.py \
--model-name resnet50 \
--precision int8 \
--mode inference \
--framework tensorflow \
--batch-size 128 \
--docker-image gcr.io/deeplearning-platform-release/tf-cpu.1-14 \
--in-graph ~/resnet50_int8_pretrained_model.pb
Example log tail:
Iteration 48: 0.121367 sec
Iteration 49: 0.120932 sec
Iteration 50: 0.120965 sec
Average time: 0.121058 sec
Batch size = 128
Throughput: 1057.344 images/sec
Ran inference with batch size 128
Log location outside container:
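The numbers you care about sit at the end of that log. A small helper script (hypothetical, not part of the Model Zoo) can pull them out and sanity-check that throughput matches batch size divided by average iteration time:

```python
import re

# Sample log tail from the run above.
sample_log = """\
Iteration 50: 0.120965 sec
Average time: 0.121058 sec
Batch size = 128
Throughput: 1057.344 images/sec
"""

def parse_benchmark_log(text):
    """Extract (average_time_sec, batch_size, throughput) from a log tail."""
    avg = float(re.search(r"Average time: ([\d.]+) sec", text).group(1))
    batch = int(re.search(r"Batch size = (\d+)", text).group(1))
    thr = float(re.search(r"Throughput: ([\d.]+) images/sec", text).group(1))
    return avg, batch, thr

avg, batch, thr = parse_benchmark_log(sample_log)
# Reported throughput should agree with batch_size / average_time.
assert abs(batch / avg - thr) < 0.01
```

This kind of check is handy when comparing runs across batch sizes or instance types.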
Don’t forget to stop your AWS instance after your experiments to avoid incurring additional charges. Go to the AWS console, right-click the running instance, and click Stop.
We got this example up and running in the time it takes to brew a pot of coffee! Go ahead, give it a try, and see how easy it is to get started with TensorFlow containers optimized for deep learning inference on Intel Xeon Scalable processors. For more information, check out the links below and follow us at @IntelAIResearch or online at intel.ai.
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice Revision #20110804
Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others. © Intel Corporation