Training Deep Convolutional Neural Networks with Horovod* on Intel® High Performance Computing Architecture

This paper reviews a biomedical image segmentation project conducted in partnership with the AI team at General Electric’s Global Research Center. We begin by detailing the network topology (U-Net) and the Brain Tumor Segmentation (BraTS) dataset [1] used to benchmark training performance. All training is performed on Intel® Xeon® Platinum 8168 servers, and we outline both single-node and multi-node implementations. Leveraging the Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN), we demonstrate a more than 7X speedup in time-to-train on a single node and a further 2X speedup in a multi-node environment. We conclude with a summary of best-known methods for optimizing Convolutional Neural Network (CNN) topologies on Intel architecture.

Code and instructions for running this implementation of U-Net can be found at
