This paper reviews a biomedical image segmentation project conducted in partnership with the AI team at General Electric’s Global Research Center. We begin by detailing the network topology (U-Net) and the Brain Tumor Segmentation (BraTS) dataset¹ used to benchmark training performance. All training is performed on Intel® Xeon® Platinum 8168 servers, and we outline both single-node and multi-node implementations. Leveraging the Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN), we demonstrate a greater than 7X speedup in time-to-train on a single node and a further 2X speedup in a multi-node environment. We conclude with a summary of best-known methods for optimizing Convolutional Neural Network (CNN) topologies on Intel architecture.
Code and instructions for running this implementation of U-Net can be found at https://github.com/NervanaSystems/topologies