Emphysema, a progressive lung disease that impacts breathing ability, affects more than 3 million people in the United States, and more than 65 million people worldwide. Early detection is key in stopping the progression of emphysema, which in severe cases is life-threatening. Pneumonia, a lung infection that also impacts breathing, causes another 1.4 million deaths annually around the world. In most cases, it, too, is treatable with early detection.
Research Using CheXNet at Stanford: CheXNet is a deep learning Convolutional Neural Network (CNN) model developed at Stanford University to identify thoracic pathologies from the NIH ChestX-ray14 dataset. CheXNet is a 121-layer CNN that takes a chest X-ray image as input and outputs the probability of each pathology. It detects pneumonia and localizes the areas of the image that are most indicative of the pathology. The Stanford researchers trained CheXNet-121 on the ChestX-ray14 dataset, starting from a model pre-trained on the ImageNet2012-1K dataset. The NIH dataset consists of over one hundred thousand frontal chest X-ray images from over 30,000 unique patients, each annotated with up to 14 thoracic diseases, including pneumonia and emphysema. When it was published, CheXNet-121 outperformed the best previously published results on all 14 pathologies in the ChestX-ray14 dataset.
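Because a single X-ray can carry several of the 14 labels at once, the per-pathology probabilities are independent and need not sum to one. The snippet below is a minimal sketch of that idea using an element-wise sigmoid. It is not the CheXNet code: the three-label subset, the `predict_probabilities` helper, and the raw scores are hypothetical values chosen for illustration.

```python
import math

# Illustrative subset of the 14 ChestX-ray14 labels (assumption: any subset works here).
LABELS = ["Pneumonia", "Emphysema", "Effusion"]


def sigmoid(x):
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_probabilities(logits):
    """Map one raw score per pathology to an independent probability."""
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


# Hypothetical raw scores from a network's final layer for one X-ray.
example_logits = [2.0, -1.0, 0.0]
probs = predict_probabilities(example_logits)

for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```

Each probability is thresholded or ranked on its own, which is why the per-pathology results later in this article are reported category by category.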
Extending Research on HPC Infrastructure: In this joint work, Dell EMC, SURFsara, and Intel extended the research using VGG-16 and ResNet-50 CNN models pre-trained on the ImageNet2012-1K dataset and scaled out across a large number of Intel® Xeon® Scalable processors on Dell EMC’s Zenith supercomputer. Our team was able to significantly reduce the training time and outperform the published CheXNet-121 results in four pathological categories using VGG-16 and in up to 10 categories (including pneumonia and emphysema) using ResNet-50.
We first pre-trained the network on the ImageNet2012 dataset on 200 nodes of Dell EMC’s Zenith HPC Cluster using Intel® Optimization for TensorFlow* and the Horovod distributed training framework. The chart below shows ResNet-50 pre-trained to greater than 75% Top-1 accuracy, with a time-to-train speedup of 3X on 200 nodes relative to 64 nodes on the Zenith cluster. We followed a methodology for fine-tuning ResNet-50 on ImageNet2012-1K similar to that of previous work by SURFsara and Intel.
The charts below show the throughput and accuracy of a model pre-trained on the ImageNet2012 dataset, using the default implementation in Keras* with Intel® Optimization for TensorFlow* and the Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN), and exploiting NUMA domains by running multiple workers per node.
We parallelized, optimized and scaled both VGG-16 and CheXNet-121 models on up to 64 Intel Xeon processor nodes. Figure 4 shows that using the pre-trained VGG-16 model, we were able to achieve 6.3X faster throughput performance on 64 nodes than CheXNet-121 on 32 nodes on the Dell EMC Zenith cluster.
Figure 5 shows the training accuracy measured with VGG-16 compared to the published CheXNet model. For the two important pathologies (pneumonia and emphysema), VGG-16 is better than or on par with CheXNet on the AUROC metric.
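For readers unfamiliar with the metric, AUROC (area under the receiver operating characteristic curve) can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, computed separately for each pathology. The sketch below implements that pairwise definition in plain Python. The labels and scores are made-up toy values, not results from this work, and the `auroc` helper is our own illustration rather than the evaluation code used for the figures.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy ground truth and model scores for five images (hypothetical values).
y_true = {
    "Pneumonia": [1, 0, 1, 0, 0],
    "Emphysema": [0, 0, 1, 1, 0],
}
y_score = {
    "Pneumonia": [0.9, 0.2, 0.6, 0.7, 0.1],
    "Emphysema": [0.3, 0.1, 0.8, 0.7, 0.2],
}

per_class_auroc = {name: auroc(y_true[name], y_score[name]) for name in y_true}
for name, value in per_class_auroc.items():
    print(f"{name}: AUROC = {value:.3f}")
```

The pairwise loop is quadratic in the number of images, which is fine for a toy example; a rank-based implementation from a library is the usual choice for a dataset of this size.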
Next, we fine-tuned the pre-trained ResNet-50 model and measured its performance on the ChestX-ray14 dataset. We achieved a 4.7X speedup in throughput with a TensorFlow-only implementation compared to a Keras* + TensorFlow implementation on 128 Intel Xeon nodes on the Zenith cluster. This result suggests that Keras adds significant performance overhead in this configuration.
Figure 6 shows that, using the pre-trained ResNet-50, throughput with TensorFlow on 128 nodes is 104X higher than single-node throughput on the Dell EMC Zenith cluster. Figure 6 also shows scale-out training performance using ResNet-50, relative to single-node performance, on up to 256 nodes of the Zenith cluster.
Figure 7 below shows the accuracy of ResNet-50 relative to CheXNet-121. By using a ResNet-50 model pre-trained on the ImageNet2012 dataset, we were able to achieve up to 4% better training accuracy (positive AUROC) than the published CheXNet-121 in 10 of the 14 pathology categories.
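One way to read these speedups is as scaling efficiency: the measured speedup divided by the ideal (linear) speedup for the same increase in node count. The short calculation below applies that definition to the two figures quoted in this article. The `scaling_efficiency` helper is our own illustration, and note that the 3X figure is a time-to-train speedup while the 104X figure is a throughput speedup, so the two efficiencies are not directly comparable.

```python
def scaling_efficiency(speedup, nodes, baseline_nodes=1):
    """Measured speedup divided by the ideal linear speedup."""
    ideal_speedup = nodes / baseline_nodes
    return speedup / ideal_speedup


# ResNet-50 fine-tuning throughput: 104X on 128 nodes relative to a single node.
throughput_eff = scaling_efficiency(104, 128)

# ResNet-50 ImageNet pre-training: 3X time-to-train speedup on 200 nodes relative to 64.
time_to_train_eff = scaling_efficiency(3, 200, baseline_nodes=64)

print(f"Throughput scaling efficiency (128 vs 1 node): {throughput_eff:.2%}")
print(f"Time-to-train scaling efficiency (200 vs 64 nodes): {time_to_train_eff:.2%}")
```

By this measure the 128-node throughput result corresponds to roughly 81% of linear scaling, and the 200-node time-to-train result to roughly 96% relative to the 64-node baseline.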
In healthcare, prevention is key to saving lives, improving outcomes, and reducing costs. Models which can help identify disease will be critical to providing quality care to everyone in a timely fashion. As we’ve shown, scale-out training of neural network models can reduce the time to solution from weeks to minutes, using the same compute infrastructure that is already being used for everyday operations in hospitals and medical research labs around the world.
If you’d like to learn more, check out a recording of our presentation on this topic at the Intel AI® DevCon in May 2018. Two of our authors, Valeriu and Damian, also shared their insights earlier this month at the Artificial Intelligence conference in London jointly presented by O’Reilly Media and Intel Corporation.
    OMP_NUM_THREADS=20 HOROVOD_FUSION_THRESHOLD=134217728
    export I_MPI_FABRICS=tmi
    export I_MPI_TMI_PROVIDER=psm2
    mpirun -np 512 -ppn 2 python resnet_main.py \
        --train_batch_size 8192 --train_steps 14075 \
        --num_intra_threads 20 --num_inter_threads 2 --mkl=True \
        --data_dir=/scratch/04611/valeriuc/tf-1.6/tpu_rec/train \
        --model_dir model_batch_8k_90ep --use_tpu=False --kmp_blocktime 1

Baseline configuration: 64 nodes.
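The flags in the launch command above can be sanity-checked with a little arithmetic. The sketch below derives the node count, the per-worker batch size, the threads per node, and the approximate number of epochs. It rests on two assumptions that are not stated in the text: that `--train_batch_size` is the global batch summed over all MPI ranks, and that the training set is the standard ImageNet-1K (ILSVRC-2012) split of 1,281,167 images.

```python
# Assumption: size of the ILSVRC-2012 training set; not given in the article.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167

# Values copied from the launch command.
mpi_ranks = 512        # mpirun -np
ranks_per_node = 2     # mpirun -ppn
global_batch = 8192    # --train_batch_size (assumed to be the global batch)
train_steps = 14075    # --train_steps
omp_threads = 20       # OMP_NUM_THREADS, matching --num_intra_threads

nodes = mpi_ranks // ranks_per_node              # nodes used by the job
per_rank_batch = global_batch // mpi_ranks       # images per worker per step
threads_per_node = ranks_per_node * omp_threads  # OpenMP threads across both workers
epochs = train_steps * global_batch / IMAGENET_1K_TRAIN_IMAGES

print(f"nodes: {nodes}")
print(f"per-worker batch size: {per_rank_batch}")
print(f"OpenMP threads per node: {threads_per_node}")
print(f"approximate epochs: {epochs:.2f}")
```

Under these assumptions the command describes two workers per node, consistent with the multiple-workers-per-node NUMA layout mentioned earlier, and a schedule of about 90 epochs, which agrees with the `90ep` suffix in the `--model_dir` name.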