Deep Defense: Training DNNs with Improved Adversarial Robustness

Though they are effective at a variety of computer vision tasks, deep neural networks (DNNs) have been shown to be vulnerable to attacks based on adversarial examples[1]: images that are perceptually similar to real images but intentionally constructed to fool learning models. This vulnerability has limited the application of DNNs in security-critical systems.

My colleagues[2] and I propose a training recipe named “deep defense” to address this vulnerability. Deep defense integrates an adversarial perturbation-based regularizer into the classification objective, so that the resulting models learn directly and precisely to resist potential attacks. Experimental results on MNIST*, CIFAR-10*, and ImageNet* demonstrate that our deep defense method significantly improves the resistance of different DNNs to advanced adversarial attacks with no observed accuracy degradation. The results also indicate that our method outperforms training with adversarial or Parseval regularization by large margins across these datasets and across different DNN architectures. With this and future work, we hope to improve the resistance of DNNs to adversarial attacks and thereby increase the suitability of DNNs for security-critical applications.

Earlier Research on Adversarial Examples

Earlier studies have synthesized adversarial examples by applying worst-case perturbations to real images[1] [3] [4] [5]. It has been shown that the perturbations needed to fool a DNN model can be 1000x smaller in magnitude than the real images themselves, making these perturbations imperceptible to the naked eye. Studies have found that even leading DNN solutions can be fooled into misclassifying these adversarial examples with high confidence[6].

This vulnerability to adversarial examples can lead to significant issues in real-world applications, such as face ID systems. Unlike instability against random noise, which is theoretically and practically guaranteed to be less critical[1] [7], the vulnerability of DNNs to adversarial perturbations is severe.

Several earlier studies have investigated this vulnerability[1] [3] [8] [9]. Goodfellow et al.[3] argue that the main reason DNNs are vulnerable is their linear nature, rather than nonlinearity or overfitting. Based on this explanation, they design an efficient ℓ∞-induced perturbation and further propose combining it with adversarial training[3] for regularization. More recently, Cisse et al.[8] investigated the Lipschitz constant of DNN-based classifiers and proposed Parseval training. However, as with some previous and contemporary methods, approximations to the theoretically optimal constraint are required in practice, which makes the method less effective at resisting very strong attacks.
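To make the ℓ∞-induced perturbation concrete, here is a minimal sketch of a sign-gradient step in the spirit of Goodfellow et al.[3], applied to a toy linear logistic classifier rather than a DNN. The model weights, the input, and the step size eps are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def logistic_loss(w, b, x, y):
    """Cross-entropy loss of a linear logistic model for a label y in {0, 1}."""
    p = sigmoid(score(w, b, x))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x, which is (p - y) * w."""
    p = sigmoid(score(w, b, x))
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def sign_gradient_perturb(w, b, x, y, eps):
    """Move every coordinate of x by eps in the direction that increases the loss.

    The resulting perturbation has an l-infinity norm of at most eps.
    """
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical toy model and input (assumptions, not from the study).
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.3], 1
eps = 0.1

x_adv = sign_gradient_perturb(w, b, x, y, eps)
clean_loss = logistic_loss(w, b, x, y)
adv_loss = logistic_loss(w, b, x_adv, y)
```

Because the model is linear, a small change spread across every input coordinate adds up in the score, which is the intuition behind the linearity explanation of adversarial vulnerability.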

Deep Defense

We introduce “Deep Defense,” a regularization method for training DNNs with improved robustness against adversarial examples. Unlike previous methods, which make approximations and optimize possibly loose bounds, we integrate a perturbation-based regularizer precisely into the classification objective. The DNN models can therefore learn directly from adversarial attacks, and develop further resistance to them, in a principled way.
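The idea of folding a perturbation-based regularizer into the classification objective can be written compactly. The following is only a sketch: the symbols (network weights W, classification loss L, classifier f, minimal adversarial perturbation Δx_k of training sample x_k, norm order p, trade-off weight λ, and an increasing penalty function R) are notation assumed here for illustration and are not defined in this post.

```latex
\min_{W} \; \sum_{k} L\bigl(y_k, f(x_k; W)\bigr)
  \;+\; \lambda \sum_{k} R\!\left( -\,\frac{\lVert \Delta x_k \rVert_p}{\lVert x_k \rVert_p} \right)
```

With an increasing R, a sample that can be fooled by a small perturbation relative to its own norm incurs a large penalty, so minimizing the objective pushes the model toward requiring larger perturbations.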

Specifically, we penalize the norm of adversarial perturbations, encouraging relatively large values for correctly classified samples and possibly small values for misclassified ones. The regularizer is optimized jointly with the original learning objective, and the whole problem is solved efficiently by treating it as the training of a recursive-flavored network.
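The behavior described above can be sketched on a toy problem. The example below uses a linear binary classifier as a stand-in for a DNN, because for a linear classifier the minimal ℓ2 perturbation to the decision boundary has a closed form. The exponential penalty and the constants c and d are assumptions made for this illustration, as are all function names; the sketch does not reproduce the recursive network training used in the study.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_norm(v):
    return math.sqrt(dot(v, v))

def minimal_perturbation_norm(w, b, x):
    """L2 distance from x to the boundary w.x + b = 0 of a linear classifier."""
    return abs(dot(w, x) + b) / l2_norm(w)

def perturbation_regularizer(w, b, samples, c=2.0, d=1.0):
    """Penalty built from normalized minimal perturbation norms.

    For each (x, y) with y in {-1, +1} and nonzero x, let
    r = ||delta_x|| / ||x||. Correctly classified samples contribute
    exp(-c * r), which shrinks as r grows. Misclassified samples
    contribute exp(d * r), which shrinks as r falls.
    c and d are hypothetical constants for this sketch.
    """
    total = 0.0
    for x, y in samples:
        r = minimal_perturbation_norm(w, b, x) / l2_norm(x)
        if y * (dot(w, x) + b) > 0:
            total += math.exp(-c * r)  # correct: reward a large margin
        else:
            total += math.exp(d * r)   # wrong: reward being near the boundary
    return total

# Two hypothetical training samples on opposite sides of the origin.
samples = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)]
w = [1.0, 0.0]

reg_centered = perturbation_regularizer(w, 0.0, samples)  # boundary midway
reg_shifted = perturbation_regularizer(w, 0.5, samples)   # boundary near one sample
reg_wrong = perturbation_regularizer(w, 2.0, samples)     # second sample misclassified
```

In this toy setting the penalty is lowest when the boundary sits midway between the two samples, and it grows sharply once a sample lands far on the wrong side. In a full training loop, a term like this would be added to the classification loss with a trade-off weight.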

As noted, we find that this approach significantly increases the robustness of DNNs to advanced adversarial attacks with no observed accuracy degradation.

Improving the Utility of Deep Learning

We look forward to extending this research with future work on resisting black-box attacks and attacks in the physical world, and to continuing to extend the utility of DNNs to more applications.

For more information, please review our study, “Deep Defense: Training DNNs with Improved Adversarial Robustness,” which was presented at the 2018 NeurIPS conference. For more AI research from Intel, follow @IntelAIDev and @IntelAI on Twitter.

*Other names and brands may be claimed as the property of others.
© Intel Corporation  
[1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing Properties of Neural Networks. In ICLR, 2014.
[2] Ziang Yan & Changshui Zhang, both from the Institute for Artificial Intelligence, Tsinghua University (THUAI), the State Key Lab of Intelligent Technologies and Systems, Beijing National Research Center for Information Science and Technology (BNRist), the Department of Automation, Tsinghua University, Beijing, China.
[3] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In ICLR, 2015.
[4] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In CVPR, 2016.
[5] Nicholas Carlini and David Wagner. Adversarial Examples are not Easily Detected: Bypassing Ten Detection Methods. In ACM Workshop on Artificial Intelligence and Security, 2017.
[6] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial Machine Learning at Scale. In ICLR, 2017.
[7] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of Classifiers: From Adversarial to Random Noise. In NIPS, 2016.
[8] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval Networks: Improving Robustness to Adversarial Examples. In ICML, 2017.
[9] Matthias Hein and Maksym Andriushchenko. Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation. In NIPS, 2017.