Though deep neural networks (DNNs) are effective at a variety of computer vision tasks, they have been shown to be vulnerable to attacks based on adversarial examples: images that are perceptually similar to real images but intentionally constructed to fool learning models. This vulnerability has limited the application of DNNs in security-critical systems.
My colleagues and I propose a training recipe named “deep defense” to address this vulnerability. Deep defense integrates an adversarial perturbation-based regularizer into the classification objective, so that the trained models learn to resist potential attacks directly and precisely. Experimental results on MNIST, CIFAR-10, and ImageNet demonstrate that our deep defense method significantly improves the resistance of different DNNs to advanced adversarial attacks, with no observed accuracy degradation. The results also indicate that our method outperforms training with adversarial or Parseval regularization by large margins across these datasets and DNN architectures. With this and future work, we hope to improve the resistance of DNNs to adversarial attacks and thereby make DNNs more suitable for security-critical applications.
Earlier studies have synthesized adversarial examples by applying worst-case perturbations to real images. It has been shown that a perturbation capable of fooling a DNN model can be 1,000x smaller in magnitude than the image itself, making it imperceptible to the naked eye. Studies have found that even leading DNN solutions can be fooled into misclassifying these adversarial examples with high confidence.
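To make the magnitude claim concrete, here is an illustrative sketch (not any specific attack from the literature): it scales a random perturbation so its l2 norm is 1/1,000 of a synthetic image's norm, then checks how little each pixel changes. The image, seed, and thresholds are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(28, 28))        # a synthetic "image"
perturbation = rng.normal(0.0, 1.0, size=(28, 28))  # stand-in perturbation direction

# Scale the perturbation so its l2 norm is 1/1000 of the image's l2 norm.
perturbation *= np.linalg.norm(image) / (1000.0 * np.linalg.norm(perturbation))

adversarial = image + perturbation
ratio = np.linalg.norm(image) / np.linalg.norm(perturbation)
max_pixel_change = np.abs(adversarial - image).max()

print(round(ratio))             # -> 1000, by construction
print(max_pixel_change)         # each pixel shifts by far less than 1% of the range
```

A shift this small is well below what a human can perceive in an 8-bit image, which is why such perturbations are invisible to the naked eye even though they can redirect a model's prediction.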
This vulnerability to adversarial examples can lead to significant issues in real-world applications, such as face-ID systems. Unlike instability under random noise, which both theory and practice show to be far less critical, the vulnerability of DNNs to adversarial perturbations is more severe.
Several earlier studies have investigated this vulnerability. Goodfellow et al. argue that the main reason DNNs are vulnerable is their linear nature, rather than nonlinearity or overfitting. Based on this explanation, they design an efficient l∞-induced perturbation (the fast gradient sign method) and further propose combining it with adversarial training for regularization. More recently, Cisse et al. investigated the Lipschitz constant of DNN-based classifiers and proposed Parseval training. However, as with some previous and contemporary methods, approximations to the theoretically optimal constraint are required in practice, making the method less effective at resisting very strong attacks.
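The fast gradient sign method perturbs an input in the direction of the sign of the loss gradient with respect to that input. The sketch below applies it to a tiny hand-written logistic-regression model, so the input gradient can be computed in closed form; the weights, input, and epsilon are hypothetical, chosen only to illustrate the mechanics.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w           # closed form for logistic regression

w = np.array([0.5, -1.2, 0.8])   # hypothetical model weights
b = 0.1
x = np.array([1.0, -0.3, 0.5])   # input the model assigns to class y = 1
y = 1.0

eps = 0.1                        # l-infinity budget of the perturbation
x_adv = x + eps * np.sign(input_gradient(x, y, w, b))

p_clean = sigmoid(np.dot(w, x) + b)
p_adv = sigmoid(np.dot(w, x_adv) + b)
print(p_adv < p_clean)           # True: the attack lowers confidence in the true class
```

Each input coordinate moves by exactly eps, so the perturbation's l∞ norm is bounded by construction; this single-gradient-step structure is what makes the method efficient.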
We introduce “Deep Defense,” a regularization method for training DNNs with improved robustness against adversarial examples. Unlike previous methods, which make approximations and optimize possibly loose bounds, we precisely integrate a perturbation-based regularizer into the classification objective. The DNN models can therefore directly learn from, and develop further resistance to, adversarial attacks in a principled way.
Specifically, we penalize the norm of adversarial perturbations, encouraging relatively large norms for correctly classified samples and permitting small norms for misclassified ones. This regularizer is jointly optimized with the original learning objective, and the whole problem is solved efficiently by treating it as training a recursive-flavored network.
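The shape of such a penalty can be sketched as follows. This is a toy illustration of the idea, not the paper's exact formulation: the normalized perturbation norm is penalized with a term that shrinks as correctly classified samples require larger perturbations to be fooled, and grows for misclassified samples. The exponential form, the constants `c` and `d`, and the batch values are all illustrative assumptions.

```python
import numpy as np

def perturbation_regularizer(pert_norms, image_norms, correct, c=25.0, d=5.0):
    """Toy penalty on normalized adversarial-perturbation norms.

    correct -- boolean mask: True where the sample is classified correctly.
    A correct sample with a large normalized perturbation norm (i.e., hard
    to fool) contributes little; a fragile one contributes a lot.
    """
    t = pert_norms / image_norms              # scale-invariant perturbation size
    penalty = np.where(correct, np.exp(-c * t), np.exp(d * t))
    return penalty.sum()

# Hypothetical batch: two correctly classified samples, one misclassified.
pert_norms = np.array([0.02, 0.10, 0.01])     # norms of fooling perturbations
image_norms = np.array([1.0, 1.0, 1.0])
correct = np.array([True, True, False])

reg = perturbation_regularizer(pert_norms, image_norms, correct)
classification_loss = 0.35                    # placeholder cross-entropy value
total_loss = classification_loss + 0.1 * reg  # joint objective, lambda = 0.1
print(total_loss > classification_loss)       # True: the penalty is always positive
```

Because the penalty is differentiable in the perturbation norm, it can sit alongside the cross-entropy term and be minimized by the same gradient-based training loop, which is what allows the joint optimization described above.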
As noted, we find that this approach significantly increases the robustness of DNNs to advanced adversarial attacks with no observed accuracy degradation.
We look forward to extending this research with future work on resisting black-box attacks and attacks in the physical world, and to continuing to extend the utility of DNNs to more applications.
For more information, please review our study Deep Defense: Training DNNs with Improved Adversarial Robustness, which was presented at the 2018 NeurIPS conference. For more AI research from Intel, follow @IntelAIDev and @IntelAI on Twitter and tune in to https://ai.intel.com.