Blackbox Generation of Adversarial Text Sequences

Title: Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers


Paper: arXiv

Published at the 1st Deep Learning and Security Workshop, co-located with the 39th IEEE Symposium on Security and Privacy.

GitHub: [Coming]

TalkSlide: URL


Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, that effectively generates small text perturbations in a black-box setting and forces a deep-learning classifier to misclassify a text input. We develop novel scoring strategies to find the most important words to modify so that the deep classifier makes a wrong prediction. Simple character-level transformations are then applied to the highest-ranked words to minimize the edit distance of the perturbation. We evaluated DeepWordBug on two real-world text datasets: Enron spam emails and IMDB movie reviews. Our experimental results indicate that DeepWordBug can reduce the classification accuracy from 99% to around 40% on Enron data and from 87% to about 26% on IMDB. Our results also demonstrate that adversarial sequences generated against one deep-learning model can similarly evade other deep models.
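The two-step idea in the abstract (rank words by importance using only classifier outputs, then apply a small character-level edit to the top-ranked words) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: `toy_spam_score` is a hypothetical stand-in for the black-box classifier, the leave-one-out score in `rank_words` is a simple substitute for the paper's scoring strategies, and `swap_inner` is just one example of a character-level transformation.

```python
from typing import Callable, List

SPAM_WORDS = {"free", "winner", "prize", "click"}  # hypothetical keyword list


def toy_spam_score(words: List[str]) -> float:
    """Hypothetical black-box classifier: fraction of words that are spam keywords."""
    if not words:
        return 0.0
    return sum(w.lower() in SPAM_WORDS for w in words) / len(words)


def rank_words(words: List[str], score_fn: Callable[[List[str]], float]) -> List[int]:
    """Rank word positions by how much the score drops when the word is masked.

    Only the classifier's output is queried, so this works in a black-box setting.
    This leave-one-out score is an assumed stand-in, not the paper's scoring function.
    """
    base = score_fn(words)
    drops = []
    for i in range(len(words)):
        masked = words[:i] + ["<unk>"] + words[i + 1:]
        drops.append((base - score_fn(masked), i))
    # Largest drop first; ties are broken by position in the text.
    drops.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in drops]


def swap_inner(word: str) -> str:
    """Swap the second and third characters; words shorter than 4 are left unchanged."""
    if len(word) < 4:
        return word
    return word[0] + word[2] + word[1] + word[3:]


def perturb(words: List[str], score_fn: Callable[[List[str]], float], budget: int) -> List[str]:
    """Apply the character-level edit to the `budget` highest-ranked words."""
    out = list(words)
    for i in rank_words(words, score_fn)[:budget]:
        out[i] = swap_inner(out[i])
    return out


if __name__ == "__main__":
    text = "Click now to claim your free prize".split()
    adversarial = perturb(text, toy_spam_score, budget=3)
    print(" ".join(adversarial), toy_spam_score(adversarial))
```

The `budget` parameter caps how many words are modified, which keeps the overall edit distance small; a real attack would query an actual model instead of the toy keyword scorer.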


  title={Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers},
  author={Gao, Ji and Lanchantin, Jack and Soffa, Mary Lou and Qi, Yanjun},
  journal={arXiv preprint arXiv:1801.04354},

Support or Contact

Having trouble with our tools? Please contact Ji Gao and we’ll help you sort it out.
