08 Oct 2020
The websites SecureMachineLearning.org and TrustworthyMachineLearning.org introduce updates to a suite of tools we have developed for making machine learning secure and robust.
Classifiers based on machine learning algorithms have shown promising results for many security tasks including malware classification and network intrusion detection, but classic machine learning algorithms are not designed to operate in the presence of adversaries. Intelligent and adaptive adversaries may actively manipulate the information they present in attempts to evade a trained classifier, leading to a competition between the designers of learning systems and attackers who wish to evade them. This project is developing automated techniques for predicting how well classifiers will resist the evasions of adversaries, along with general methods to automatically harden machine-learning classifiers against adversarial evasion attacks.
Five important tasks
At the junction between machine learning and computer security, this project involves toolboxes for five main tasks, shown in the following table. Our system aims to allow a classifier designer to understand how the classification performance of a model degrades under evasion attacks, enabling better-informed and more secure design choices. The framework is general and scalable, and takes advantage of the latest advances in machine learning and computer security.
Have questions or suggestions? Feel free to ask me on Twitter or email me.
Thanks for reading!
21 Dec 2018
On December 21 @ 12 noon, I gave a distinguished webinar talk in the Fall 2018 webinar series of the Institute for Information Infrastructure Protection (I3P) (at George Washington University and SRI International).
Webinar Recording @ URL
02 Mar 2018
Abstract
Although deep neural networks (DNNs) have achieved great success in many computer vision tasks, recent studies have shown they are vulnerable to adversarial examples. Such examples, typically generated by adding small but purposeful distortions, can frequently fool DNN models. Previous studies on defending against adversarial examples have mostly focused on refining the DNN models; they have either shown limited success or suffered from expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on the squeezed input, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two instances of feature squeezing: reducing the color bit depth of each pixel and smoothing using a spatial filter. These strategies are straightforward, inexpensive, and complementary to defensive methods that operate on the underlying model, such as adversarial training.
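For readers who want to try the idea, here is a minimal Python sketch of the detection logic described above, assuming a Keras-style model whose predict() returns softmax vectors; the squeezer settings and the detection threshold are illustrative placeholders, not the tuned values from the paper.

# A minimal sketch of feature-squeezing detection: compare predictions on the
# original input and on squeezed versions, and flag large disagreements.
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=4):
    """Squeeze color bit depth: map [0, 1] pixels onto 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=2):
    """Spatial smoothing with a median filter over height and width."""
    # x has shape (batch, height, width, channels)
    return median_filter(x, size=(1, size, size, 1))

def feature_squeezing_score(model, x):
    """L1 distance between predictions on original and squeezed inputs."""
    p_orig = model.predict(x)
    p_bits = model.predict(reduce_bit_depth(x))
    p_blur = model.predict(median_smooth(x))
    return np.maximum(np.abs(p_orig - p_bits).sum(axis=1),
                      np.abs(p_orig - p_blur).sum(axis=1))

def is_adversarial(model, x, threshold=1.0):
    """Flag inputs whose squeezed predictions diverge beyond the threshold."""
    return feature_squeezing_score(model, x) > threshold

In practice the threshold would be chosen on held-out legitimate data to target a desired false positive rate.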

Citations
@inproceedings{Xu0Q18,
author = {Weilin Xu and
David Evans and
Yanjun Qi},
title = {Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks},
booktitle = {25th Annual Network and Distributed System Security Symposium, {NDSS}
2018, San Diego, California, USA, February 18-21, 2018},
year = {2018},
crossref = {DBLP:conf/ndss/2018},
url = {http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018\_03A-4\_Xu\_paper.pdf},
timestamp = {Thu, 09 Aug 2018 10:57:16 +0200},
biburl = {https://dblp.org/rec/bib/conf/ndss/Xu0Q18},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.
12 Jan 2018
Title: Black-box Generation of Adversarial Text Sequences to Fool Deep Learning Classifiers

Published @ 2018 IEEE Security and Privacy Workshops (SPW),
co-located with the 39th IEEE Symposium on Security and Privacy.
Talk Slides: URL
Abstract
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input. We develop novel scoring strategies to find the most important words to modify such that the deep classifier makes a wrong prediction. Simple character-level transformations are applied to the highest-ranked words in order to minimize the edit distance of the perturbation. We evaluated DeepWordBug on two real-world text datasets: Enron spam emails and IMDB movie reviews. Our experimental results indicate that DeepWordBug can reduce the classification accuracy from 99% to around 40% on Enron data and from 87% to about 26% on IMDB. Our results also strongly demonstrate that adversarial sequences generated against one deep-learning model can similarly evade other deep models.
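To make the black-box procedure concrete, here is a minimal Python sketch of one plausible scoring-plus-transformation loop; predict_proba is a hypothetical black-box query function returning class probabilities, and this is an illustration under those assumptions rather than the exact DeepWordBug implementation.

# A minimal sketch: score words by querying the black-box model, then apply
# simple character-level transformations to the highest-scoring words.
import random

def replace_one_scores(text, target_class, predict_proba):
    """Score each word by how much removing it lowers the target-class probability."""
    words = text.split()
    base = predict_proba(text)[target_class]
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append(base - predict_proba(reduced)[target_class])
    return scores

def swap_characters(word):
    """Character-level transformation: swap two adjacent characters."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb(text, target_class, predict_proba, budget=5):
    """Transform the highest-scoring words, keeping the edit distance small."""
    words = text.split()
    scores = replace_one_scores(text, target_class, predict_proba)
    top = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)[:budget]
    for i in top:
        words[i] = swap_characters(words[i])
    return " ".join(words)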
Citations
@INPROCEEDINGS{JiDeepWordBug18,
author={J. Gao and J. Lanchantin and M. L. Soffa and Y. Qi},
booktitle={2018 IEEE Security and Privacy Workshops (SPW)},
title={Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers},
year={2018},
pages={50-56},
keywords={learning (artificial intelligence);pattern classification;program debugging;text analysis;deep learning classifiers;character-level transformations;IMDB movie reviews;Enron spam emails;real-world text datasets;scoring strategies;text input;text perturbations;DeepWordBug;black-box attack;adversarial text sequences;black-box generation;Perturbation methods;Machine learning;Task analysis;Recurrent neural networks;Prediction algorithms;Sentiment analysis;adversarial samples;black box attack;text classification;misclassification;word embedding;deep learning},
doi={10.1109/SPW.2018.00016},
month={May},}
Having trouble with our tools? Please contact Ji Gao and we’ll help you sort it out.
12 Dec 2017

About
We have designed and implemented EvadeML-Zoo, a benchmarking and visualization tool for research on adversarial machine learning. The goal of EvadeML-Zoo is to ease the experimental setup and help researchers evaluate and verify their results.
EvadeML-Zoo has a modular architecture and is designed to make it easy to add new datasets, pre-trained target models, attack or defense algorithms. The code is open source under the MIT license.
We have integrated three popular datasets, MNIST, CIFAR-10, and ImageNet-ILSVRC, with a simple and unified interface. We offer several representative pre-trained models with state-of-the-art accuracy for each dataset, including two pre-trained models for ImageNet-ILSVRC: the heavy Inception-v3 and the lightweight MobileNet. We use Keras to access the pre-trained models because it provides a simplified interface and is compatible with TensorFlow, which is a flexible tool for implementing attack and defense techniques.
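As a quick illustration of the unified Keras interface for the pre-trained ImageNet models, the following sketch loads MobileNet and classifies one image; the image file name is a placeholder.

# A minimal sketch of loading a pre-trained ImageNet model through Keras.
import numpy as np
from keras.applications.mobilenet import MobileNet, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array

model = MobileNet(weights="imagenet")                   # lightweight ImageNet model
img = load_img("example.jpg", target_size=(224, 224))   # placeholder image path
x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])              # top-3 (class id, name, probability)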
We have integrated several existing attack algorithms as baselines for upcoming new methods, including FGSM, BIM, JSMA, DeepFool, Universal Adversarial Perturbations, and Carlini and Wagner’s algorithms.
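As an example of the kind of baseline attack included, here is a minimal FGSM sketch written against tf.keras; the model, labels, and epsilon value are assumptions for illustration, not the toolbox’s exact implementation.

# A minimal FGSM sketch: perturb the input in the direction of the sign of
# the loss gradient, then clip back to the valid pixel range.
import tensorflow as tf

def fgsm(model, x, y_true, eps=0.03):
    """Return an FGSM-perturbed copy of x for integer labels y_true."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixels in [0, 1]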
We have integrated our “feature squeezing” based detection framework into this toolbox. We formulate detecting adversarial examples as a binary classification task: we first construct a balanced dataset with an equal number of legitimate and adversarial examples, and then split it into training and test subsets. A detection method has full access to the training set but no access to the labels of the test set. We measure the TPR and FPR on the test set as the benchmark detection results. Our feature squeezing detector serves as the detection baseline, and users can easily add more detection methods using our framework.
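The benchmark protocol above can be summarized in a short sketch; detector_scores below is a hypothetical stand-in for any detection method’s scoring function, and the 50/50 split and threshold handling are assumptions for illustration.

# A minimal sketch of the detection benchmark: balanced legitimate (label 0)
# and adversarial (label 1) examples, split in half, with TPR/FPR measured on
# the held-out test subset.
import numpy as np
from sklearn.model_selection import train_test_split

def benchmark_detector(X_legit, X_adv, detector_scores, threshold):
    X = np.concatenate([X_legit, X_adv])
    y = np.concatenate([np.zeros(len(X_legit)), np.ones(len(X_adv))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    # A real detector may calibrate itself on (X_tr, y_tr); here we only evaluate.
    pred = detector_scores(X_te) > threshold
    tpr = np.mean(pred[y_te == 1])   # adversarial examples correctly detected
    fpr = np.mean(pred[y_te == 0])   # legitimate examples flagged by mistake
    return tpr, fpr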
In addition, the tool comes with an interactive web-based visualization module adapted from our previous ADVERSARIAL-PLAYGROUND package. This module enables a better understanding of the impact of attack algorithms on the resulting adversarial sample; users may specify attack-algorithm parameters for a variety of attack types and generate new samples on demand. The interface displays the resulting adversarial example compared with the original, the classification likelihoods, and the influence on the target model throughout the layers of the network.
Citations
@inproceedings{Xu0Q18,
author = {Weilin Xu and
David Evans and
Yanjun Qi},
title = {Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks},
booktitle = {25th Annual Network and Distributed System Security Symposium, {NDSS}
2018, San Diego, California, USA, February 18-21, 2018},
year = {2018},
crossref = {DBLP:conf/ndss/2018},
url = {http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018\_03A-4\_Xu\_paper.pdf},
timestamp = {Thu, 09 Aug 2018 10:57:16 +0200},
biburl = {https://dblp.org/rec/bib/conf/ndss/Xu0Q18},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.
03 Aug 2017
Revised Version 2 of the Paper @ arXiv
Revised Title: Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning
Published @ The IEEE Symposium on Visualization for Cyber Security (VizSec) 2017 - URL

Abstract
Recent studies have shown that attackers can force deep learning models to misclassify so-called “adversarial examples”: maliciously generated images formed by making imperceptible modifications to pixel values. With growing interest in deep learning for security applications, it is important for security experts and users of machine learning to recognize how learning systems may be attacked. Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Thus, we present a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a convolutional neural network (CNN) system. Adversarial-Playground is educational, modular and interactive. (1) It enables non-experts to compare examples visually and to understand why an adversarial example can fool a CNN-based image classifier. (2) It can help security experts explore more vulnerabilities of deep learning as a software module. (3) Building an interactive visualization is challenging in this domain due to the large feature space of image classification (generating adversarial examples is slow in general and visualizing images is costly). Through multiple novel design choices, our tool can provide fast and accurate responses to user requests. Empirically, we find that our client-server division strategy reduced the response time by an average of 1.5 seconds per sample. Our other innovation, a faster variant of the JSMA evasion algorithm, empirically performed twice as fast as JSMA while maintaining a comparable evasion rate.
Project source code and data from our experiments are available on GitHub.
Citations
@inproceedings{norton2017adversarial,
title={Adversarial-Playground: A visualization suite showing how adversarial examples fool deep learning},
author={Norton, Andrew P and Qi, Yanjun},
booktitle={Visualization for Cyber Security (VizSec), 2017 IEEE Symposium on},
pages={1--4},
year={2017},
organization={IEEE}
}
Having trouble with our tools? Please contact Andrew Norton and we’ll help you sort it out.
02 Aug 2017
Abstract
Feature squeezing is a recently-introduced framework for mitigating and detecting adversarial examples. In previous work, we showed that it is effective against several earlier methods for generating adversarial examples. In this short note, we report on recent results showing that simple feature squeezing techniques also make deep learning models significantly more robust against the Carlini/Wagner attacks, which are the best known adversarial methods discovered to date.

Citations
@article{xu2017feature,
title={Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples},
author={Xu, Weilin and Evans, David and Qi, Yanjun},
journal={arXiv preprint arXiv:1705.10686},
year={2017}
}
Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.
04 Jun 2017
Abstract
With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground provides users an efficient and effective experience in exploring techniques for generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) We propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate. (2) Our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods.
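Here is a minimal sketch of that client-server exchange, assuming a Flask server; the route name and the generate_adversarial() helper are hypothetical placeholders (the stub below simply returns random data of the right shape).

# A minimal sketch: the server returns only the adversarial pixel matrix and
# the classification likelihoods as JSON, not a rendered image.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_adversarial(params):
    # Stub standing in for the real attack code; returns random data shaped
    # like a 28x28 pixel matrix and a 10-class likelihood vector.
    return np.random.rand(28, 28), np.random.dirichlet(np.ones(10))

@app.route("/attack", methods=["POST"])
def attack():
    params = request.get_json()                        # attack type, strength, etc.
    matrix, likelihoods = generate_adversarial(params)
    # Send only the numeric matrix and the likelihood vector to the client.
    return jsonify({"matrix": matrix.tolist(),
                    "likelihoods": likelihoods.tolist()})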


Citations
@inproceedings{norton2017adversarial,
title={Adversarial-Playground: A visualization suite showing how adversarial examples fool deep learning},
author={Norton, Andrew P and Qi, Yanjun},
booktitle={Visualization for Cyber Security (VizSec), 2017 IEEE Symposium on},
pages={1--4},
year={2017},
organization={IEEE}
}
Having trouble with our tools? Please contact Andrew Norton and we’ll help you sort it out.
03 Jun 2017
Abstract
Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensitive settings. It has been observed that an adversary can easily generate adversarial samples by making small perturbations on irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepCloak. By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use to generate adversarial samples and therefore increases the robustness against such inputs. Compared with other defensive approaches, DeepCloak is easy to implement and computationally efficient. Experimental results show that DeepCloak can increase the performance of state-of-the-art DNN models against adversarial samples.
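The following sketch illustrates one way a DeepCloak-style mask could be wired in, assuming the network is split into a feature extractor and a classification head; that split, the removal ratio, and the specific mask construction are assumptions for illustration rather than the paper’s exact recipe.

# A minimal sketch: zero out the feature dimensions whose activations shift
# the most between clean inputs and their adversarial counterparts, using a
# fixed binary mask inserted before the classification head.
import numpy as np
import tensorflow as tf

def build_mask(feature_model, x_clean, x_adv, remove_ratio=0.1):
    """Drop the fraction of feature dimensions with the largest clean/adversarial gap."""
    diff = np.abs(feature_model.predict(x_adv) - feature_model.predict(x_clean)).mean(axis=0)
    k = int(len(diff) * remove_ratio)
    mask = np.ones_like(diff)
    mask[np.argsort(diff)[-k:]] = 0.0           # remove the k most unstable features
    return tf.constant(mask, dtype=tf.float32)

def cloaked_predict(feature_model, classifier_head, mask, x):
    """Apply the mask between feature extraction and classification."""
    return classifier_head(feature_model(x) * mask)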

Citations
@article{gao2017deepmask,
title={DeepCloak: Masking DNN Models for robustness against adversarial samples},
author={Gao, Ji and Wang, Beilun and Qi, Yanjun},
journal={arXiv preprint arXiv:1702.06763},
year={2017}
}
Having trouble with our tools? Please contact Ji Gao and we’ll help you sort it out.
01 Jun 2017
Paper: Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers (NDSS 2016)
More information is available at EvadeML.org.
The approach uses evolutionary techniques to simulate an adversary’s efforts to evade a target classifier.
Abstract
Machine learning is widely used to develop classifiers for security tasks. However, the robustness of these methods against motivated adversaries is uncertain. In this work, we propose a generic method to evaluate the robustness of classifiers under attack. The key idea is to stochastically manipulate a malicious sample to find a variant that preserves the malicious behavior but is classified as benign by the classifier. We present a general approach to search for evasive variants and report on results from experiments using our techniques against two PDF malware classifiers, PDFrate and Hidost. Our method is able to automatically find evasive variants for both classifiers for all of the 500 malicious seeds in our study. Our results suggest a general method for evaluating classifiers used in security applications, and raise serious doubts about the effectiveness of classifiers based on superficial features in the presence of adversaries.
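The evolutionary search can be sketched as a short loop; mutate(), classifier_score(), and preserves_malicious_behavior() below are hypothetical placeholders for structural mutations of a sample, the target classifier’s maliciousness score, and the behavioral oracle, and the population settings are illustrative rather than the paper’s.

# A minimal sketch of the stochastic search for evasive variants: mutate,
# keep variants that still behave maliciously, and stop when the classifier
# scores a variant as benign.
import random

def evolve_evasive_variant(seed, mutate, classifier_score,
                           preserves_malicious_behavior,
                           pop_size=48, generations=20, threshold=0.5):
    population = [seed]
    for _ in range(generations):
        # Refill the population by mutating current survivors.
        population = [mutate(random.choice(population)) for _ in range(pop_size)]
        # Oracle check: keep only variants that preserve the malicious behavior.
        population = [v for v in population if preserves_malicious_behavior(v)] or [seed]
        # Lower classifier score = more benign-looking; sort by fitness.
        population.sort(key=classifier_score)
        if classifier_score(population[0]) < threshold:
            return population[0]                        # evasive variant found
        population = population[:max(1, pop_size // 4)] # select the fittest survivors
    return None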

Citations
@inproceedings{xu2016automatically,
title={Automatically evading classifiers},
author={Xu, Weilin and Qi, Yanjun and Evans, David},
booktitle={Proceedings of the 2016 Network and Distributed Systems Symposium},
year={2016}
}
Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.
11 May 2017
Abstract
Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. Using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool a classifier (f1) and incorporates the classifier’s oracle (f2, such as human eyes) into the analysis. By investigating the topological relationship between two (pseudo)metric spaces corresponding to the predictor f1 and the oracle f2, we develop necessary and sufficient conditions that determine whether f1 is always robust (strong-robust) against adversarial examples according to f2. Interestingly, our theorems indicate that just one unnecessary feature can make f1 not strong-robust, and that the right feature representation learning is the key to getting a classifier that is both accurate and strong-robust.
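As a rough, informal paraphrase of the kind of condition the analysis studies (not the paper’s exact definitions), strong-robustness of f1 with respect to the oracle f2 can be read as follows, where d2 denotes the (pseudo)metric associated with the oracle:

% An informal paraphrase, assuming d_2 is the oracle's (pseudo)metric:
% whenever two inputs are close under the oracle's metric and the oracle
% treats them as the same, the classifier must agree on them as well.
\forall x, x' :\quad d_2(x, x') < \delta \;\wedge\; f_2(x) = f_2(x')
\;\Longrightarrow\; f_1(x) = f_1(x')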
Recent studies are mostly empirical and provide little understanding of why an adversary can fool machine learning models with adversarial examples. Several important questions have not been answered yet:
- What makes a classifier always robust to adversarial examples?
- Which parts of a classifier influence its robustness against adversarial examples more than the rest?
- What is the relationship between a classifier’s generalization accuracy and its robustness against adversarial examples?
- Why are (many) DNN classifiers not robust against adversarial examples, and how can they be improved?
This paper uses the following framework to understand adversarial examples (by considering the role of the oracle):
- [Figure: the proposed framework relating a learned classifier f1 to its oracle f2]
- [Figure: a simple case illustration showing how unnecessary features make a classifier vulnerable to adversarial examples]
- [Figure: an illustration of why DNN models are vulnerable to adversarial examples]
Citations
@article{wang2016theoretical,
title={A theoretical framework for robustness of (deep) classifiers under adversarial noise},
author={Wang, Beilun and Gao, Ji and Qi, Yanjun},
journal={arXiv preprint},
year={2016}
}
Having trouble with our tools? Please contact Beilun and we’ll help you sort it out.