The website securemachinelearning.org introduces updates to a suite of tools we have designed to make machine learning secure and robust. Feel free to submit a pull request if you find a typo. At the junction of machine learning and computer security, this project provides toolboxes for five main tasks (organized as entries in the navigation menu).

Blog Posts


Adversarial-Playground Paper To Appear @ VizSec

Revised Version 2: Paper Arxiv

Revised Title: Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning

To be presented at the IEEE Symposium on Visualization for Cyber Security (VizSec) 2017

Abstract

Recent studies have shown that attackers can force deep learning models to misclassify so-called “adversarial examples”: maliciously generated images formed by making imperceptible modifications to pixel values. With growing interest in deep learning for security applications, it is important for security experts and users of machine learning to recognize how learning systems may be attacked. Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Thus, we present a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a convolutional neural network (CNN) system. Adversarial-Playground is educational, modular and interactive. (1) It enables non-experts to compare examples visually and to understand why an adversarial example can fool a CNN-based image classifier. (2) As a software module, it can help security experts explore further vulnerabilities of deep learning. (3) Building an interactive visualization is challenging in this domain because of the large feature space of image classification: generating adversarial examples is slow in general, and visualizing images is costly. Through multiple novel design choices, our tool provides fast and accurate responses to user requests. Empirically, we find that our client-server division strategy reduced the response time by an average of 1.5 seconds per sample. Our other innovation, a faster variant of the JSMA evasion algorithm, empirically performed twice as fast as JSMA while maintaining a comparable evasion rate. Project source code and data from our experiments are available at: GitHub
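
For readers who want a concrete feel for the saliency computation behind JSMA-style attacks, the NumPy sketch below shows the standard single-pixel saliency map; the jacobian input and the helper names are illustrative only, and our faster variant (see the GitHub repository) restricts this search differently.

# A minimal sketch of the standard JSMA saliency computation (illustrative, not
# the tool's faster variant). `jacobian` is assumed to be the
# (num_classes, num_features) matrix of dF_j / dx_i obtained from the model.
import numpy as np

def jsma_saliency(jacobian, target_class):
    """JSMA saliency score for each input feature (pixel), single-pixel form."""
    target_grad = jacobian[target_class]              # dF_target / dx_i
    other_grad = jacobian.sum(axis=0) - target_grad   # summed over non-target classes
    # A pixel is useful only if increasing it raises the target likelihood
    # while lowering the combined likelihood of every other class.
    useful = (target_grad > 0) & (other_grad < 0)
    return np.where(useful, target_grad * np.abs(other_grad), 0.0)

def pick_pixel_to_perturb(jacobian, target_class):
    """Index of the most influential pixel for the targeted misclassification."""
    return int(np.argmax(jsma_saliency(jacobian, target_class)))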

Citations

@article{norton2017advplayground,
  title={Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning},
  author={Norton, Andrew and Qi, Yanjun},
  url={http://arxiv.org/abs/1708.00807},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Andrew Norton and we’ll help you sort it out.

Feature Squeezing Mitigates and Detects Carlini-Wagner Adversarial Examples

Paper Arxiv

Abstract

Feature squeezing is a recently-introduced framework for mitigating and detecting adversarial examples. In previous work, we showed that it is effective against several earlier methods for generating adversarial examples. In this short note, we report on recent results showing that simple feature squeezing techniques also make deep learning models significantly more robust against the Carlini/Wagner attacks, the strongest adversarial methods known to date.

Citations

@article{xu2017feature,
  title={Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples},
  author={Xu, Weilin and Evans, David and Qi, Yanjun},
  journal={arXiv preprint arXiv:1705.10686},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.

securemachinelearning.org is up and running!

The website securemachinelearning.org introduces updates to a suite of tools we have developed to make machine learning secure and robust.

Scope of problems our tools aim to tackle

Classifiers based on machine learning algorithms have shown promising results for many security tasks including malware classification and network intrusion detection, but classic machine learning algorithms are not designed to operate in the presence of adversaries. Intelligent and adaptive adversaries may actively manipulate the information they present in attempts to evade a trained classifier, leading to a competition between the designers of learning systems and attackers who wish to evade them. This project is developing automated techniques for predicting how well classifiers will resist the evasions of adversaries, along with general methods to automatically harden machine-learning classifiers against adversarial evasion attacks.

Five important tasks

At the junction of machine learning and computer security, this project involves toolboxes for five main tasks, as shown in the following table. Our system aims to allow a classifier designer to understand how the classification performance of a model degrades under evasion attacks, enabling better-informed and more secure design choices. The framework is general and scalable, and takes advantage of the latest advances in machine learning and computer security.

| No. | Tool Name | Short Description |
| --- | --------- | ----------------- |
| 1 | Evade Machine Learning | Tools we designed to Automatically Evade Classifiers |
| 2 | Detect Adversarial Attacks | Tools we designed for Detecting Adversarial Examples in Deep Neural Networks |
| 3 | Defense against Adversarial Attacks | Tools we designed for defending against Adversarial Examples in Deep Neural Networks |
| 4 | Visualize Adversarial Attacks | Tools we designed for Visualizing Adversarial Examples |
| 5 | Theorems of Adversarial Machine Learning | Theorems we proposed for understanding Adversarial Examples in Machine Learning |

Contact

Have questions or suggestions? Feel free to ask me on Twitter or email me.

Thanks for reading!

Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation

GitHub: AdversarialDNN-Playground

Paper Arxiv

Poster

Abstract

With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground provides users with an efficient and effective experience in exploring techniques for generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) We propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate. (2) Our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods.
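
To make tactic (2) concrete, the sketch below shows the shape of the payload a server could return under this design: the perturbed-sample matrix plus the likelihood vector, serialized as JSON instead of a rendered image. The Flask route and the generate_example helper are purely illustrative and are not the actual Adversarial-Playground code.

# Illustrative only: a Flask-style endpoint returning the sample matrix and the
# class-likelihood vector as JSON; the client renders the image itself.
from flask import Flask, jsonify, request
import numpy as np

app = Flask(__name__)

def generate_example(params):
    # Stand-in for the real attack code: returns a random 28x28 "sample" and a
    # uniform 10-class likelihood vector so the endpoint runs as a demo.
    rng = np.random.default_rng(0)
    return rng.random((28, 28)), np.full(10, 0.1)

@app.route("/generate", methods=["POST"])
def generate_adversarial():
    params = request.get_json()
    adv_matrix, likelihoods = generate_example(params)
    return jsonify({
        "adversarial_sample": np.asarray(adv_matrix).tolist(),
        "likelihoods": np.asarray(likelihoods).tolist(),
    })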

Playground

[Screenshots of the Adversarial-Playground web interface]

Citations

@article{norton2017advplayground,
  title={Adversarial Playground: A Visualization Suite for Adversarial Sample Generation},
  author={Norton, Andrew and Qi, Yanjun},
  url={http://arxiv.org/abs/1706.01763},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Andrew Norton and we’ll help you sort it out.

DeepCloak: Masking Deep Neural Network Models for Robustness against Adversarial Samples

GitHub: DeepCloak

Paper ICLR17 Workshop

Poster

Abstract

Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensitive settings. It has been observed that an adversary can easily generate adversarial samples by applying a small perturbation to irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepCloak. By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use to generate adversarial samples and therefore increases the robustness against such inputs. Compared with other defensive approaches, DeepCloak is easy to implement and computationally efficient. Experimental results show that DeepCloak can increase the performance of state-of-the-art DNN models against adversarial samples.
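
As a rough illustration of the masking idea (a sketch of the concept, not the released DeepCloak code), the NumPy snippet below ranks feature dimensions by how much their activations shift between paired clean and adversarial inputs and zeroes out the most shifted ones before the classification layer.

# Hedged sketch: build a binary mask that removes the feature dimensions whose
# activations change most under adversarial perturbation, then apply it at
# inference time. All names and the ranking heuristic are illustrative.
import numpy as np

def build_mask(clean_acts, adv_acts, num_to_remove):
    """clean_acts, adv_acts: (num_samples, num_features) activations taken from
    the layer right before the classifier on paired clean/adversarial inputs.
    Assumes num_to_remove >= 1."""
    shift = np.abs(adv_acts - clean_acts).mean(axis=0)
    mask = np.ones(clean_acts.shape[1])
    mask[np.argsort(shift)[-num_to_remove:]] = 0.0   # drop the most shifted features
    return mask

def masked_features(features, mask):
    """Apply the fixed mask at inference time, shrinking the attacker's space."""
    return features * mask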


Citations

@article{gao2017deepmask,
  title={DeepCloak: Masking DNN Models for robustness against adversarial samples},
  author={Gao, Ji and Wang, Beilun and Qi, Yanjun},
  journal={arXiv preprint arXiv:1702.06763},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Ji Gao and we’ll help you sort it out.

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

GitHub: FeatureSqueezing

Paper Arxiv

Abstract

Although deep neural networks (DNNs) have achieved great success in many computer vision tasks, recent studies have shown they are vulnerable to adversarial examples. Such examples, typically generated by adding small but purposeful distortions, can frequently fool DNN models. Previous studies to defend against adversarial examples mostly focused on refining the DNN models; they have either shown limited success or suffered from expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on the squeezed input, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two instances of feature squeezing: reducing the color bit depth of each pixel and smoothing using a spatial filter. These strategies are straightforward, inexpensive, and complementary to defensive methods that operate on the underlying model, such as adversarial training.
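
The NumPy/SciPy sketch below illustrates the two squeezers and the comparison-based detection rule described above; model_predict is assumed to return a softmax probability vector for an image scaled to [0, 1], and the detection threshold would be tuned on held-out data. It is a simplified illustration, not the released implementation.

# Hedged sketch of feature squeezing: two squeezers plus detection by comparing
# the model's predictions on the original and squeezed inputs.
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits):
    """Squeeze pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=2):
    """Local spatial smoothing with a size-by-size median filter."""
    return median_filter(x, size=size)

def is_adversarial(x, model_predict, threshold):
    """Flag the input if any squeezer changes the prediction vector too much (L1)."""
    p_orig = model_predict(x)
    squeezers = (lambda v: reduce_bit_depth(v, 1), median_smooth)
    scores = [np.abs(p_orig - model_predict(squeeze(x))).sum() for squeeze in squeezers]
    return max(scores) > threshold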


Citations

@article{xu2017feature,
  title={Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks},
  author={Xu, Weilin and Evans, David and Qi, Yanjun},
  journal={arXiv preprint arXiv:1704.01155},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.

A Tool for Automatically Evading Classifiers for PDF Malware Detection

More information is available at EvadeML.org

The tool uses evolutionary techniques to simulate an adversary’s efforts to evade a target classifier.

GitHub: EvadePDFClassifiers

Paper NDSS16

Slides

Abstract

Machine learning is widely used to develop classifiers for security tasks. However, the robustness of these methods against motivated adversaries is uncertain. In this work, we propose a generic method to evaluate the robustness of classifiers under attack. The key idea is to stochastically manipulate a malicious sample to find a variant that preserves the malicious behavior but is classified as benign by the classifier. We present a general approach to search for evasive variants and report on results from experiments using our techniques against two PDF malware classifiers, PDFrate and Hidost. Our method is able to automatically find evasive variants for both classifiers for all of the 500 malicious seeds in our study. Our results suggest a general method for evaluating classifiers used in security applications, and raise serious doubts about the effectiveness of classifiers based on superficial features in the presence of adversaries.
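
For intuition, here is a hedged sketch of the evolutionary search loop described above; mutate, oracle_is_malicious, and classifier_score are placeholders for the PDF-specific components in the released tool, and the population size, generation count, and benign threshold are arbitrary demo values.

# Hedged sketch of the genetic search for an evasive variant: mutate a malicious
# seed, keep only variants the oracle confirms still behave maliciously, and
# select for lower classifier maliciousness scores.
import random

def evolve_evasive_variant(seed, mutate, oracle_is_malicious, classifier_score,
                           benign_threshold=0.5, pop_size=48, generations=20):
    population = [seed]
    for _ in range(generations):
        candidates = [mutate(random.choice(population)) for _ in range(pop_size)]
        # Discard mutants that lost the malicious behavior.
        candidates = [c for c in candidates if oracle_is_malicious(c)]
        if not candidates:
            continue
        # Fitness: a lower maliciousness score from the target classifier is better.
        candidates.sort(key=classifier_score)
        if classifier_score(candidates[0]) < benign_threshold:
            return candidates[0]              # evasive variant found
        population = candidates[: max(1, pop_size // 4)]
    return None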


Citations

@inproceedings{xu2016automatically,
  title={Automatically evading classifiers},
  author={Xu, Weilin and Qi, Yanjun and Evans, David},
  booktitle={Proceedings of the 2016 Network and Distributed Systems Symposium},
  year={2016}
}

Support or Contact

Having trouble with our tools? Please contact Weilin and we’ll help you sort it out.

A Theoretical Framework for Robustness of (Deep) Classifiers Against Adversarial Samples

Paper ICLR17 workshop

Poster

Abstract

Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. Using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool a classifier (f1) and incorporates the classifier’s oracle (f2, such as human perception) into the analysis. By investigating the topological relationship between two (pseudo)metric spaces corresponding to predictor f1 and oracle f2, we develop necessary and sufficient conditions that can determine if f1 is always robust (strong-robust) against adversarial examples according to f2. Interestingly, our theorems indicate that just one unnecessary feature can make f1 not strong-robust, and the right feature representation learning is the key to getting a classifier that is both accurate and strong-robust.
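
As one concrete (and deliberately simplified) reading of the terminology above, strong-robustness can be written as follows; the notation is ours and glosses over the paper’s full (pseudo)metric-space treatment.

% f_1: the learned classifier, f_2: its oracle, d_2: the oracle's (pseudo)metric,
% delta: a perturbation budget the oracle treats as negligible (so f_2(x') = f_2(x)).
\[
  f_1 \text{ is strong-robust w.r.t. } f_2
  \;\Longleftrightarrow\;
  \forall x,\ \forall x' \text{ with } d_2(x, x') < \delta :\quad
  f_1(x') = f_1(x).
\]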

Recent studies are mostly empirical and provide little understanding of why an adversary can fool machine learning models with adversarial examples. Several important questions have not been answered yet:

  • What makes a classifier always robust to adversarial examples?
  • Which parts of a classifier influence its robustness against adversarial examples the most?
  • What is the relationship between a classifier’s generalization accuracy and its robustness against adversarial examples?
  • Why are (many) DNN classifiers not robust against adversarial examples, and how can they be improved?

This paper uses the following framework to understand adversarial examples, explicitly considering the role of the oracle:

  • [Figure: the analysis framework, including the role of the oracle]

  • [Figure: a simple case illustrating how unnecessary features make a classifier vulnerable to adversarial examples]

  • [Figure: why DNN models are vulnerable to adversarial examples]

Citations

@inproceedings{wang2017theoretical,
  title={A Theoretical Framework for Robustness of (Deep) Classifiers Against Adversarial Samples},
  author={Wang, Beilun and Gao, Ji and Qi, Yanjun},
  booktitle={ICLR 2017 Workshop},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Beilun and we’ll help you sort it out.