The website securemachinelearning.org introduces updates to a suite of tools we have designed for making machine learning secure and robust. Feel free to submit a pull request if you find a typo. At the junction between machine learning and computer security, this project provides toolboxes for five main tasks (organized as entries in the navigation menu).

# Blog Posts

### Abstract

#### Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning

Recent studies have shown that attackers can force deep learning models to misclassify so-called “adversarial examples”: maliciously generated images formed by making imperceptible modifications to pixel values. With growing interest in deep learning for security applications, it is important for security experts and users of machine learning to recognize how learning systems may be attacked. Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Thus, we present a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a convolutional neural network (CNN) system. Adversarial-Playground is educational, modular, and interactive. (1) It enables non-experts to compare examples visually and to understand why an adversarial example can fool a CNN-based image classifier. (2) It can help security experts explore further vulnerabilities of deep learning as a software module. (3) Building an interactive visualization is challenging in this domain due to the large feature space of image classification: generating adversarial examples is slow in general, and visualizing images is costly. Through multiple novel design choices, our tool provides fast and accurate responses to user requests. Empirically, we find that our client-server division strategy reduced the response time by an average of 1.5 seconds per sample. Our other innovation, a faster variant of the JSMA evasion algorithm, empirically performed twice as fast as JSMA while maintaining a comparable evasion rate. Project source code and data from our experiments are available on GitHub.
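The saliency-map idea behind the JSMA attack mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration under our own simplifications (a linear toy model whose Jacobian is just its weight matrix; all function and parameter names are ours), not the tool's actual implementation or the faster variant described in the abstract:

```python
import numpy as np

def saliency_map(jacobian, target):
    """JSMA-style saliency: jacobian[c, i] = d(class-c score)/d(pixel i).

    A pixel is promising when increasing it raises the target-class score
    while lowering the combined score of all other classes.
    """
    target_grad = jacobian[target]                   # dF_t / dx
    other_grad = jacobian.sum(axis=0) - target_grad  # sum over j != t of dF_j / dx
    mask = (target_grad > 0) & (other_grad < 0)
    return np.where(mask, target_grad * np.abs(other_grad), 0.0)

def jsma_step(x, jacobian, target, theta=1.0, k=2):
    """Perturb the k most salient pixels by theta (one JSMA iteration)."""
    scores = saliency_map(jacobian, target)
    idx = np.argsort(scores)[-k:]
    x_adv = x.copy()
    x_adv[idx] = np.clip(x_adv[idx] + theta, 0.0, 1.0)
    return x_adv

# Toy demo: a linear "classifier", so the Jacobian is its weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))      # 10 classes, 28x28 flattened image
x = rng.random(784)
x_adv = jsma_step(x, W, target=3, theta=1.0, k=10)
print(int((x_adv != x).sum()))      # number of pixels modified (at most k)
```

A real attack iterates this step, recomputing the Jacobian of the network after each perturbation, until the target class wins or a distortion budget is exhausted.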

### Citations

@article{norton2017advplayground,
title={Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning},
author={Norton, Andrew and Qi, Yanjun},
journal={arXiv preprint arXiv:1708.00807},
year={2017}
}


# securemachinelearning.org is up and running!

The website securemachinelearning.org introduces updates of a suite of tools we have developed for making machine learning secure and robust.

## Scope of problems our tools aim to tackle

Classifiers based on machine learning algorithms have shown promising results for many security tasks including malware classification and network intrusion detection, but classic machine learning algorithms are not designed to operate in the presence of adversaries. Intelligent and adaptive adversaries may actively manipulate the information they present in attempts to evade a trained classifier, leading to a competition between the designers of learning systems and attackers who wish to evade them. This project is developing automated techniques for predicting how well classifiers will resist the evasions of adversaries, along with general methods to automatically harden machine-learning classifiers against adversarial evasion attacks.

At the junction between machine learning and computer security, this project involves toolboxes for five main tasks, as shown in the following table. Our system aims to allow a classifier designer to understand how the classification performance of a model degrades under evasion attacks, enabling better-informed and more secure design choices. The framework is general and scalable, and takes advantage of the latest advances in machine learning and computer security.

| No. | Tool Name | Short Description |
| --- | --- | --- |
| 2 | Detect Adversarial Attacks | Tools we designed for detecting adversarial examples in deep neural networks |
| 3 | Defense against Adversarial Attacks | Tools we designed for defending against adversarial examples in deep neural networks |
| 5 | Theorems of Adversarial Machine Learning | Theorems we proposed for understanding adversarial examples in machine learning |

## Contact

Have questions or suggestions? Feel free to ask me on Twitter or email me.

# A Toolbox for Visualizing Adversarial Examples

### Abstract

With growing interest in adversarial machine learning, it is important for machine learning practitioners and users to understand how their models may be attacked. We propose a web-based visualization tool, *Adversarial-Playground*, to demonstrate the efficacy of common adversarial methods against a deep neural network (DNN) model, built on top of the TensorFlow library. Adversarial-Playground provides users with an efficient and effective experience in exploring techniques for generating adversarial examples, which are inputs crafted by an adversary to fool a machine learning system. To enable Adversarial-Playground to generate quick and accurate responses for users, we use two primary tactics: (1) we propose a faster variant of the state-of-the-art Jacobian saliency map approach that maintains a comparable evasion rate; (2) our visualization does not transmit the generated adversarial images to the client, but rather only the matrix describing the sample and the vector representing classification likelihoods.
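The second tactic, shipping raw data instead of rendered images, can be sketched as below. This is a hedged illustration of the idea only; the payload shape, field names, and rounding are our assumptions, not the tool's actual API:

```python
import json
import numpy as np

def build_response(adv_image, likelihoods):
    """Serialize an adversarial sample as raw data rather than a rendered image.

    Sending the pixel matrix and the class-likelihood vector as JSON lets the
    browser draw the image itself (e.g. on a canvas), avoiding server-side
    image rendering and a heavier PNG transfer per request.
    """
    payload = {
        "pixels": np.asarray(adv_image).round(4).tolist(),       # H x W matrix
        "likelihoods": np.asarray(likelihoods).round(4).tolist(),
    }
    return json.dumps(payload)

# Toy usage with a fake 28x28 sample and a uniform 10-class output.
rng = np.random.default_rng(1)
image = rng.random((28, 28))
probs = np.full(10, 0.1)
msg = build_response(image, probs)
decoded = json.loads(msg)
print(len(decoded["pixels"]), len(decoded["likelihoods"]))
```

The client-side renderer then only needs to map each matrix entry to a canvas pixel and plot the likelihood vector as a bar chart.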

### Citations

@article{norton2017advplayground,
author={Norton, Andrew and Qi, Yanjun},
journal={arXiv preprint arXiv:1706.01763},
year={2017}
}


# DeepCloak: a Tool for Automatically Defending DNNs against Adversarial Examples

## DeepCloak: Masking Deep Neural Network Models for Robustness against Adversarial Samples

### Abstract

Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensitive settings. It has been observed that an adversary can easily generate adversarial samples by making small perturbations along irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepCloak. By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use to generate adversarial samples and therefore increases the model's robustness against such inputs. Compared with other defensive approaches, DeepCloak is easy to implement and computationally efficient. Experimental results show that DeepCloak can increase the performance of state-of-the-art DNN models against adversarial samples.
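The "identify and remove unnecessary features" step can be sketched as a masking layer. This is a toy NumPy sketch under our own assumptions (a synthetic probe set, a simple instability score, and placeholder names), not DeepCloak's actual algorithm:

```python
import numpy as np

def rank_unstable_features(acts_clean, acts_adv):
    """Score each hidden feature by how much adversarial noise moves it.

    Features whose activations shift the most between clean inputs and
    their adversarial counterparts, averaged over a probe set, are the
    candidates for masking.
    """
    return np.abs(acts_adv - acts_clean).mean(axis=0)

def make_mask(scores, fraction=0.1):
    """Zero out the top `fraction` most unstable features; keep the rest."""
    k = int(len(scores) * fraction)
    mask = np.ones_like(scores)
    if k > 0:
        mask[np.argsort(scores)[-k:]] = 0.0
    return mask

# Toy demo: 100 probe samples through a 64-unit hidden layer.
rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 64))
adv = clean + rng.normal(scale=0.01, size=(100, 64))
adv[:, :6] += 1.0                  # six features the attacker exploits heavily
mask = make_mask(rank_unstable_features(clean, adv), fraction=0.1)
masked_layer = lambda h: h * mask  # insert between layers at inference time
print(int(mask.sum()))             # features kept after masking
```

In a real DNN the mask would be inserted as an extra (non-trainable) layer between two existing layers, so the removed features never reach the classifier head.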

### Citations

@article{gao2017deepmask,
author={Gao, Ji and Wang, Beilun and Qi, Yanjun},
journal={arXiv preprint arXiv:1702.06763},
year={2017}
}


# FeatureSqueezing: a Suite of Tools for Detecting Adversarial Examples in Deep Neural Networks

## Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

### Abstract

Although deep neural networks (DNNs) have achieved great success in many computer vision tasks, recent studies have shown they are vulnerable to adversarial examples. Such examples, typically generated by adding small but purposeful distortions, can frequently fool DNN models. Previous work on defending against adversarial examples mostly focused on refining the DNN models; it has either shown limited success or suffers from expensive computation. We propose a new strategy, *feature squeezing*, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on the squeezed input, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two instances of feature squeezing: reducing the color bit depth of each pixel and smoothing using a spatial filter. These strategies are straightforward, inexpensive, and complementary to defensive methods that operate on the underlying model, such as adversarial training.
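The two squeezers named in the abstract, bit-depth reduction and spatial smoothing, are simple enough to sketch directly. The detector below is a toy illustration: the threshold, the naive median filter, and the one-pixel demo classifier are our assumptions, not the paper's configuration:

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Squeeze pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, k=3):
    """Naive k x k median filter with edge replication (spatial squeezing)."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    out = np.empty_like(x)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

def is_adversarial(predict, x, threshold=1.0):
    """Flag x when predictions on squeezed inputs disagree with the original.

    `predict` maps an image to a probability vector; the detector compares
    the original prediction against each squeezed input's prediction by L1
    distance and flags the sample if the maximum exceeds the threshold.
    """
    p0 = predict(x)
    dists = [np.abs(p0 - predict(reduce_bit_depth(x, 1))).sum(),
             np.abs(p0 - predict(median_smooth(x))).sum()]
    return max(dists) > threshold

# Toy demo: a "classifier" that reads one pixel, and a one-pixel perturbation.
clean = np.full((28, 28), 0.4)
spiked = clean.copy()
spiked[14, 14] = 1.0
predict = lambda img: np.array([img[14, 14], 1.0 - img[14, 14]])
print(bool(is_adversarial(predict, clean)), bool(is_adversarial(predict, spiked)))
```

Smoothing erases the single-pixel spike, so the spiked image's prediction shifts sharply after squeezing and is flagged, while the clean image passes.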

### Citations

@article{xu2017feature,
title={Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks},
author={Xu, Weilin and Evans, David and Qi, Yanjun},
journal={arXiv preprint arXiv:1704.01155},
year={2017}
}


# A Tool for Automatically Evading Classifiers for PDF Malware Detection

## Automatically Evading Classifiers for Detecting PDF Malware

This tool uses evolutionary techniques to simulate an adversary’s efforts to evade a target classifier.

### Abstract

Machine learning is widely used to develop classifiers for security tasks. However, the robustness of these methods against motivated adversaries is uncertain. In this work, we propose a generic method to evaluate the robustness of classifiers under attack. The key idea is to stochastically manipulate a malicious sample to find a variant that preserves the malicious behavior but is classified as benign by the classifier. We present a general approach to search for evasive variants and report on results from experiments using our techniques against two PDF malware classifiers, PDFrate and Hidost. Our method is able to automatically find evasive variants for both classifiers for all of the 500 malicious seeds in our study. Our results suggest a general method for evaluating classifiers used in security applications, and raise serious doubts about the effectiveness of classifiers based on superficial features in the presence of adversaries.
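The search loop described above, stochastically mutating a malicious seed while an oracle preserves its behavior, can be sketched generically. This is a hedged hill-climbing caricature with placeholder functions (`mutate`, `is_malicious`, `benign_score` are ours), not the paper's genetic-programming system or its sandbox oracle:

```python
import random

def evolve_evasive_variant(seed, mutate, is_malicious, benign_score,
                           pop_size=20, generations=100, rng=None):
    """Search for a variant that stays malicious but scores as benign.

    Each generation mutates the current best variant; candidates that lose
    the malicious behavior (per an oracle such as a sandbox) are discarded,
    and the surviving candidate that looks most benign to the target
    classifier seeds the next generation.
    """
    rng = rng or random.Random(0)
    best = seed
    for _ in range(generations):
        candidates = [mutate(best, rng) for _ in range(pop_size)]
        candidates = [c for c in candidates if is_malicious(c)]  # oracle check
        if not candidates:
            continue
        challenger = max(candidates, key=benign_score)
        if benign_score(challenger) > benign_score(best):
            best = challenger
        if benign_score(best) > 0.5:      # classifier says benign: evasion found
            return best
    return None

# Toy demo: a "sample" is a dict; the payload must survive, while two decoy
# features drive the classifier's suspicion score.
def mutate(s, rng):
    s = dict(s)
    key = rng.choice(["decoy_a", "decoy_b", "payload"])
    s[key] = max(0, s[key] + rng.choice([-1, 1]))
    return s

is_malicious = lambda s: s["payload"] >= 1
benign_score = lambda s: 1.0 / (1.0 + s["decoy_a"] + s["decoy_b"])
seed = {"payload": 1, "decoy_a": 3, "decoy_b": 3}
variant = evolve_evasive_variant(seed, mutate, is_malicious, benign_score)
print(variant)
```

In the actual system the mutation operators edit the PDF's object tree, the oracle is a malware sandbox, and `benign_score` queries PDFrate or Hidost.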

### Citations

@inproceedings{xu2016automatically,
author={Xu, Weilin and Qi, Yanjun and Evans, David},
booktitle={Proceedings of the 2016 Network and Distributed System Security Symposium (NDSS)},
year={2016}
}


# Understanding Adversarial Examples (by Considering the Role of the Task's Oracle)

## A Theoretical Framework for Robustness of (Deep) Classifiers Against Adversarial Samples

### Abstract

Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to take theoretical steps towards fully understanding adversarial examples. Using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool a classifier (f1), and incorporates the classifier's oracle (f2, e.g., human eyes) into the analysis. By investigating the topological relationship between the two (pseudo)metric spaces corresponding to predictor f1 and oracle f2, we develop necessary and sufficient conditions that determine whether f1 is always robust (strong-robust) against adversarial examples according to f2. Interestingly, our theorems indicate that just one unnecessary feature can make f1 not strong-robust, and that the right feature representation learning is the key to obtaining a classifier that is both accurate and strong-robust.

Recent studies are mostly empirical and provide little understanding of why an adversary can fool machine learning models with adversarial examples. Several important questions have not been answered yet:

- What makes a classifier always robust to adversarial examples?
- Which parts of a classifier influence its robustness against adversarial examples more than the rest?
- What is the relationship between a classifier’s generalization accuracy and its robustness against adversarial examples?
- Why are (many) DNN classifiers not robust against adversarial examples, and how can they be improved?
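The notion of strong-robustness can be written schematically as follows. This is our hedged paraphrase of the idea in the abstract (the symbols d2 and δ are our notation), not the paper's exact theorem statement:

```latex
% f1 is strong-robust with respect to its oracle f2 if inputs that the
% oracle's (pseudo)metric d2 considers close, and that the oracle labels
% identically, can never be told apart by the classifier f1:
\forall x, x' :\quad d_2(x, x') < \delta \;\wedge\; f_2(x) = f_2(x')
\;\Longrightarrow\; f_1(x) = f_1(x')
```

Read this way, an adversarial example is precisely a pair (x, x') that violates the implication: the oracle sees no meaningful change, yet f1's prediction flips.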

This paper uses a framework that considers the role of the oracle to understand adversarial examples. One figure in the paper gives a simple case illustration of how unnecessary features make a classifier vulnerable to adversarial examples, and another figure explains why DNN models in particular are vulnerable to adversarial examples.
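The "unnecessary feature" failure mode is easy to reproduce in miniature. The construction below is ours, built only to illustrate the claim: the oracle depends on one feature, the classifier also weighs an irrelevant one, and perturbing that irrelevant feature flips the classifier without changing what the oracle sees:

```python
import numpy as np

# Oracle f2: the task truly depends only on feature 0.
f2 = lambda x: int(x[0] > 0)
# Classifier f1: also leans on the unnecessary feature 1.
f1 = lambda x: int(x[0] + 3 * x[1] > 0)

x = np.array([1.0, 0.0])                 # clean input: f1 and f2 agree
x_adv = x + np.array([0.0, -0.5])        # perturb only the unnecessary feature

print(f2(x), f1(x))                      # prints: 1 1  (agreement)
print(f2(x_adv), f1(x_adv))              # prints: 1 0  (f1 flipped, oracle unchanged)
```

Because f2 ignores feature 1 entirely, the perturbed input sits at distance zero from x in the oracle's (pseudo)metric, yet f1's decision changes, so f1 is not strong-robust.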

### Citations

@article{wang2017theoretical,
title={A Theoretical Framework for Robustness of (Deep) Classifiers Against Adversarial Samples},
author={Wang, Beilun and Gao, Ji and Qi, Yanjun},
year={2017}
}