
| Presenter | Papers | Paper URL | Our Slides |
|-----------|--------|-----------|------------|
| AE | Intriguing properties of neural networks | PDF | |
| AE | Explaining and Harnessing Adversarial Examples | PDF | |
| AE | Towards Deep Learning Models Resistant to Adversarial Attacks | PDF | |
| AE | DeepFool: a simple and accurate method to fool deep neural networks | PDF | |
| AE | Towards Evaluating the Robustness of Neural Networks, by Carlini and Wagner | PDF | PDF |
| Data | Basic Survey of ImageNet - LSVRC competition | URL | PDF |
| Understand | Understanding Black-box Predictions via Influence Functions | PDF | |
| Understand | Deep inside convolutional networks: Visualising image classification models and saliency maps | PDF | |
| Understand | Been Kim, Interpretable Machine Learning, ICML17 Tutorial [^1] | PDF | |
| provable | Provable defenses against adversarial examples via the convex outer adversarial polytope, by Eric Wong and J. Zico Kolter | URL | |
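
As a concrete anchor for the attack papers listed above, here is a minimal sketch of the Fast Gradient Sign Method from "Explaining and Harnessing Adversarial Examples". It is an illustration under assumptions, not an implementation from the slides: the `model`, the `epsilon` value, and the [0, 1] pixel range are all placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """One-step Fast Gradient Sign Method (Goodfellow et al.).

    x: input batch in [0, 1], y: true labels. Returns perturbed
    inputs clipped back to the valid pixel range.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Take one step in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

The single signed-gradient step is what distinguishes FGSM from the iterative PGD attack studied in "Towards Deep Learning Models Resistant to Adversarial Attacks".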

[^1]: Notes about Interpretable Machine Learning

Notes on interpretability in machine learning, from Been Kim's ICML 2017 tutorial

by Brandon Liu

Important Criteria in ML Systems
  • Safety
  • Nondiscrimination
  • Avoiding technical debt
  • Providing the right to explanation
  • Example: for self-driving cars and other autonomous vehicles, it is almost impossible to come up with all possible unit tests.
What is interpretability?
  • The ability to give explanations to humans.
Two Branches of Interpretability
  • In the context of an application: if the system is useful in a practical application, or in a simplified version of one, then it must be interpretable in some useful sense.
  • Via a quantifiable proxy: a researcher might first claim that some model class (e.g., sparse linear models, rule lists, gradient-boosted trees) is interpretable, and then present algorithms to optimize within that class.
Before building any model
  • Visualization
  • Exploratory data analysis
Building a new model
  • Rule-based, per-feature-based
  • Case-based
  • Sparsity (a minimal sparse-model sketch follows this list)
  • Monotonicity
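
To make the sparsity bullet concrete, here is a minimal sketch of a sparse linear model. The toy data, the `alpha` value, and the use of scikit-learn's `Lasso` are illustrative assumptions, not anything prescribed by the tutorial.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: 100 samples, 20 features, only 3 of which matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] \
    + rng.normal(scale=0.1, size=100)

# L1 regularization drives most coefficients to exactly zero,
# leaving a short, human-readable list of active features.
model = Lasso(alpha=0.1).fit(X, y)
active = np.flatnonzero(model.coef_)
print("active features:", active)             # expect roughly [0, 1, 2]
print("their weights:", model.coef_[active])
```

The interpretability argument is that a model with a handful of nonzero weights can be read feature by feature, unlike a dense model over all twenty inputs.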
After building a model
  • Sensitivity analysis and gradient-based methods (see the saliency sketch after this list)
  • Mimic/surrogate models
  • Investigation of hidden layers
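
As a sketch of the gradient-based methods mentioned above, in the spirit of the "Deep inside convolutional networks" saliency-map paper from the table, the following computes a vanilla gradient saliency map. The PyTorch framing and the assumed input shape of (1, C, H, W) are illustrative choices, not details from the tutorial.

```python
import torch

def saliency_map(model, x, target_class):
    """Vanilla gradient saliency (Simonyan et al.): the magnitude
    of the class-score gradient w.r.t. each input pixel.

    x: a single image batch of shape (1, C, H, W).
    """
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]  # scalar score for one class
    score.backward()
    # Max over color channels gives one importance value per pixel.
    return x.grad.abs().max(dim=1)[0]
```

Pixels with large gradient magnitude are those whose small changes would most affect the class score, which is the post-hoc "sensitivity analysis" view of interpretability.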