Reliable Applications V  Understanding2
11 Oct 2017 3Reliable visualizing DifferenceAnalysis Attribution CompositionPresenter  Papers  Paper URL  Our Slides 

Rita  Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 ^{1}  
Arshdeep  Axiomatic Attribution for Deep Networks, ICML17 ^{2}  
The Robustness of Estimator Composition, NIPS16 

_{ Axiomatic Attribution for Deep Networks, ICML17 / Google / We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms—Sensitivity and Implementation Invariance that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better. } ↩

_{ Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 / GoogleBrain / We develop a general duality between neural networks and compositional kernel Hilbert spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly describes both a family of neural networks and a kernel space. Random neural networks are generated from a skeleton through node replication followed by sampling from a normal distribution to assign weights. The kernel space consists of functions that arise by compositions, averaging, and nonlinear transformations governed by the skeleton’s graph topology and activation functions. We prove that random networks induce representations which approximate the kernel space. In particular, it follows that random weight initialization often yields a favorable starting point for optimization despite the worstcase intractability of training neural networks. } ↩