Structure VI - DNN with Adaptive Structures

2Architecture 8Scalable dynamic Architecture Search structured
Presenter Papers Paper URL Our Slides
Anant AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 1 PDF PDF
Shijia SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 2 PDF PDF
Jack Proximal Deep Structured Models, NIPS16 3 PDF PDF
  Optimal Architectures in a Solvable Model of Deep Networks, NIPS16 4 PDF  
Tianlu Large-Scale Evolution of Image Classifiers, ICML17 5 PDF PDF
  1. Large-Scale Evolution of Image Classifiers, ICML17/ Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc Le, Alex Kurakin / Neural networks have proven effective at solving difficult problems but designing their architectures can be challenging, even for image classification problems alone. Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. Despite significant computational requirements, we show that it is now possible to evolve models with accuracies within the range of those published in the last year. Specifically, we employ simple evolutionary techniques at unprecedented scales to discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial initial conditions and reaching accuracies of 94.6% (95.6% for ensemble) and 77.0%, respectively. To do this, we use novel and intuitive mutation operators that navigate large search spaces; we stress that no human participation is required once evolution starts and that the output is a fully-trained model. Throughout this work, we place special emphasis on the repeatability of results, the variability in the outcomes and the computational requirements.  

  2. AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 / Corinna Cortes, et al. We present new algorithms for adaptively learning artificial neural networks. Our algorithms (AdaNet) adaptively learn both the structure of the network and its weights. They are based on a solid theoretical analysis, including data-dependent generalization guarantees that we prove and discuss in detail. We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset. The results demonstrate that our algorithm can automatically learn network structures with very competitive performance accuracies when compared with those achieved for neural networks found by standard approaches.  

  3. SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 / We propose a novel deep neural network that is both lightweight and effectively structured for model parallelization. Our network, which we name as SplitNet, automatically learns to split the network weights into either a set or a hierarchy of multiple groups that use disjoint sets of features, by learning both the class-to-group and feature-to-group assignment matrices along with the network weights. This produces a tree-structured network that involves no connection between branched subtrees of semantically disparate class groups. SplitNet thus greatly reduces the number of parameters and requires significantly less computations, and is also embarrassingly model parallelizable at test time, since the network evaluation for each subnetwork is completely independent except for the shared lower layer weights that can be duplicated over multiple processors. We validate our method with two deep network models (ResNet and AlexNet) on two different datasets (CIFAR-100 and ILSVRC 2012) for image classification, on which our method obtains networks with significantly reduced number of parameters while achieving comparable or superior classification accuracies over original full deep networks, and accelerated test speed with multiple GPUs.  

  4. Proximal Deep Structured Models, NIPS16 / Raquel Urtasun et al. Many problems in real-world applications involve predicting continuous-valued random variables that are statistically related. In this paper, we propose a powerful deep structured model that is able to learn complex non-linear functions which encode the dependencies between continuous output variables. We show that inference in our model using proximal methods can be efficiently solved as a feedfoward pass of a special type of deep recurrent neural network. We demonstrate the effectiveness of our approach in the tasks of image denoising, depth refinement and optical flow estimation.  

  5. Optimal architectures in a solvable model of Deep networks, NIPS16/ Deep neural networks have received a considerable attention due to the success of their training for real world machine learning applications. They are also of great interest to the understanding of sensory processing in cortical sensory hierarchies. The purpose of this work is to advance our theoretical understanding of the computational benefits of these architectures. Using a simple model of clustered noisy inputs and a simple learning rule, we provide analytically derived recursion relations describing the propagation of the signals along the deep network. By analysis of these equations, and defining performance measures, we show that these model networks have optimal depths. We further explore the dependence of the optimal architecture on the system parameters.