Structure I - Varying DNN structures
19 Sep 2017 2Architecture 8Scalable dialog QA nonparametric structured sparsity

Presenter  Papers  Paper URL  Our Slides
Jack  Learning End-to-End Goal-Oriented Dialog, ICLR17 ^{1}
Bargav  Nonparametric Neural Networks, ICLR17 ^{2}  
Bargav  Learning Structured Sparsity in Deep Neural Networks, NIPS16 ^{3}  
Arshdeep  Learning the Number of Neurons in Deep Networks, NIPS16 ^{4} 

_{ Learning End-to-End Goal-Oriented Dialog, ICLR17 / Antoine Bordes, Y-Lan Boureau, Jason Weston / Traditional dialog systems used in goal-oriented applications require a lot of domain-specific handcrafting, which hinders scaling up to new domains. End-to-end dialog systems, in which all components are trained from the dialogs themselves, escape this limitation. But the encouraging success recently obtained in chit-chat dialog may not carry over to goal-oriented settings. This paper proposes a testbed to break down the strengths and shortcomings of end-to-end dialog systems in goal-oriented applications. Set in the context of restaurant reservation, our tasks require manipulating sentences and symbols, so as to properly conduct conversations, issue API calls and use the outputs of such calls. We show that an end-to-end dialog system based on Memory Networks can reach promising, yet imperfect, performance and learn to perform non-trivial operations. We confirm those results by comparing our system to a hand-crafted slot-filling baseline on data from the second Dialog State Tracking Challenge (Henderson et al., 2014a). We show similar result patterns on data extracted from an online concierge service. }

_{ Nonparametric Neural Networks, ICLR17 / George Philipp, Jaime G. Carbonell / Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce nonparametric neural networks, a non-probabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term adaptive radial-angular gradient descent or AdaRad, and obtain promising results. }

_{ Learning Structured Sparsity in Deep Neural Networks, NIPS16 / High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNN) in resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN's evaluation. Experimental results show that SSL achieves on average 5.1× and 3.1× speedups of convolutional layer computation of AlexNet against CPU and GPU, respectively, with off-the-shelf libraries. These speedups are about twice the speedups of non-structured sparsity; (3) regularize the DNN structure to improve classification accuracy. The results show that for CIFAR-10, regularization on layer depth reduces a 20-layer Deep Residual Network (ResNet) to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still higher than that of the original ResNet with 32 layers. For AlexNet, SSL reduces the error by ~1%. }
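The kind of structure-level regularizer described in this abstract can be sketched as a group penalty on a convolutional weight tensor, with one group per filter or per input channel. The sketch below assumes a group-lasso-style penalty (the sum of the L2 norms of the groups), which the abstract itself does not spell out. The nested-list layout `weights[filter][channel][row][col]` and all function names are illustrative choices, not taken from the paper's code.

```python
import math

def flatten(nested):
    """Yield every scalar inside an arbitrarily nested list."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

def group_norm(values):
    """L2 norm of one group of weights."""
    return math.sqrt(sum(v * v for v in values))

def filter_wise_penalty(weights):
    """Sum of L2 norms, one group per output filter (weights[n])."""
    return sum(group_norm(flatten(f)) for f in weights)

def channel_wise_penalty(weights):
    """Sum of L2 norms, one group per input channel across all filters."""
    num_channels = len(weights[0])
    return sum(
        group_norm(flatten([f[c] for f in weights]))
        for c in range(num_channels)
    )

# Hypothetical layer: 2 filters, 2 input channels, 1x1 kernels.
# The second filter is all zeros, so it adds nothing to the filter-wise term.
conv = [
    [[[3.0]], [[4.0]]],
    [[[0.0]], [[0.0]]],
]

print(filter_wise_penalty(conv))   # 5.0
print(channel_wise_penalty(conv))  # 7.0
```

In training, such terms would be added to the data loss with a weighting coefficient. Because the penalty is a sum of norms rather than a sum of squares, it tends to push whole groups to zero, which is what makes the resulting sparsity structured.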

_{ Learning the Number of Neurons in Deep Networks, NIPS16 / Nowadays, the number of layers and of neurons in each layer of a deep network are typically set manually. While very deep and wide networks have proven effective in general, they come at a high memory and computation cost, thus making them impractical for constrained platforms. These networks, however, are known to have many redundant parameters, and could thus, in principle, be replaced by more compact architectures. In this paper, we introduce an approach to automatically determining the number of neurons in each layer of a deep network during learning. To this end, we propose to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron. Starting from an overcomplete network, we show that our approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy. }
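To illustrate how a per-neuron group sparsity regularizer can remove whole neurons, here is a minimal sketch of group soft-thresholding, the proximal step commonly paired with such penalties. The abstract does not say how the regularizer is optimized, so the choice of a proximal step is an assumption, and the function names, toy layer, and threshold value are all hypothetical. Each row holds the weights of one neuron and is treated as one group.

```python
import math

def group_soft_threshold(rows, threshold):
    """Shrink each row (one neuron's weights) toward zero as a block.

    A row whose L2 norm is at or below the threshold is zeroed entirely,
    which corresponds to removing that neuron.
    """
    shrunk = []
    for row in rows:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk

def active_neurons(rows):
    """Count rows that still contain at least one nonzero weight."""
    return sum(1 for row in rows if any(w != 0.0 for w in row))

# Hypothetical layer with three neurons of two weights each.
layer = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.5]]
pruned = group_soft_threshold(layer, threshold=1.0)

print(active_neurons(layer))   # 3
print(active_neurons(pruned))  # 1
```

Only the first neuron survives here: its norm (5.0) exceeds the threshold, so it is scaled by 0.8, while the other two rows have norms below 1.0 and are set to zero as a block. Counting the surviving rows per layer gives the learned layer width.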