Potential Reading List
About this potential reading list:

To educate my students in class, to give new team members basic tutorials, and to help existing members understand advanced topics. This website includes a (growing) list of tutorials and papers we have surveyed for these purposes (since 2017).

At the beginning of each semester, I collect a messy list of potential readings and put them here. Then my students will choose papers they want to review (mostly from this list) and we make a plan for that semester’s reading session schedule.

In summary, this is a messy list, intended only for planning and filtering.
 Topic I: Foundations, Analysis and Theory
 Topic II: DNN with Varying Structures
 Topic III: Reliability, Benchmarking, and Applications
 Topic IV: Optimization
 Topic V: Generative
 Topic VI: Reinforcement
 Topic VII: Graphs
 Topic VIII: 2019 Learning Strategies
Potential deep learning papers provided to my course students to reproduce in the Fall 2019 course
Potential deep learning papers on graphs we read in Spring 2019
 GNN code repos: https://paperswithcode.com/task/graphembedding
 Similar course: https://www.math.uwaterloo.ca/~bico/co759/2018/index.html
Basics:
 GraphSAGE / Gated Graph Neural Networks (GGNN)
 ChebNet, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
 Relational inductive biases, deep learning, and graph networks, Peter W. Battaglia, Oriol Vinyals, Yujia Li, Razvan Pascanu, et al., 2018
 Graph Neural Networks: A Review of Methods and Applications https://arxiv.org/pdf/1812.08434.pdf
 Modeling relational data with graph convolutional networks, 2017, Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling
 An Experimental Study of Neural Networks for Variable Graphs, ICLR 2018 workshop
 How Powerful are Graph Neural Networks? / Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka, 2018
 A Comprehensive Survey on Graph Neural Networks 2018, https://arxiv.org/pdf/1901.00596.pdf
 Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning, Qimai Li, Zhichao Han, Xiao-Ming Wu
 How Powerful are Graph Neural Networks?, K Xu, W Hu, J Leskovec, S Jegelka, arXiv preprint arXiv:1810.00826, 2018
 The graph neural network model, Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G., IEEE Transactions on Neural Networks, 20(1):61–80, 2009
 Convolutional neural networks over tree structures for programming language processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
 Semi-Supervised Classification with Graph Convolutional Networks, Authors: Thomas N. Kipf, Max Welling
 Graph Attention Networks, Authors: Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
 Learning Convolutional Neural Networks for Graphs, http://proceedings.mlr.press/v48/niepert16.pdf
 Inductive representation learning on large graphs, NIPS16
 Higher-order clustering in networks, H Yin, AR Benson, J Leskovec, Physical Review E 97(5), 052306
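Most of the basics above share one computational core: each layer aggregates feature vectors over graph neighborhoods. A minimal sketch of that idea, assuming dense NumPy arrays and mean aggregation (a simplification in the spirit of GCN/GraphSAGE, not any single paper's exact update rule):

```python
import numpy as np

def gcn_like_layer(adj, feats, weight):
    """One neighborhood-aggregation layer: average each node's
    neighbors (plus itself), then apply a linear map and ReLU.
    adj: (n, n) 0/1 adjacency; feats: (n, d); weight: (d, d_out)."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                 # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)  # neighborhood sizes
    h = (a_hat @ feats) / deg               # mean over neighborhood
    return np.maximum(h @ weight, 0.0)      # linear map + ReLU

# toy 4-node path graph with one-hot node features
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
feats = np.eye(4)
rng = np.random.default_rng(0)
out = gcn_like_layer(adj, feats, rng.normal(size=(4, 2)))
print(out.shape)  # (4, 2)
```

Stacking several such layers lets information flow over multi-hop neighborhoods, which is exactly the behavior the GIN and "Deeper Insights" papers analyze.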
Basic graph representation learning:
 RECS: Robust Graph Embedding Using Connection Subgraphs
 LASAGNE: Locality And Structure Aware Graph Node Embedding
 Adversarially Regularized Graph Autoencoder for Graph Embedding
 All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks
 LanczosNet: Multi-Scale Deep Graph Convolutional Networks
 Graph Neural Networks with convolutional ARMA filters
 Geniepath: Graph neural networks with adaptive receptive paths Z Liu, C Chen, L Li, J Zhou, X Li, L Song, Y Qi arXiv preprint arXiv:1802.00910
 Link Prediction Based on Graph Neural Networks arXiv:1802.09691
 Deep Graph Infomax, P Veličković, W Fedus, WL Hamilton, P Liò, Y Bengio… arXiv preprint, 2018
 ICML18, Anonymous Walk Embeddings, Authors: Sergey Ivanov, Evgeny Burnaev
 Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks, Authors: Federico Monti, Michael Bronstein, Xavier Bresson
 Diffusion-convolutional neural networks, NIPS16
 Convolutional networks on graphs for learning molecular fingerprints, NIPS15
 Geometric deep learning: going beyond Euclidean data, 2017
 Dynamic Graph CNN for learning on point clouds, 2018
GNN extend/beyond:
 GMPLL: Graph Matching based Partial Label Learning
 Graph Matching Networks for Learning the Similarity of Graph Structured Objects, 2019
 A Functional Representation for Graph Matching
 Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text
 Sample Efficient Semantic Segmentation using Rotation Equivariant Convolutional Networks J Linmans, J Winkens, BS Veeling, TS Cohen, M Welling arXiv preprint arXiv:1807.00583
 2018, Rotation Equivariant CNNs for Digital Pathology BS Veeling, J Linmans, J Winkens, T Cohen, M Welling arXiv preprint arXiv:1806.03962
 Emerging Convolutions for Generative Normalizing Flows E Hoogeboom, R Berg, M Welling, arXiv preprint arXiv:1901.11137
 3D Steerable CNNs: Learning rotationally equivariant features in volumetric data, M Weiler, M Geiger, M Welling, W Boomsma, T Cohen, Advances in Neural Information Processing Systems, 10402–10413
 Convolutional networks for spherical signals T Cohen, M Geiger, J Köhler, M Welling arXiv preprint arXiv:1709.04893
 Graph Convolutional Matrix Completion, R van den Berg, TN Kipf, M Welling, 2017
 Relaxed Quantization for Discretized Neural Networks
 Probabilistic Binary Neural Networks, JWT Peters, M Welling arXiv preprint arXiv:1809.03368
 Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning, C Qu, S Mannor, H Xu, Y Qi, L Song, J Xiong, arXiv preprint arXiv:1901.09326
 Double Neural Counterfactual Regret Minimization, H Li, K Hu, Z Ge, T Jiang, Y Qi, L Song, arXiv preprint arXiv:1812.10607, 2018
 Neural ModelBased Reinforcement Learning for Recommendation X Chen, S Li, H Li, S Jiang, Y Qi, L Song arXiv preprint arXiv:1812.10613
 Deep hyperspherical learning, W Liu, YM Zhang, X Li, Z Yu, B Dai, T Zhao, L Song, Advances in Neural Information Processing Systems, 3950–3960
 Graph Edit Distance Computation via Graph Neural Networks Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, Wei Wang
 Hierarchical Graph Representation Learning with Differentiable Pooling Authors: Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, Jure Leskovec
 FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling, Authors: Jie Chen, Tengfei Ma, Cao Xiao. Abstract: The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. Such a model, however, is transductive in nature, because parameters are learned through convolutions over both training and test data.
 Representation Learning on Graphs with Jumping Knowledge Networks, Authors: Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, Stefanie Jegelka. Abstract: Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models, and propose a strategy to overcome those. In particular, the range of “neighboring” …
 Gauge Equivariant Convolutional Networks and the Icosahedral CNN, TS Cohen, M Weiler, B Kicanaoglu, M Welling, arXiv preprint arXiv:1902.04615, 2019. Abstract: The idea of equivariance to symmetry transformations provides one of the first …
 Learning Invariant Representations Of Planar Curves Authors: Gautam Pai, Aaron Wetzler, Ron Kimmel
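The FastGCN entry above hinges on one trick: treat the graph convolution as an expectation over nodes and estimate it by Monte Carlo, sampling a fixed number of nodes per layer instead of expanding every neighborhood. A toy sketch of layer-wise importance sampling; the function name and the degree-proportional sampling distribution are illustrative assumptions, not the paper's code:

```python
import numpy as np

def sampled_aggregate(adj, feats, n_samples, rng):
    """Approximate the full aggregation adj @ feats by sampling a
    fixed set of nodes per layer (FastGCN-style Monte Carlo idea).
    Nodes are drawn with probability proportional to degree and
    importance-weighted by 1 / (n_samples * p) to stay unbiased."""
    deg = adj.sum(axis=1)
    probs = deg / deg.sum()
    idx = rng.choice(adj.shape[0], size=n_samples, p=probs)
    weights = 1.0 / (n_samples * probs[idx])
    return adj[:, idx] @ (feats[idx] * weights[:, None])

rng = np.random.default_rng(0)
adj = (rng.random((50, 50)) < 0.1).astype(float)
feats = rng.normal(size=(50, 8))
full = adj @ feats                       # exact aggregation
approx = sampled_aggregate(adj, feats, 25, rng)
print(np.abs(full - approx).mean())      # shrinks as n_samples grows
```

Because each layer touches only `n_samples` nodes, the per-batch cost no longer depends on neighborhood fan-out, which is what makes the method inductive and minibatch-friendly.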
Generate:
 Learning Bayesian Networks is NP-Complete, DM Chickering, 1996
 Neural scene representation and rendering, science 2018
 Relational Deep Reinforcement Learning, 2018
 Generating sentences from a continuous space, 2015
 Encoding Robust Representation for Graph Generation
 Syntax-Directed Variational Autoencoder for Molecule Generation, H Dai, Y Tian, B Dai, S Skiena, L Song, International Conference on Machine Learning
 Graphical Generative Adversarial Networks C Li, M Welling, J Zhu, B Zhang arXiv preprint arXiv:1804.03429
 2019, Recurrent Inference Machines for Reconstructing Heterogeneous MRI Data K Lønning, P Putzky, JJ Sonke, L Reneman, MWA Caan, M Welling
 Deep Reinforcement Learning for NLP, ACL18
 DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation, R Assouel, M Ahmed, MH Segler, A Saffari, Y Bengio, arXiv preprint arXiv …, 2018
 Edge-exchangeable graphs and sparsity, NIPS16, Authors: Diana Cai, Trevor Campbell, Tamara Broderick. Abstract: Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability …
 Junction Tree Variational Autoencoder for Molecular Graph Generation Authors: Wengong Jin, Regina Barzilay, Tommi Jaakkola
 Towards Variational Generation of Small Graphs Authors: Martin Simonovsky, Nikos Komodakis
 GraphRNN: Generating Realistic Graphs with Deep Autoregressive Models, ICML2018 Authors: Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, Jure Leskovec
 Pixels to Graphs by Associative Embedding Authors: Alejandro Newell, Jia Deng Abstract: Graphs are a useful abstraction of image content. Not only can graphs represent details about individual objects in a scene but they can capture the interactions between pairs of objects. We present a method for training a convolutional neural network such that it takes in an input image and produces a full graph definition.
 Syntax-Directed Variational Autoencoder for Structured Data, Authors: Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, Le Song
 NetGAN: Generating Graphs via Random Walks, ICML2018 Authors: Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, Stephan Günnemann
 Graphons, mergeons, and so on! Authors: Justin Eldridge, Mikhail Belkin, Yusu Wang Abstract: In this work we develop a theory of hierarchical clustering for graphs. Our modelling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs and analyzing large networks.
 Convolutional Imputation of Matrix Networks, Authors: Qingyun Sun, Mengyuan Yan, David Donoho, Stephen Boyd
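The graphon entry above ("graphs are sampled from a graphon") rests on a two-line generative story: draw a latent coordinate u_i ~ Uniform(0,1) per node, then connect i and j independently with probability W(u_i, u_j). A sketch, with an arbitrary W chosen purely for illustration:

```python
import numpy as np

def sample_graphon(w, n, rng):
    """Sample an n-node undirected graph from graphon w: [0,1]^2 -> [0,1].
    Latent u_i ~ Uniform(0,1); edge (i, j) appears w.p. w(u_i, u_j)."""
    u = rng.random(n)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < w(u[i], u[j]):
                adj[i, j] = adj[j, i] = 1.0
    return adj

# illustrative graphon: a dense "community" where both coordinates are small
w = lambda x, y: 0.9 if (x < 0.5 and y < 0.5) else 0.1
adj = sample_graphon(w, 100, np.random.default_rng(0))
print(adj.sum() / 2)  # number of edges
```

Models like NetGAN and GraphRNN can be read as learning such a generating distribution from data rather than specifying W by hand.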
With graphical models:
 Neural Graph Machines: Learning Neural Networks Using Graphs
 Graph HyperNetworks for Neural Architecture Search
 MRF Optimization by Graph Approximation
 Credit Assignment Techniques in Stochastic Computation Graphs
 Graph Refinement based Tree Extraction using MeanField Networks and Graph Neural Networks, R Selvan, T Kipf, M Welling, JH Pedersen, J Petersen, M de Bruijne arXiv preprint arXiv:1811.08674
 SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
 Combinatorial Bayesian Optimization using Graph Representations C Oh, JM Tomczak, E Gavves, M Welling arXiv preprint arXiv:1902.00448
 Learning Steady-States of Iterative Algorithms over Graphs, H Dai, Z Kozareva, B Dai, A Smola, L Song, International Conference on Machine Learning, 1114–1122
 A Hilbert space embedding for distributions. In Proceedings of the International Conference on Algorithmic Learning Theory, volume 4754, pp. 13–31. Springer, 2007.
 Hilbert space embeddings of conditional distributions. In Proceedings of the International Conference on Machine Learning, 2009.
 Nonparametric tree graphical models. In 13th Workshop on Artificial Intelligence and Statistics, volume 9 of JMLR workshop and conference proceedings, pp. 765–772, 2010
 Kernel belief propagation. In Proc. Intl. Conference on Artificial Intelligence and Statistics, volume 10 of JMLR workshop and conference proceedings, 2011.
 Injective Hilbert space embeddings of probability measures. In Proceedings of Annual Conference. Computational Learning Theory, pp. 111–122, 2008.
 Jebara, T., Kondor, R., and Howard, A. Probability product kernels. J. Mach. Learn. Res., 5:819–844, 2004.
 Kernel-based just-in-time learning for passing expectation propagation messages. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI 2015, July 12–16, 2015, Amsterdam, The Netherlands, pp. 405–414, 2015
 Deeply learning the messages in message passing inference. In Advances in Neural Information Processing Systems, 2015.
 Minka, T. The EP energy function and minimization schemes. See www.stat.cmu.edu/minka/papers/learning.html, August 2001.
 Contextual Graph Markov Model: A Deep and Generative Approach to Graph Processing Authors: Davide Bacciu, Federico Errica, Alessio Micheli Abstract: We introduce the Contextual Graph Markov Model, an approach combining ideas from generative models and neural networks for the processing of graph data.
 Inference in probabilistic graphical models by Graph Neural Networks Authors: KiJung Yoon, Renjie Liao, Yuwen Xiong, Lisa Zhang, Ethan Fetaya, Raquel Urtasun, Richard Zemel, Xaq Pitkow Abstract: A useful computation when acting in a complex environment is to infer the marginal probabilities or most probable states of taskrelevant variables.
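Several entries in this block (kernel belief propagation, EP, and "Inference in probabilistic graphical models by Graph Neural Networks") revolve around message passing for marginal inference; the GNN papers essentially learn the messages that classic sum-product computes. A minimal exact sum-product pass on a chain of binary variables, with toy potentials not taken from any paper above:

```python
import numpy as np

def chain_marginals(unary, pairwise):
    """Sum-product on a chain MRF with n binary nodes.
    unary: (n, 2) node potentials; pairwise: (2, 2) shared edge potential.
    Returns (n, 2) normalized marginals via forward/backward messages."""
    n = len(unary)
    fwd = np.zeros((n, 2)); bwd = np.zeros((n, 2))
    fwd[0] = 1.0; bwd[-1] = 1.0
    for i in range(1, n):                      # forward messages
        fwd[i] = (fwd[i - 1] * unary[i - 1]) @ pairwise
    for i in range(n - 2, -1, -1):             # backward messages
        bwd[i] = pairwise @ (bwd[i + 1] * unary[i + 1])
    marg = fwd * unary * bwd                   # combine both directions
    return marg / marg.sum(axis=1, keepdims=True)

unary = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
pairwise = np.array([[2.0, 1.0], [1.0, 2.0]])  # prefers agreement
print(chain_marginals(unary, pairwise))
```

On trees this recursion is exact; on loopy graphs it becomes an approximation, which is the regime where the learned-message papers above operate.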
Applications and more:
 Endtoend differentiable physics for learning and control
 Learning to represent programs with graphs
 KG^2: Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings, Y Zhang, H Dai, K Toraman, L Song, arXiv preprint arXiv:1805.12393
 video2net: Extracting dynamic interaction networks from multi-person discussion videos / https://www.cs.stanford.edu/~srijan/pubs/papervideo2net.pdf
 Theory and Application of Network Biology Towards Precision Medicine
 Attention, Learn to Solve Routing Problems! W Kool, H van Hoof, M Welling
 Extraction of Airways using Graph Neural Networks R Selvan, T Kipf, M Welling, JH Pedersen, J Petersen, M de Bruijne arXiv preprint arXiv:1804.04436
 Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification, JM Tomczak, M Ilse, M Welling, arXiv preprint arXiv:1712.00310
 Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape, H Dai, R Umarov, H Kuwahara, Y Li, L Song, X Gao, Bioinformatics 33(22), 3575–3583
 Learning combinatorial optimization algorithms over graphs H Dai, EB Khalil, Y Zhang, B Dilkina, L Song arXiv preprint arXiv:1704.01665
 Neural network-based graph embedding for cross-platform binary code similarity detection, X Xu, C Liu, Q Feng, H Yin, L Song, D Song, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications …
 Convolutional neural network based on SMILES representation of compounds for detecting chemical motif M Hirohara, Y Saito, Y Koda, K Sato, Y Sakakibara  BMC Bioinformatics, 2018
 Heterogeneous Graph Neural Networks for Malicious Account Detection Z Liu, C Chen, X Yang, J Zhou, X Li, L Song 
 Diffusion-Based Approximate Value Functions, Authors: Martin Klissarov, Doina Precup
 Mean Field Multi-Agent Reinforcement Learning, Authors: Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang. Abstract: Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. As the number of agents grows, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions.
 Protein–ligand scoring with convolutional neural networks
 Visualizing convolutional neural network proteinligand scoring
 KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D Convolutional Neural Networks, 2018
 D3R Grand Challenge 2: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies
 Structured sequence modeling with graph convolutional recurrent networks, arXiv preprint arXiv:1612.07659, 2016
 Structural-RNN: Deep learning on spatio-temporal graphs, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5308–5317
 Prioritizing network communities
 Community detection and stochastic block models: recent developments
 Android Malware Detection using Large-scale Network Representation Learning + Deep Android Malware Detection
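The Mean Field Multi-Agent RL entry above tames the exponential joint-action space by replacing the joint action of an agent's neighbors with their mean action, so each Q-function conditions only on (state, own action, mean neighbor action). The core reduction is a one-liner; the helper name below is made up for illustration:

```python
import numpy as np

def mean_neighbor_action(actions, neighbor_ids):
    """Mean-field reduction: summarize neighbors' one-hot actions
    by their average instead of enumerating the joint action."""
    return actions[neighbor_ids].mean(axis=0)

# 4 agents, one-hot actions over 3 discrete choices
actions = np.eye(3)[[0, 2, 2, 1]]
mean_a = mean_neighbor_action(actions, [1, 2, 3])  # neighbors of agent 0
print(mean_a)  # [0, 1/3, 2/3]
```

The Q-network input size then stays fixed regardless of how many neighbors an agent has, which is what lets the method scale to large populations.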
Robustness and scalability
 Does Your Model Know the Digit 6 Is Not a Cat? A Less Biased Evaluation of "Outlier" Detectors
 Faithful and Customizable Explanations of Black Box Models H Lakkaraju, E Kamar, R Caruana, J Leskovec  2019
 Adversarial Examples as an Input-Fault Tolerance Problem
 Adversarial Attack on Graph Structured Data https://arxiv.org/abs/1806.02371
 Adversarial Attacks on Neural Networks for Graph Data, https://dl.acm.org/citation.cfm?id=3220078
 Android Malware Detection using Large-scale Network Representation Learning, https://arxiv.org/abs/1806.04847
 "Deep Program Re-identification: A Graph Neural Network Solution" https://arxiv.org/abs/1812.04064
 Heterogeneous Graph Neural Networks for Malicious Account Detection Z Liu, C Chen, X Yang, J Zhou, X Li, L Song Proceedings of the 27th ACM International Conference on Information and …
 L-Shapley and C-Shapley: Efficient model interpretation for structured data, J Chen, L Song, MJ Wainwright, MI Jordan, arXiv preprint arXiv:1808.02610
 Stochastic Training of Graph Convolutional Networks with Variance Reduction Authors: Jianfei Chen, Jun Zhu, Le Song
 A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17
 Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs, Daniel Neil, Joss Briody, Alix Lacoste, Aaron Sim, Paidi Creed, Amir Saffari
 Interpretable Convolutional Neural Networks, Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu
 Towards Efficient LargeScale Graph Neural Network Computing Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
 Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
 DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
 Squeezing deep learning into mobile and embedded devices, ND Lane, S Bhattacharya, A Mathur, P Georgiev, C Forlivesi, F Kawsar
 Cavs: An Efficient Runtime System for Dynamic Neural Networks, Shizhen Xu, Hao Zhang, Graham Neubig, Wei Dai, Jin Kyu Kim, Zhijie Deng, Qirong Ho, Guangwen Yang, Eric P. Xing
 A Comparison of Distributed Machine Learning Platforms (2017)
 GeePS: scalable deep learning on distributed GPUs with a GPUspecialized parameter server (2016)
 AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks (2017)
 GraphLab / GraphX / Pregel
 Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
 The HighDimensional Geometry of Binary Neural Networks Authors: Alexander G. Anderson, Cory P. Berg
 Learning Discrete Weights Using the Local Reparameterization Trick
 Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks
 Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
 GkmExplain https://github.com/kundajelab/gkmexplain
Deep-Learning-Papers-Reading-Roadmap we read in Fall 2017

state-of-the-art-result-for-machine-learning-problems URL
Foundations
 Deep Learning Summer School 2017 + video lectures
 Andrew Ng, Nuts and Bolts of Applying Deep Learning: https://www.youtube.com/watch?v=F1ka6a13S9I
 Ganguli, Theoretical Neuroscience and Deep Learning, DLSS16: http://videolectures.net/deeplearning2016_ganguli_theoretical_neuroscience/
 Ganguli, Theoretical Neuroscience and Deep Learning.pdf, DLSS17: https://drive.google.com/file/d/0B6NHiPcsmak1dkZMbzc2YWRuaGM/view
 Sharp Minima Can Generalize For Deep Nets, Laurent Dinh (Univ. Montreal), Razvan Pascanu, Samy Bengio (Google Brain), Yoshua Bengio (Univ. Montreal)
 Automated Curriculum Learning for Neural Networks, Alex Graves, Marc G. Bellemare, Jacob Menick, Koray Kavukcuoglu, Remi Munos
 Learning to learn without gradient descent by gradient descent, Yutian Chen, Matthew Hoffman, Sergio Gomez, Misha Denil, Timothy Lillicrap, Matthew Botvinick , Nando de Freitas
 Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study, Samuel Ritter, David Barrett, Adam Santoro, Matt Botvinick
 Geometry of Neural Network Loss Surfaces via Random Matrix Theory, Jeffrey Pennington, Yasaman Bahri
 On the Expressive Power of Deep Neural Networks, Maithra Raghu, Ben Poole, Surya Ganguli, Jon Kleinberg, Jascha Sohl-Dickstein
 Neuroscience-Inspired Artificial Intelligence, http://www.cell.com/neuron/fulltext/S08966273(17)305093
 Understanding deep learning requires rethinking generalization, ICLR17
 On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17
 Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes, ICLR17
 Capacity and Trainability in Recurrent Neural Networks, ICLR17
 Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations, ICLR17
 Frustratingly Short Attention Spans in Neural Language Modeling, ICLR17
 Topology and Geometry of Half-Rectified Network Optimization, ICLR17
 Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning, ICLR17
 Adversarial Feature Learning, ICLR17
 Do Deep Convolutional Nets Really Need to be Deep and Convolutional?, ICLR17
 Why Deep Neural Networks for Function Approximation?, ICLR17
 Bengio, Recurrent Neural Networks, DLSS 2017.pdf: https://drive.google.com/file/d/0ByUKRdiCDK7LXZkM3hVSzFGTkE/view
 On the Expressive Power of Deep Neural Networks, Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein; PMLR 70:2847–2854
 Equivariance Through Parameter-Sharing, Siamak Ravanbakhsh, Jeff Schneider, Barnabás Póczos; PMLR 70:2892–2901
 Large-Scale Evolution of Image Classifiers, Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V. Le, Alexey Kurakin; PMLR 70:2902–2911
 Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks, Itay Safran, Ohad Shamir; PMLR 70:2979–2987
 A Closer Look at Memorization in Deep Networks, ICML17
 Dynamic Word Embeddings, ICML17
 Combining Low-Density Separators with CNNs, Yu-Xiong Wang, Carnegie Mellon University; Martial Hebert, Carnegie Mellon University, NIPS16
 CNNpack: Packing Convolutional Neural Networks in the Frequency Domain, NIPS16
 Residual Networks are Exponential Ensembles of Relatively Shallow Networks, NIPS16
 Dense Associative Memory for Pattern Recognition, NIPS16
 Learning Kernels with Random Features, Aman Sinha*, Stanford University; John Duchi,
 Simple and Efficient Weighted Minwise Hashing, NIPS16
 Reward Augmented Maximum Likelihood for Neural Structured Prediction
 Unimodal Probability Distributions for Deep Ordinal Classification, ICML17
 End-to-End Learning for Structured Prediction Energy Networks, ICML17
 Orthogonal Random Features, NIPS16
 Learning Structured Sparsity in Deep Neural Networks, NIPS16
 Learning the Number of Neurons in Deep Networks, NIPS16
 Quantized Random Projections and NonLinear Estimation of Cosine Similarity, NIPS16
 An equivalence between high dimensional Bayes optimal inference and M-estimation, NIPS16
 High Dimensional Structured Superposition Models, NIPS16
 Learning Deep Embeddings with Histogram Loss, NIPS16
 Learning values across many orders of magnitude, NIPS16
 Learning Deep Parsimonious Representations, NIPS16
 Efficient High-Order Interaction-Aware Feature Selection Based on Conditional Mutual Information, NIPS16
 A Bayesian method for reducing bias in neural representational similarity analysis, NIPS16
 Richards, Deep_Learning_in_the_Brain.pdf: https://drive.google.com/file/d/0B2A1tnmq5zQdcFNkWU1vdDJiT00/view and https://drive.google.com/file/d/0B2A1tnmq5zQdQWU0Skd6TVVQYUE/view?usp=drive_web
DNN with Varying Structures
 SCAN: Learning Abstract Hierarchical Compositional Visual Concepts, https://arxiv.org/pdf/1707.03389.pdf
 Krueger, Bayesian Hypernetworks.pdf: https://drive.google.com/file/d/0B6NHiPcsmak1RUlucW1RN29oS3M/view?usp=drive_web
 Leblond and Alayrac, SeaRNN.pdf: https://drive.google.com/file/d/0B6NHiPcsmak1SDVEaWc0OWtaV0k/view?usp=drive_web
 Sharir, Overlapping Architectures.pdf: https://drive.google.com/file/d/0B6NHiPcsmak1ZzVkci1EdVN2YkU/view?usp=drive_web
 Ullrich, Bayesian Compression.pdf: https://drive.google.com/file/d/0B6NHiPcsmak1WlRUeHFpSW5OZGc/view?usp=drive_web
 Understanding Synthetic Gradients and Decoupled Neural Interfaces, Wojtek Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu, ICML17
 Video Pixel Networks, Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu
 AdaNet: Adaptive Structural Learning of Artificial Neural Networks, Corinna Cortes, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang
 Learning to Generate Long-term Future via Hierarchical Prediction, Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee
 Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning, Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli
 Latent LSTM Allocation: Joint Clustering and NonLinear Dynamic Modeling of Sequence Data, Manzil Zaheer, Amr Ahmed, Alex Smola
 LargeScale Evolution of Image Classifiers, Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc Le, Alexey Kurakin
 Sequence Modeling via Segmentations, Chong Wang (Microsoft Research) · Yining Wang (CMU) · Po-Sen Huang (Microsoft Research) · Abdelrahman Mohamed (Microsoft) · Dengyong Zhou (Microsoft Research) · Li Deng (Citadel)
 ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices
 Adaptive Neural Networks for Fast TestTime Prediction
 Making Neural Programming Architectures Generalize via Recursion, ICLR17
 Optimization as a Model for Few-Shot Learning, ICLR17
 Learning End-to-End Goal-Oriented Dialog, ICLR17
 Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17
 Nonparametric Neural Networks, ICLR17
 An InformationTheoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17
 Improving Neural Language Models with a Continuous Cache, ICLR17
 Variational Recurrent Adversarial Deep Domain Adaptation, ICLR17
 Soft Weight-Sharing for Neural Network Compression, ICLR17
 Tracking the World State with Recurrent Entity Networks, (Lecun), ICLR17
 Deep Biaffine Attention for Neural Dependency Parsing, ICLR17
 Learning to Remember Rare Events, ICLR17
 Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks, ICLR17
 Deep Learning with Dynamic Computation Graphs, ICLR17
 QueryReduction Networks for Question Answering, ICLR17
 Bidirectional Attention Flow for Machine Comprehension, ICLR17
 Dynamic Coattention Networks For Question Answering, ICLR17
 Structured Attention Networks, ICLR17
 Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17
 Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17
 Mollifying Networks, Bengio, ICLR17
 Automatic Rule Extraction from Long Short Term Memory Networks, ICLR17
 Loss-aware Binarization of Deep Networks, ICLR17
 Deep Multitask Representation Learning: A Tensor Factorisation Approach, ICLR17
 Towards Deep Interpretability (MUSROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17
 Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17
 Semi-Supervised Classification with Graph Convolutional Networks, ICLR17
 Hierarchical Multiscale Recurrent Neural Networks, ICLR17
 AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17
 Language Modeling with Gated Convolutional Networks, ICML17
 Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17
 Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17
 Differentiable Programs with Neural Libraries, ICML17
 Convolutional Sequence to Sequence Learning, ICML17
 State-Frequency Memory Recurrent Neural Networks, ICML17
 SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, Juyong Kim, Yookoon Park, Gunhee Kim, Sung Ju Hwang; PMLR 70:1866–1874
 Deriving Neural Architectures from Sequence and Graph Kernels, Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola; PMLR 70:2024–2033
 Delta Networks for Optimized Recurrent Network Computation, Daniel Neil, Jun Haeng Lee, Tobi Delbruck, Shih-Chii Liu; PMLR 70:2584–2593
 Recurrent Highway Networks, Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutník, Jürgen Schmidhuber; PMLR 70:4189–4198
 Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML17
 OptNet: Differentiable Optimization as a Layer in Neural Networks, ICML17
 Swapout: Learning an ensemble of deep architectures, Saurabh Singh*, UIUC; Derek Hoiem, UIUC; David Forsyth, UIUC, NIPS16
 Natural-Parameter Networks: A Class of Probabilistic Neural Networks, Hao Wang, HKUST; Xingjian Shi; Dit-Yan Yeung, NIPS16
 Learning What and Where to Draw, NIPS16
 Hierarchical Question-Image Co-Attention for Visual Question Answering, NIPS16
 Proximal Deep Structured Models, NIPS16
 Direct Feedback Alignment Provides Learning In Deep Neural Networks, NIPS16
 Scaling MemoryAugmented Neural Networks with Sparse Reads and Writes, NIPS16
 Matching Networks for One Shot Learning, NIPS16
 Can Active Memory Replace Attention? Łukasz Kaiser, Samy Bengio, NIPS16
 Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences, NIPS16
 Binarized Neural Networks, NIPS16
 Interaction Networks for Learning about Objects, Relations and Physics, NIPS16
 Optimal Architectures in a Solvable Model of Deep Networks, NIPS16
Reliability, Benchmarking, and Applications
 Conditional Image Generation with PixelCNN Decoders, NIPS16
 Dhruv, Visual Dialog, RLSS 2017: https://drive.google.com/file/d/0BzUSSMdMszk6RndSbkEzcnRFMGs/view and https://drive.google.com/file/d/0BzUSSMdMszk6cDVBMlRqLUs3TFk/view
 Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, Jakob Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo
 Axiomatic Attribution for Deep Networks, Mukund Sundararajan, Ankur Taly, Qiqi Yan
 Differentiable Programs with Neural Libraries, Alex L Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow
 Neural Optimizer Search with Reinforcement Learning, Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc Le
 Measuring Sample Quality with Kernels, Jackson Gorham (STANFORD) · Lester Mackey (Microsoft Research)
 Learning Continuous Semantic Representations of Symbolic Expressions, ICML17
 Recovery Guarantees for Onehiddenlayer Neural Networks, ICML17
 On the State of the Art of Evaluation in Neural Language Models, https://arxiv.org/abs/1707.05589
 End-to-end Optimized Image Compression, ICLR17
 Multi-Agent Cooperation and the Emergence of (Natural) Language, ICLR17
 Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, ICLR17
 Deep Learning with Differential Privacy
 PrivacyPreserving Deep Learning, CCS15
 Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17
 Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17
 Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17
 A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks, ICLR17
 Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17
 On Detecting Adversarial Perturbations, ICLR17
 Delving into Transferable Adversarial Examples and Black-box Attacks, ICLR17
 Parseval Networks: Improving Robustness to Adversarial Examples, ICML17
 iSurvive: An Interpretable, Eventtime Prediction Model for mHealth, ICML17
 Being Robust (in High Dimensions) Can Be Practical, ICML17
 ModelAgnostic MetaLearning for Fast Adaptation of Deep Networks, ICML17
 On Calibration of Modern Neural Networks, ICML17
 Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17
 Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation, ICML17
 Analogical Inference for Multi-relational Embeddings, Hanxiao Liu, Yuexin Wu, Yiming Yang; PMLR 70:2168-2178
 Deep Transfer Learning with Joint Adaptation Networks, Mingsheng Long, Han Zhu, Jianmin Wang, Michael I. Jordan; PMLR 70:2208-2217
 Sequence to Better Sequence: Continuous Revision of Combinatorial Structures, Jonas Mueller, David Gifford, Tommi Jaakkola; PMLR 70:2536-2544
 Meta Networks, Tsendsuren Munkhdalai, Hong Yu; PMLR 70:2554-2563
 Geometry of Neural Network Loss Surfaces via Random Matrix Theory, Jeffrey Pennington, Yasaman Bahri; PMLR 70:2798-2806
 Asymmetric Tri-training for Unsupervised Domain Adaptation, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada; PMLR 70:2988-2997
 Developing Bug-Free Machine Learning Systems With Formal Mathematics, Daniel Selsam, Percy Liang, David L. Dill; PMLR 70:3047-3056
 Learning Important Features Through Propagating Activation Differences, Avanti Shrikumar, Peyton Greenside, Anshul Kundaje; PMLR 70:3145-3153
 High-Dimensional Structured Quantile Regression, ICML17
 Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs, Rakshit Trivedi, Hanjun Dai, Yichen Wang, Le Song; PMLR 70:3462-3471
 Learning to Generate Long-term Future via Hierarchical Prediction, Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee; PMLR 70:3560-3569
 Sequence Modeling via Segmentations, Chong Wang, Yining Wang, Po-Sen Huang, Abdel-rahman Mohamed, Dengyong Zhou, Li Deng; PMLR 70:3674-3683
 A Unified View of Multi-Label Performance Measures, Xi-Zhu Wu, Zhi-Hua Zhou; PMLR 70:3780-3788
 Convexified Convolutional Neural Networks, Yuchen Zhang, Percy Liang, Martin J. Wainwright; PMLR 70:4044-4053
 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, ICML16
 Learning Transferrable Representations for Unsupervised Domain Adaptation, NIPS16
 Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16
 Unsupervised Domain Adaptation with Residual Transfer Networks, Mingsheng Long*, Tsinghua University; Han Zhu, Tsinghua University; Jianmin Wang, Tsinghua University; Michael Jordan, NIPS16
 Interpretable Distribution Features with Maximum Testing Power, Wittawat Jitkrittum*, Gatsby Unit, UCL; Zoltan Szabo; Kacper Chwialkowski, Gatsby Unit, UCL; Arthur Gretton, NIPS16
 Domain Separation Networks, NIPS16
 Multimodal Residual Learning for Visual QA, NIPS16
 Learning feed-forward one-shot learners, NIPS16
 Adversarial Multiclass Classification: A Risk Minimization Perspective, NIPS16
 Generating Images with Perceptual Similarity Metrics based on Deep Networks, NIPS16
 Dialogbased Language Learning, Jason Weston*, NIPS16
 The Robustness of Estimator Composition, NIPS16
 Large Margin Discriminant Dimensionality Reduction in Prediction Space, NIPS16
 Robustness of classifiers: from adversarial to random noise, NIPS16
 Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16
 Blind Attacks on Machine Learners, Alex Beatson*, Princeton University; Zhaoran Wang, Princeton University; Han Liu, NIPS16
 Composing graphical models with neural networks for structured representations and fast inference, NIPS16
 Spatiotemporal Residual Networks for Video Action Recognition, NIPS16
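 Several entries above concern trustworthy predictions; "On Calibration of Modern Neural Networks" (ICML17), for example, proposes temperature scaling. A minimal numpy sketch of the idea (the logits and the temperature value 2.5 here are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def temperature_scale(logits, T):
    # Divide logits by a scalar temperature T before the softmax;
    # T > 1 softens over-confident predictions without changing the argmax.
    return softmax(np.asarray(logits, dtype=float) / T)

logits = [5.0, 1.0, 0.5]
p_raw = temperature_scale(logits, T=1.0)   # raw, over-confident
p_cal = temperature_scale(logits, T=2.5)   # softened confidence
```

In the paper, T is fit on a held-out validation set by minimizing negative log-likelihood; the sketch only shows the forward transformation.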
Optimization
 Johnson - Automatic Differentiation.pdf https://drive.google.com/file/d/0B6NHiPcsmak1ckYxR2hmRGdzdFk/view
 Osborne - Probabilistic numerics for deep learning - DLSS 2017.pdf https://drive.google.com/file/d/0B2A1tnmq5zQdWHBYOFctNi1KdVU/view
 Learned Optimizers that Scale and Generalize, Olga Wichrowska, Niru Maheswaranathan, Matthew Hoffman, Sergio Gomez, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein
 Learning to learn by gradient descent by gradient descent
 Asynchronous Stochastic Gradient Descent with Delay Compensation, ICML17
 How to Escape Saddle Points Efficiently, Chi Jin (UC Berkeley) · Rong Ge (Duke University) · Praneeth Netrapalli (Microsoft Research) · Sham M. Kakade (University of Washington) · Michael Jordan (UC Berkeley)
 Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter
 Batched High-dimensional Bayesian Optimization via Structural Kernel Learning
 Towards Principled Methods for Training Generative Adversarial Networks, ICLR17
 Optimization as a Model for Few-Shot Learning, ICLR17
 Amortised MAP Inference for Image Super-resolution, ICLR17
 Neural Architecture Search with Reinforcement Learning, ICLR17
 Distributed Second-Order Optimization using Kronecker-Factored Approximations, ICLR17
 Mode Regularized Generative Adversarial Networks, ICLR17
 Highway and Residual Networks learn Unrolled Iterative Estimation, ICLR17
 Snapshot Ensembles: Train 1, Get M for Free, ICLR17
 Learning to Optimize, ICLR17
 Recurrent Batch Normalization, ICLR17
 Adversarially Learned Inference, ICLR17
 Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17
 Deep ADMM-Net for Compressive Sensing MRI, NIPS16
 Sharp Minima Can Generalize For Deep Nets, ICML17
 Forward and Reverse GradientBased Hyperparameter Optimization, ICML17
 Automated Curriculum Learning for Neural Networks, ICML17
 Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs, ICML17
 An overview of gradient descent optimization algorithms (https://arxiv.org/abs/1609.04747)
 Learning Deep Architectures via Generalized Whitened Neural Networks, Ping Luo; PMLR 70:2238-2246
 The Loss Surface of Deep and Wide Neural Networks, Quynh Nguyen, Matthias Hein; PMLR 70:2603-2612
 Relative Fisher Information and Natural Gradient for Learning Large Modular Models, Ke Sun, Frank Nielsen; PMLR 70:3289-3298
 meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting, Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang; PMLR 70:3299-3308
 Axiomatic Attribution for Deep Networks, Mukund Sundararajan, Ankur Taly, Qiqi Yan; PMLR 70:3319-3328
 Follow the Moving Leader in Deep Learning, Shuai Zheng, James T. Kwok; PMLR 70:4110-4119
 Oracle Complexity of Second-Order Methods for Finite-Sum Problems, ICML17
 The Shattered Gradients Problem: If resnets are the answer, then what is the question?, ICML17
 Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks, ICML17
 End-to-End Differentiable Adversarial Imitation Learning, ICML17
 Neural Optimizer Search with Reinforcement Learning, ICML17
 Adaptive Neural Networks for Efficient Inference, ICML17
 Practical Gauss-Newton Optimisation for Deep Learning, ICML17
 Deep Tensor Convolution on Multicores, ICML17
 The Generalized Reparameterization Gradient, Francisco Ruiz*, Columbia University; Michalis K. Titsias; David Blei, NIPS16
 Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16
 Memory-Efficient Backpropagation Through Time, NIPS16
 Professor Forcing: A New Algorithm for Training Recurrent Networks, NIPS16
 Understanding the Effective Receptive Field in Deep Convolutional Neural Networks, NIPS16
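 Many of the optimizers surveyed in Ruder's overview above (arXiv:1609.04747) share the same update-rule skeleton; Adam is the one students ask about most. A minimal, hedged numpy sketch of a single Adam step (hyperparameters are the common defaults, and the quadratic toy objective is illustrative only):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient (m)
    # and squared gradient (v), with bias correction for the first steps.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy run: minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t)
```

The per-coordinate division by sqrt(v_hat) is what makes the effective step size roughly lr early on, regardless of the raw gradient scale.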
Generative
 GAN tutorial by Ian Goodfellow (NIPS 2016): https://arxiv.org/abs/1701.00160 https://www.youtube.com/watch?v=AJVyzd0rqdc
 Goodfellow - Generative Models I - DLSS 2017 https://drive.google.com/file/d/0ByUKRdiCDK7bTgxTGoxYjQ4NW8/view
 Courville - Generative Models II - DLSS 2017 https://drive.google.com/file/d/0B_wzP_JlVFcKQ21udGpTSkh0aVk/view
 Makhzani and Frey - PixelGAN Autoencoders.pdf https://drive.google.com/file/d/0B6NHiPcsmak1SFdRN2lmS3FnekE/view
 Welling - Graphical Models and Deep Learning.pdf https://drive.google.com/file/d/0B6NHiPcsmak1NHJHdzEySzNNQ0U/view
 Count-Based Exploration with Neural Density Models, Georg Ostrovski, Marc Bellemare, Aaron van den Oord, Remi Munos
 Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo, Maithra Raghu, Ben Poole, Surya Ganguli, Jon Kleinberg, Jascha SohlDickstein
 Johnson - Graphical Models and Deep Learning https://drive.google.com/file/d/0B6NHiPcsmak1RmZ3bmtFWUd5bjA/view?usp=drive_web
 Variational Boosting: Iteratively Refining Posterior Approximations, Andrew Miller, Nicholas J Foti, Ryan Adams
 Stochastic Generative Hashing, Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song, ICML17
 Robust Structured Estimation with Single-Index Models, ICML17
 Learning to Act by Predicting the Future, ICLR17
 Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17
 Boosted Generative Models, ICLR17
 The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR17
 Robust Probabilistic Modeling with Bayesian Data Reweighting, ICML17
 Deep Generative Models for Relational Data with Side Information, ICML17
 Learning to Discover Cross-Domain Relations with Generative Adversarial Networks, Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, Jiwon Kim; PMLR 70:1857-1865
 Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks, Lars Mescheder, Sebastian Nowozin, Andreas Geiger; PMLR 70:2391-2400
 McGan: Mean and Covariance Feature Matching GAN, Youssef Mroueh, Tom Sercu, Vaibhava Goel; PMLR 70:2527-2535
 Parallel Multiscale Autoregressive Density Estimation, Scott Reed, Aäron Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Yutian Chen, Dan Belov, Nando Freitas; PMLR 70:2912-2921
 Adversarial Feature Matching for Text Generation, Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, Lawrence Carin; PMLR 70:4006-4015
 Learning Hierarchical Features from Deep Generative Models, Shengjia Zhao, Jiaming Song, Stefano Ermon; PMLR 70:4091-4099
 Wasserstein Generative Adversarial Networks, ICML17
 Generalization and Equilibrium in Generative Adversarial Nets (GANs), ICML17
 Exponential Family Embeddings, NIPS16
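 Since the Wasserstein GAN paper (ICML17) appears in the list above, a minimal numpy sketch of its critic objective may help new readers; the identity "critic" and 1-D samples are purely illustrative, and real training would use a neural critic with gradient-based updates:

```python
import numpy as np

def wgan_critic_loss(critic, real, fake):
    # The WGAN critic maximizes E[f(real)] - E[f(fake)], an estimate of
    # the Wasserstein-1 distance; as a loss we minimize the negation.
    return -(critic(real).mean() - critic(fake).mean())

def clip_weights(w, c=0.01):
    # Weight clipping to [-c, c] is how the original paper enforces the
    # Lipschitz constraint on the critic (later work uses a gradient penalty).
    return np.clip(w, -c, c)

# Toy check with an identity "critic" on 1-D samples.
real = np.array([2.0, 3.0])
fake = np.array([0.0, 1.0])
loss = wgan_critic_loss(lambda x: x, real, fake)   # -(2.5 - 0.5) = -2.0
```

The generator side simply minimizes -E[f(fake)], so the two players share this single scalar estimate.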
Reinforcement
 Hasselt - Deep Reinforcement Learning - RLSS 2017.pdf https://drive.google.com/file/d/0BzUSSMdMszk6UE5TbWdZekFXSE0/view?usp=drive_web
 Pineau - RL Basic Concepts - RLSS 2017.pdf https://drive.google.com/file/d/0BzUSSMdMszk6bjl3eU5CVmU0cWs/view http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/ and http://videolectures.net/deeplearning2016_pineau_advanced_topics/
 Roux - RL in the Industry - RLSS 2017.pdf https://drive.google.com/file/d/0BzUSSMdMszk6bEprTUpCaHRrQ28/view
 Singh - Steps Towards Continual Learning.pdf https://drive.google.com/file/d/0BzUSSMdMszk6YVhFUUNLZnZLSWs/view?usp=drive_web
 Sutton - Temporal-Difference Learning - RLSS 2017.pdf https://drive.google.com/file/d/0BzUSSMdMszk6VE9kMkY2SzQzSW8/view?usp=drive_web
 Szepesvari - Theory of RL - RLSS 2017.pdf https://drive.google.com/file/d/0BzUSSMdMszk6U194Ym5jSnZQbGM/view?usp=drive_web
 Thomas - Safe Reinforcement Learning - RLSS 2017.pdf https://drive.google.com/file/d/0BzUSSMdMszk6TDRMRGRaM0dBcHM/view?usp=drive_web
 Minimax Regret Bounds for Reinforcement Learning, Mohammad Gheshlaghi Azar, Ian Osband, Remi Munos
 Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, Ian Osband, Benjamin Van Roy
 DARLA: Improving ZeroShot Transfer in Reinforcement Learning, Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Chris Burgess, Alexander Pritzel, Matt Botvinick, Charles Blundell, Alexander Lerchner
 A Distributional Perspective on Reinforcement Learning, Marc G. Bellemare, Will Dabney, Remi Munos
 A Laplacian Framework for Option Discovery in Reinforcement Learning, Marlos Machado (Univ. Alberta), Marc G. Bellemare, Michael Bowling
 The Predictron: EndtoEnd Learning and Planning, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel DulacArnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
 FeUdal Networks for Hierarchical Reinforcement Learning, Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu
 Neural Episodic Control, Alex Pritzel, Benigno Uria, Sriram Srinivasan, Adria Puigdomenech, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell
 Robust Adversarial Reinforcement Learning, Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
 Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, Michael Gygli, Mohammad Norouzi, Anelia Angelova
 Distral: Robust Multitask Reinforcement Learning, https://arxiv.org/pdf/1707.04175.pdf
 Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17
 Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, ICLR17
 Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning, Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli; PMLR 70:2661-2670
 Count-Based Exploration with Neural Density Models, Georg Ostrovski, Marc G. Bellemare, Aäron Oord, Rémi Munos; PMLR 70:2721-2730
 Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell; PMLR 70:3309-3318
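 Several lecture entries above (e.g. Sutton's temporal-difference lecture) center on TD learning; as a reminder of the core update, here is a minimal tabular TD(0) sketch on a hypothetical two-state chain (the chain and constants are illustrative, not from any listed paper):

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    # Tabular TD(0): move V(s) toward the bootstrapped target r + gamma*V(s').
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

# Two-state chain: state 0 always transitions to terminal state 1
# (V[1] stays 0) with reward 1, so V[0] should converge to 1.0.
V = np.zeros(2)
for _ in range(200):
    V = td0_update(V, s=0, r=1.0, s_next=1)
```

With a fixed alpha, V[0] approaches the target geometrically: after n updates it equals 1 - (1 - alpha)^n here.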
More:
 ICLR 2017 Papers
 ICML 2017 Papers
 NIPS 2017 papers
 Yann LeCun
 Y. Bengio
 G. Hinton
 Juergen Schmidhuber