Deep Learning Readings We Covered in 2018 Team Meetings


2018Reads (Index of Posts):

No. Read Date Title and Information We Read @
1 2018, Jan, 10 Application18- Property of DeepNN Models and Discrete tasks 2018-team
2 2018, Feb, 20 Survey18- My Survey Talk at UVA HMI Seminar - A quick and rough overview of DNN 2018-me
3 2018, Apr, 20 Generative18 - Generative Adversarial Network (classified) 2018-team
4 2018, May, 3 Structures18- More Attentions 2018-team
5 2018, May, 11 Structures18- DNN for Multiple Label Classification 2018-team
6 2018, May, 12 Reliable18- Adversarial Attacks and DNN 2018-team
7 2018, May, 20 Reliable18- Adversarial Attacks and DNN and More 2018-team
8 2018, Aug, 3 Reliable18- Testing and Verifying DNNs 2018-team
9 2018, Aug, 13 Application18- DNNs in a Few BioMedical Tasks 2018-team
10 2018, Aug, 23 Generative18 - A few more deep discrete Generative Models 2018-team
11 2018, Aug, 27 Application18- A few DNN for Question Answering 2018-team
12 2018, Aug, 29 Survey18- My Tutorial Talk at ACM BCB18 - Interpretable Deep Learning for Genomics 2018-me
13 2018, Oct, 11 Structures18- DNN for Relations 2018-team
14 2018, Oct, 12 Reliable18- Understand DNNs 2018-team
15 2018, Oct, 13 Application18- DNNs in a Few Bio CRISPR Tasks 2018-team
16 2018, Oct, 16 Application18- Graph DNN in a Few Bio Tasks 2018-team
17 2018, Oct, 25 Structure18- DNNs Varying Structures 2018-team
18 2018, Nov, 20 Reliable18- Adversarial Attacks and DNN 2018-team
19 2018, Dec, 2 Reliable18- Adversarial Attacks and DNN 2018-team
20 2018, Dec, 20 Application18- DNN for MedQA 2018-team
21 2018, Dec, 21 Generate18- Deep Generative Models for Graphs 2018-team
22 2018, Dec, 29 Generate18- Deep Generative Models for discrete 2018-team

Application18- Property of DeepNN Models and Discrete tasks

3Reliable embedding generative NLP generalization
Presenter Papers Paper URL Our Slides
Bill Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 1 PDF PDF
Bill Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio PDF PDF
Bill Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang 2 PDF PDF
Bill On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick PDF PDF
  1. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation/ Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.  
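
The beam-search rescoring described above is concrete enough to sketch. Below is a minimal Python version of GNMT-style length normalization plus coverage penalty; the alpha/beta values are illustrative, not the paper's tuned settings:

```python
import math

def gnmt_score(log_prob, attn, alpha=0.6, beta=0.2):
    """Rescore a finished beam hypothesis, GNMT-style (schematic).

    log_prob: total log P(Y|X) of the hypothesis.
    attn:     list over target steps of attention distributions over
              source positions (each distribution sums to 1).
    """
    length = len(attn)
    # Length normalization: lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha.
    lp = ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)
    # Coverage penalty: reward hypotheses whose total attention mass
    # touches every source word at least once.
    n_src = len(attn[0])
    cp = 0.0
    for i in range(n_src):
        coverage = sum(step[i] for step in attn)
        cp += math.log(min(coverage, 1.0))
    return log_prob / lp + beta * cp
```

Hypotheses on the beam are compared by this score rather than by raw log-probability, so longer translations are not unfairly penalized and the decoder is pushed to cover the whole source sentence.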

  2. Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang / We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.  
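
The two-stage sampling process is simple to sketch. In the toy version below, `sample_edit_vector` and `edit_decoder` are hypothetical stand-ins for the paper's edit-vector prior and trained neural editor:

```python
import random

def prototype_then_edit(corpus, sample_edit_vector, edit_decoder):
    """Generate a sentence by editing a sampled prototype (schematic)."""
    prototype = random.choice(corpus)  # step 1: sample a prototype sentence
    z = sample_edit_vector()           # step 2: sample a latent edit vector
    return edit_decoder(prototype, z)  # step 3: decode the edited sentence
```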

Survey18- My Survey Talk at UVA HMI Seminar - A quick and rough overview of DNN

0Basics
Presenter Papers Paper URL Our Slides
Dr. Qi A quick and rough survey of Deep-Neural-Networks   PDF

Generative18 - Generative Adversarial Network (classified)

5Generative DNA generative GAN generalization
Presenter Papers Paper URL Our Slides
BrandonLiu Summary of Recent Generative Adversarial Networks (Classified)   PDF
Jack Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey PDF PDF
GaoJi More about basics of GAN   PDF
  McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535 PDF  
  Wasserstein GAN, ICML17 PDF  
  Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab PDF  

Structures18- More Attentions

2Architecture attention relational Variational
Presenter Papers Paper URL Our Slides
Arshdeep Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 1 PDF PDF
Arshdeep Latent Alignment and Variational Attention 2 PDF PDF
Arshdeep Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio 3 PDF PDF
  1. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention / Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio/ Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.  
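
For reference, the "soft" attention step is just a softmax-weighted average of region features. A minimal numpy sketch, where the single-hidden-layer scoring network (parameters W_f, W_h, v) is an assumed simplification of the paper's attention MLP:

```python
import numpy as np

def soft_attention(features, h, W_f, W_h, v):
    """One soft-attention step (schematic).

    features: (L, D) annotation vectors, one per image region.
    h:        (H,) current decoder hidden state.
    W_f, W_h, v: scoring parameters with shapes (D, A), (H, A), (A,).
    """
    scores = np.tanh(features @ W_f + h @ W_h) @ v  # (L,) relevance scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                            # softmax over regions
    context = alpha @ features                      # expected feature, (D,)
    return context, alpha   # alpha is what gets visualized as the "gaze"
```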

  2. Latent Alignment and Variational Attention / NIPS2018 / Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush/ Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.  

  3. Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio / ICML18 / We focus on two supervised visual reasoning tasks whose labels encode a semantic relational rule between two or more objects in an image: the MNIST Parity task and the colorized Pentomino task. The objects in the images undergo random translation, scaling, rotation and coloring transformations. Thus these tasks involve invariant relational reasoning. We report uneven performance of various deep CNN models on these two tasks. For the MNIST Parity task, we report that the VGG19 model soundly outperforms a family of ResNet models. Moreover, the family of ResNet models exhibits a general sensitivity to random initialization for the MNIST Parity task. For the colorized Pentomino task, now both the VGG19 and ResNet models exhibit sluggish optimization and very poor test generalization, hovering around 30% test error. The CNNs we tested all learn hierarchies of fully distributed features and thus encode the distributed representation prior. We are motivated by a hypothesis from cognitive neuroscience which posits that the human visual cortex is modularized, and this allows the visual cortex to learn higher order invariances. To this end, we consider a modularized variant of the ResNet model, referred to as a Residual Mixture Network (ResMixNet) which employs a mixture-of-experts architecture to interleave distributed representations with more specialized, modular representations. We show that very shallow ResMixNets are capable of learning each of the two tasks well, attaining less than 2% and 1% test error on the MNIST Parity and the colorized Pentomino tasks respectively. Most importantly, the ResMixNet models are extremely parameter efficient: generalizing better than various non-modular CNNs that have over 10x the number of parameters. These experimental results support the hypothesis that modularity is a robust prior for learning invariant relational reasoning.  
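
The mixture-of-experts idea behind ResMixNet can be sketched in a few lines. This toy version gates over whole expert outputs with one softmax; the actual model interleaves such modules with residual blocks:

```python
import numpy as np

def mixture_of_experts(x, experts, gate):
    """Combine K expert modules with a learned softmax gate (schematic).

    experts: list of K functions (e.g., small conv blocks).
    gate:    function mapping x to K unnormalized scores.
    """
    scores = gate(x)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax gating
    outputs = np.stack([e(x) for e in experts])    # (K, ...) expert outputs
    return np.tensordot(weights, outputs, axes=1)  # convex combination
```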

Structures18- DNN for Multiple Label Classification

2Architecture 2Graphs multi-label structured Adversarial-loss attention RNN
Presenter Papers Paper URL Our Slides
Chao Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification PDF PDF
Jack FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning PDF PDF
BasicMLC Multi-Label Classification: An Overview PDF  
SPEN Structured Prediction Energy Networks PDF  
InfNet Learning Approximate Inference Networks for Structured Prediction PDF  
SPENMLC Deep Value Networks PDF  
Adversarial Semantic Segmentation using Adversarial Networks PDF  
EmbedMLC StarSpace: Embed All The Things! PDF  
deepMLC CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 PDF  
deepMLC Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 PDF  

Reliable18- Adversarial Attacks and DNN

3Reliable 9DiscreteApp Adversarial-Examples generative Interpretable
Presenter Papers Paper URL Our Slides
Bill Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le 1 PDF PDF
Bill Adversarial Spheres 2 PDF PDF
Bill Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer 3 PDF PDF
Bill Thermometer encoding: one hot way to resist adversarial examples 4 PDF PDF
  Adversarial Logit Pairing, Harini Kannan, Alexey Kurakin, Ian Goodfellow 5 PDF  
  1. Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le / It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate on high dimensional data, they overfit, or they are too linear. Here we argue that the origin of adversarial examples is primarily due to an inherent uncertainty that neural networks have about their predictions. We show that the functional form of this uncertainty is independent of architecture, dataset, and training protocol; and depends only on the statistics of the logit differences of the network, which do not change significantly during training. This leads to adversarial error having a universal scaling, as a power-law, with respect to the size of the adversarial perturbation. We show that this universality holds for a broad range of datasets (MNIST, CIFAR10, ImageNet, and random data), models (including state-of-the-art deep networks, linear models, adversarially trained networks, and networks trained on randomly shuffled labels), and attacks (FGSM, step l.l., PGD). Motivated by these results, we study the effects of reducing prediction entropy on adversarial robustness. Finally, we study the effect of network architectures on adversarial sensitivity. To do this, we use neural architecture search with reinforcement learning to find adversarially robust architectures on CIFAR10. Our resulting architecture is more robust to white and black box attacks compared to previous attempts.  

  2. Adversarial Spheres / Ian Goodfellow/ State of the art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and are very close to a visually similar misclassified image. Despite substantial research interest, the cause of the phenomenon is still poorly understood and remains unsolved. We hypothesize that this counterintuitive behavior is a naturally occurring result of the high dimensional geometry of the data manifold. As a first step towards exploring this hypothesis, we study a simple synthetic dataset of classifying between two concentric high dimensional spheres. For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size O(1/√d). Surprisingly, when we train several different architectures on this dataset, all of their error sets naturally approach this theoretical bound. As a result of the theory, the vulnerability of neural networks to small adversarial perturbations is a logical consequence of the amount of test error observed. We hope that our theoretical analysis of this very simple case will point the way forward to explore how the geometry of complex real-world data sets leads to adversarial examples.  

  3. Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer/ With the rapidly increasing popularity of deep neural networks for image recognition tasks, a parallel interest in generating adversarial examples to attack the trained models has arisen. To date, these approaches have involved either directly computing gradients with respect to the image pixels or directly solving an optimization on the image pixels. We generalize this pursuit in a novel direction: can a separate network be trained to efficiently attack another fully trained network? We demonstrate that it is possible, and that the generated attacks yield startling insights into the weaknesses of the target network. We call such a network an Adversarial Transformation Network (ATN). ATNs transform any input into an adversarial attack on the target network, while being minimally perturbing to the original inputs and the target network’s outputs. Further, we show that ATNs are capable of not only causing the target network to make an error, but can be constructed to explicitly control the type of misclassification made. We demonstrate ATNs on both simple MNIST digit classifiers and state-of-the-art ImageNet classifiers deployed by Google, Inc.: Inception ResNet-v2. 
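
Schematically, an ATN is a generator trained against a fixed target model with a two-term objective: stay close to the input while steering the target model toward a chosen class. A hedged PyTorch sketch (the loss choices and `beta` are illustrative; the paper additionally reranks the target network's outputs):

```python
import torch.nn.functional as F

def atn_loss(atn, target_model, x, target_class, beta=0.1):
    """Training loss for an Adversarial Transformation Network (schematic)."""
    x_adv = atn(x)                    # generator proposes a perturbed input
    recon = F.mse_loss(x_adv, x)      # stay near the original input
    attack = F.cross_entropy(target_model(x_adv), target_class)
    return beta * recon + attack      # minimally perturbing, maximally fooling
```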

  4. Thermometer encoding: one hot way to resist adversarial examples / It is well known that for neural networks, it is possible to construct inputs which are misclassified by the network yet indistinguishable from true data points, known as adversarial examples. We propose a simple modification to standard neural network architectures, “thermometer encoding”, which significantly increases the robustness of the network to adversarial examples. We demonstrate this robustness with experiments on the MNIST, CIFAR-10, CIFAR-100, and SVHN datasets, and show that models with thermometer-encoded inputs consistently have higher accuracy on adversarial examples, while also maintaining the same accuracy on non-adversarial examples and training more quickly.  
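
One common way to write the encoding: each pixel becomes a vector of threshold indicators, so a small perturbation flips at most a bit or two instead of smoothly shifting a real value. A small sketch, assuming inputs scaled to [0, 1]:

```python
import numpy as np

def thermometer_encode(x, levels=16):
    """Thermometer-encode values in [0, 1] (schematic).

    A value v maps to a `levels`-dim binary vector whose k-th entry is
    1 iff v > k / levels, so nearby values share most of their bits.
    """
    x = np.asarray(x, dtype=np.float32)
    thresholds = np.arange(levels, dtype=np.float32) / levels
    return (x[..., None] > thresholds).astype(np.float32)

# thermometer_encode(0.30, levels=10) -> [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```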

  5. Adversarial Logit Pairing, Harini Kannan, Alexey Kurakin, Ian Goodfellow (Submitted on 16 Mar 2018)/ In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state of the art defense on ImageNet against PGD white box attacks, with an accuracy improvement from 1.5% to 27.9%. Adversarial logit pairing also successfully damages the current state of the art defense against black box attacks on ImageNet (Tramer et al., 2018), dropping its accuracy from 66.6% to 47.1%. With this new accuracy drop, adversarial logit pairing ties with Tramer et al. (2018) for the state of the art on black box attacks on ImageNet.  
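
The pairing term itself is a one-liner on top of adversarial training. A minimal PyTorch sketch, where the weight `lam` is an assumed hyperparameter and `x_adv` is presumed to come from an attack such as PGD:

```python
import torch.nn.functional as F

def adversarial_logit_pairing_loss(model, x_clean, x_adv, y, lam=0.5):
    """Adversarial training loss plus a logit-pairing penalty (schematic)."""
    logits_clean = model(x_clean)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_clean, y) + F.cross_entropy(logits_adv, y)
    pair = F.mse_loss(logits_clean, logits_adv)  # pull paired logits together
    return ce + lam * pair
```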

Reliable18- Adversarial Attacks and DNN and More

3Reliable 9DiscreteApp seq2seq Adversarial-Examples Certified-Defense
Presenter Papers Paper URL Our Slides
Bill Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples PDF PDF
Bill Adversarial Examples for Evaluating Reading Comprehension Systems, Robin Jia, Percy Liang PDF PDF
Bill Certified Defenses against Adversarial Examples, Aditi Raghunathan, Jacob Steinhardt, Percy Liang PDF PDF
Bill Provably Minimally-Distorted Adversarial Examples, Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill PDF PDF

Reliable18- Testing and Verifying DNNs

3Reliable 6Reinforcement RL Fuzzing Adversarial-Examples verification software-testing black-box white-box
Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  
  1. Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors / ICLR19/ We study the problem of generating adversarial examples in a black-box setting in which only loss-oracle access to a model is available. We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and demonstrate that the current state-of-the-art methods are optimal in a natural sense. Despite this optimality, we show how to improve black-box attacks by bringing a new element into the problem: gradient priors. We give a bandit optimization-based algorithm that allows us to seamlessly integrate any such priors, and we explicitly identify and incorporate two examples. The resulting methods use two to four times fewer queries and fail two to five times less than the current state-of-the-art. The code for reproducing our work is available at https://git.io/fAjOJ.  
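
The gradient-estimation view is easy to see in code. Below is the plain NES-style antithetic estimator that this line of work starts from; the bandits paper then layers time-dependent and data-dependent priors on top, which this sketch omits:

```python
import numpy as np

def estimate_gradient(loss_oracle, x, sigma=0.01, n_samples=50):
    """Zeroth-order gradient estimate from loss-oracle queries (schematic)."""
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)
        # Antithetic pair of queries around x: finite difference along u.
        grad += (loss_oracle(x + sigma * u) - loss_oracle(x - sigma * u)) * u
    return grad / (2.0 * sigma * n_samples)
```

The estimated gradient is then used exactly like a white-box gradient, e.g. inside projected gradient steps; query efficiency is the whole game in this setting.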

Application18- DNNs in a Few BioMedical Tasks

3Reliable 6Reinforcement 9DiscreteApp brain RNA DNA Genomics generative
Presenter Papers Paper URL Our Slides
Arshdeep DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. PDF PDF
Arshdeep Solving the RNA design problem with reinforcement learning, PLOSCB 1 PDF PDF
Arshdeep Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk 2 PDF PDF
Arshdeep Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio 3 PDF PDF
Brandon Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics PDF PDF
Arshdeep Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions 4 PDF PDF
  1. Solving the RNA design problem with reinforcement learning, PLOSCB/ We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance. Author summary: Designing RNA sequences that fold to desired structures is an important problem in bioengineering. We have applied recent advances in machine learning to address this problem. The computer learns without any human input, using only trial and error to figure out how to design RNA. It quickly discovers powerful strategies that let it solve many difficult design problems. When tested on a challenging benchmark, it outperforms all previous algorithms. We analyze its solutions and identify some of the strategies it has learned, as well as other important strategies it has failed to learn. This suggests possible approaches to further improving its performance. This work reflects a paradigm shift taking place in computer science, which has the potential to transform computational biology. Instead of relying on experts to design algorithms by hand, computers can use artificial intelligence to learn their own algorithms directly. The resulting methods often work better than the ones designed by humans.  

  2. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk / Nature Genetics, volume 50, pages 1171–1179 (2018)/ Key challenges for human genetics, precision medicine and evolutionary biology include deciphering the regulatory code of gene expression and understanding the transcriptional effects of genome variation. However, this is extremely difficult because of the enormous scale of the noncoding mutation space. We developed a deep learning–based framework, ExPecto, that can accurately predict, ab initio from a DNA sequence, the tissue-specific transcriptional effects of mutations, including those that are rare or that have not been observed. We prioritized causal variants within disease- or trait-associated loci from all publicly available genome-wide association studies and experimentally validated predictions for four immune-related diseases. By exploiting the scalability of ExPecto, we characterized the regulatory mutation space for human RNA polymerase II–transcribed genes by in silico saturation mutagenesis and profiled > 140 million promoter-proximal mutations. This enables probing of evolutionary constraints on gene expression and ab initio prediction of mutation disease effects, making ExPecto an end-to-end computational framework for the in silico prediction of expression and disease risk.  
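
The in silico saturation-mutagenesis protocol mentioned above is an exhaustive substitution loop around a trained predictor. A schematic sketch, with `predict` standing in for the trained sequence-to-expression model:

```python
BASES = "ACGT"

def saturation_mutagenesis(sequence, predict):
    """Score every possible single-base substitution (schematic)."""
    ref = predict(sequence)
    effects = {}
    for i, base in enumerate(sequence):
        for alt in BASES:
            if alt == base:
                continue
            mutant = sequence[:i] + alt + sequence[i + 1:]
            effects[(i, base, alt)] = predict(mutant) - ref  # predicted effect
    return effects
```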

  3. Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio/ We study the challenges of applying deep learning to gene expression data. We find experimentally that there exists non-linear signal in the data; however, it is not discovered automatically given the noise and low numbers of samples used in most research. We discuss how gene interaction graphs (same pathway, protein-protein, co-expression, or research paper text association) can be used to impose a bias on a deep model similar to the spatial bias imposed by convolutions on an image. We explore the usage of Graph Convolutional Neural Networks coupled with dropout and gene embeddings to utilize the graph information. We find this approach provides an advantage for particular tasks in a low data regime but is very dependent on the quality of the graph used. We conclude that more work should be done in this direction. We design experiments that show why existing methods fail to capture signal that is present in the data when features are added which clearly isolates the problem that needs to be addressed.  

  4. Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions / Anvita Gupta, James Zou (arxiv, submitted on 5 Apr 2018) / Generative Adversarial Networks (GANs) represent an attractive and novel approach to generate realistic data, such as genes, proteins, or drugs, in synthetic biology. Here, we apply GANs to generate synthetic DNA sequences encoding for proteins of variable length. We propose a novel feedback-loop architecture, called Feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyzer. The proposed architecture also has the advantage that the analyzer need not be differentiable. We apply the feedback-loop mechanism to two examples: 1) generating synthetic genes coding for antimicrobial peptides, and 2) optimizing synthetic genes for the secondary structure of their resulting peptides. A suite of metrics demonstrate that the GAN generated proteins have desirable biophysical properties. The FBGAN architecture can also be used to optimize GAN-generated datapoints for useful properties in domains beyond genomics.  
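
As a toy illustration of the graph prior in (3): one graph-convolution layer that smooths per-gene features over the interaction graph, written in the familiar normalized-adjacency form (shapes and the ReLU choice are ours, not the paper's exact architecture):

```python
import numpy as np

def graph_conv(H, A, W):
    """One graph-convolution layer: relu(D^-1/2 (A+I) D^-1/2 H W).

    H: (n_genes, d) per-gene input features.
    A: (n_genes, n_genes) gene-interaction adjacency matrix.
    W: (d, d_out) learned weights.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)  # smooth over neighbors, then transform
```

The quality-of-graph caveat in the abstract shows up here directly: A determines which genes share information, so a noisy graph imposes the wrong bias.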

Generative18 - A few more deep discrete Generative Models

5Generative 9DiscreteApp generative generalization GAN discrete Amortized Autoencoder Variational program
Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF
  1. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh (2016)/ The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low variance unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables—continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, Concrete stochastic nodes can be used with automatic differentiation to produce low-variance biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks.  

  2. Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts, arxiv 2017/ Deep generative neural networks have proven effective at both conditional and unconditional modeling of complex data distributions. Conditional generation enables interactive control, but creating new controls often requires expensive retraining. In this paper, we develop a method to condition generation without retraining the model. By post-hoc learning latent constraints, value functions that identify regions in latent space that generate outputs with desired attributes, we can conditionally sample from these regions with gradient-based optimization or amortized actor functions. Combining attribute constraints with a universal “realism” constraint, which enforces similarity to the data distribution, we generate realistic conditional images from an unconditional variational autoencoder. Further, using gradient-based optimization, we demonstrate identity-preserving transformations that make the minimal adjustment in latent space to modify the attributes of an image. Finally, with discrete sequences of musical notes, we demonstrate zero-shot conditional generation, learning latent constraints in the absence of labeled data or a differentiable reward function. Code with dedicated cloud instance has been made publicly available.  

  4. Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals / ICML18/ Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, Omniglot, CelebA) and synthetic 3D datasets.  
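
To make the reparameterization in (1) concrete: a minimal numpy sketch of drawing a Concrete (Gumbel-softmax) sample; the temperature value is illustrative:

```python
import numpy as np

def sample_concrete(logits, temperature=0.5, rng=np.random):
    """Draw a Concrete relaxation of a one-hot categorical sample.

    As temperature -> 0 the draw approaches a discrete one-hot sample
    from softmax(logits); at higher temperatures it is a smooth point
    on the simplex, so gradients can flow through the sampling step.
    """
    gumbel = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + gumbel) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()
```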

Application18- A few DNN for Question Answering

2Architecture 8Scalable 9DiscreteApp trees metric-learning embedding QA
Presenter Papers Paper URL Our Slides
Derrick GloVe: Global Vectors for Word Representation PDF PDF
Derrick PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks. URL PDF
Derrick Scalable nearest neighbor algorithms for high dimensional data (PAMI14) 1 PDF PDF
Derrick StarSpace: Embed All The Things! PDF PDF
Derrick Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes PDF PDF
  1. Scalable nearest neighbor algorithms for high dimensional data (PAMI14) / https://www.ncbi.nlm.nih.gov/pubmed/26353063 / For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.  

Survey18- My Tutorial Talk at ACM BCB18 - Interpretable Deep Learning for Genomics

9DiscreteApp
Presenter Papers Paper URL Our Slides
Dr. Qi Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation   PDF

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin, NIPS2017 / Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi

The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what are the relevant factors and how they work together? Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach; we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency map. Code and data are shared at www.deepchrome.org
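
The interpretable core of AttentiveChrome is ordinary soft attention over LSTM outputs, applied at two levels (across positions within each chromatin mark, then across marks). A minimal PyTorch sketch of one such attention-pooling level; dimensions and the linear scoring function are illustrative:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Soft attention over a sequence of hidden states (schematic)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, h):                # h: (batch, steps, hidden_dim)
        alpha = torch.softmax(self.score(h).squeeze(-1), dim=1)
        pooled = (alpha.unsqueeze(-1) * h).sum(dim=1)
        return pooled, alpha             # alpha is the interpretable attention map
```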

Structures18- DNN for Relations

2Architecture 2Graphs relational InfoMax
Presenter Papers Paper URL Our Slides
Arshdeep Relational inductive biases, deep learning, and graph networks PDF PDF
Arshdeep Discriminative Embeddings of Latent Variable Models for Structured Data PDF PDF
Jack Deep Graph Infomax PDF PDF

Reliable18- Understand DNNs

3Reliable visualizing interpretable Attribution GAN understanding
Presenter Papers Paper URL Our Slides
Jack A Unified Approach to Interpreting Model Predictions PDF PDF
Jack “Why Should I Trust You?”: Explaining the Predictions of Any Classifier PDF PDF
Jack Visual Feature Attribution using Wasserstein GANs PDF PDF
Jack GAN Dissection: Visualizing and Understanding Generative Adversarial Networks PDF PDF
GaoJi Recent Interpretable machine learning papers PDF PDF
Jennifer The Building Blocks of Interpretability PDF PDF

Application18- DNNs in a Few Bio CRISPR Tasks

9DiscreteApp brain CRISPR DNA Genomics generative protein
Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF

Application18- Graph DNN in a Few Bio Tasks

2Graphs 9DiscreteApp graph protein molecule
Presenter Papers Paper URL Our Slides
Eric Modeling polypharmacy side effects with graph convolutional networks PDF PDF
Eric Protein Interface Prediction using Graph Convolutional Networks PDF PDF
Eric Structural biology meets data science: does anything change? URL PDF
Eric DeepSite: protein-binding site predictor using 3D-convolutional neural networks URL PDF

Structure18- DNNs Varying Structures

2Architecture 8Scalable 7MetaDomain Architecture-Search Hyperparameter dynamic
Presenter Papers Paper URL Our Slides
Arshdeep Learning Transferable Architectures for Scalable Image Recognition PDF PDF
Arshdeep FractalNet: Ultra-Deep Neural Networks without Residuals PDF PDF

Reliable18- Adversarial Attacks and DNN

3Reliable Adversarial-Examples software-testing Interpretable distillation
Presenter Papers Paper URL Our Slides
Bill Adversarial Examples that Fool both Computer Vision and Time-Limited Humans PDF PDF
Bill Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Bill TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing PDF PDF
Bill Distilling the Knowledge in a Neural Network PDF PDF
Bill Defensive Distillation is Not Robust to Adversarial Examples PDF PDF
Bill Adversarial Logit Pairing, Harini Kannan, Alexey Kurakin, Ian Goodfellow PDF PDF

Reliable18- Adversarial Attacks and DNN

3Reliable Adversarial-Examples visualizing Interpretable EHR NLP
Presenter Papers Paper URL Our Slides
Jennifer Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Jennifer Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning PDF PDF
Jennifer Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers PDF PDF
Jennifer CleverHans PDF PDF
Ji Ji-f18-New papers about adversarial attack   PDF

Application18- DNN for MedQA

2Architecture 9DiscreteApp seq2seq recommendation QA graph relational EHR
Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Generate18- Deep Generative Models for Graphs

5Generative 2Graphs generative GAN discrete Autoencoder Variational molecule graph DNA
Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Generate18- Deep Generative Models for discrete

5Generative 2Graphs generative GAN discrete Autoencoder Variational
Presenter Papers Paper URL Our Slides
Tkach Boundary-Seeking Generative Adversarial Networks PDF PDF
Tkach Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF PDF
Tkach Generating Sentences from a Continuous Space PDF PDF