Besides using high-level categories, we also use the following detailed tags to label each read post we finished. Click on a tag to see relevant list of readings.

adversarial-examples adversarial-loss agent agi alignment alphago amortized analysis ape architecture-search associative attention attribution autoencoder autoregressive auxiliary backprop basicllm beam bert bias bias-variance binarization binary black-box blocking brain casual certified-defense chromatin cnn composition compression concept crispr cryptography curriculum data-valuation denoising dialog difference-analysis differentiation diffusion dimension-reduction discrete distillation distributed dna domain-adaptation domainadapt dynamic efficiency ehr em embedding encoder-decoder expressive few-shot forcing forgetting fuzzing gan gcn gene-network generalization generative genomics geometric graph graph-attention graphical-model hallucination hash heterogeneous hierarchical high-dimensional human-alignment hyperparameter image-synthesis imitation-learning imputation influence-functions infomax informax interpretable interpretibility invariant knowledge-graph language-model language-processing learn2learn llmevaluate loss low-rank manifold markov matching matching-net matrix-completion memorization memory meta-learning metamorphic metric-learning mimic mitigate mobile model-as-sample model-criticism modeledit molecule multi-label multi-task mutual-information neural-programming neuroscience nlp noise nonparametric normalization ntm optimization parallel parsimonious planning pointer privacy program propagation protein pruning qa quantization rag random reasoning recommendation regularization relational rl rna rnn robustness safety sample-selection sampling scalable secure semi-supervised seq2seq set shapley sketch small-data software-testing sparsity structured stylometric submodular subspace temporal-difference text training transfer transfer-learning trees tutorial understanding vae value-networks variational verification visualizing white-box


[1]: adversarial-examples

Table of readings


Presenter Papers Paper URL Our Slides
Robust Adversarial Attacks on Graph Structured Data Pdf Faizan [PDF + GaoJi Pdf
Robust KDD’18 Adversarial Attacks on Neural Networks for Graph Data Pdf Faizan PDF + GaoJi Pdf
Robust Attacking Binarized Neural Networks Pdf Faizan PDF

Presenter Papers Paper URL Our Slides
Jennifer Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Jennifer Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning PDF PDF
Jennifer Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers PDF PDF
Jennifer CleverHans PDF PDF
Ji Ji-f18-New papers about adversarial attack   PDF

Presenter Papers Paper URL Our Slides
Bill Adversarial Examples that Fool both Computer Vision and Time-Limited Humans PDF PDF
Bill Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Bill TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing PDF PDF
Bill Distilling the Knowledge in a Neural Network PDF PDF
Bill Defensive Distillation is Not Robust to Adversarial Examples PDF PDF
Bill Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  

Presenter Papers Paper URL Our Slides
Bill Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples PDF PDF
Bill Adversarial Examples for Evaluating Reading Comprehension Systems, Robin Jia, Percy Liang PDF PDF
Bill Certified Defenses against Adversarial Examples, Aditi Raghunathan, Jacob Steinhardt, Percy Liang PDF PDF
Bill Provably Minimally-Distorted Adversarial Examples, Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill PDF PDF

Presenter Papers Paper URL Our Slides
Bill Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le 1 PDF PDF
Bill Adversarial Spheres 2 PDF PDF
Bill Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer 3 PDF PDF
Bill Thermometer encoding: one hot way to resist adversarial examples 4 PDF PDF
  Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow 5 PDF  

Presenter Papers Paper URL Our Slides
Tianlu Robustness of classifiers: from adversarial to random noise, NIPS16 PDF 1 PDF
Anant Blind Attacks on Machine Learners, 2 NIPS16 PDF PDF
  Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17 3 pdf  
  The Robustness of Estimator Composition, NIPS16 4 PDF  

Presenter Papers Paper URL Our Slides
GaoJi Delving into Transferable Adversarial Examples and Black-box Attacks,ICLR17 1 pdf PDF
Shijia On Detecting Adversarial Perturbations, ICLR17 2 pdf PDF
Anant Parseval Networks: Improving Robustness to Adversarial Examples, ICML17 3 pdf PDF
Bargav Being Robust (in High Dimensions) Can Be Practical, ICML17 4 pdf PDF

Presenter Papers Paper URL Our Slides
AE Intriguing properties of neural networks / PDF  
AE Explaining and Harnessing Adversarial Examples PDF  
AE Towards Deep Learning Models Resistant to Adversarial Attacks PDF  
AE DeepFool: a simple and accurate method to fool deep neural networks PDF  
AE Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner PDF PDF
Data Basic Survey of ImageNet - LSVRC competition URL PDF
Understand Understanding Black-box Predictions via Influence Functions PDF  
Understand Deep inside convolutional networks: Visualising image classification models and saliency maps PDF  
Understand BeenKim, Interpretable Machine Learning, ICML17 Tutorial [^1] PDF  
provable Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter, URL  

Table of readings


Index Papers Our Slides
1 BIAS ALSO MATTERS: BIAS ATTRIBUTION FOR DEEP NEURAL NETWORK EXPLANATION Arsh Survey
2 Data Shapley: Equitable Valuation of Data for Machine Learning Arsh Survey
  What is your data worth? Equitable Valuation of Data Sanchit Survey
3 Neural Network Attributions: A Causal Perspective Zhe Survey
4 Defending Against Neural Fake News Eli Survey
5 Interpretation of Neural Networks is Fragile Eli Survey
  Interpretation of Neural Networks is Fragile Pan Survey
6 Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization Eli Survey
7 Retrofitting Word Vectors to Semantic Lexicons Morris Survey
8 On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models Morris Survey
9 Towards Deep Learning Models Resistant to Adversarial Attacks Pan Survey
10 Robust Attribution Regularization Pan Survey
11 Sanity Checks for Saliency Maps Sanchit Survey
12 Survey of data generation and evaluation in Interpreting DNN pipelines Sanchit Survey
13 Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions Sanchit Survey
14 Universal Adversarial Triggers for Attacking and Analyzing NLP Sanchit Survey
15 Apricot: Submodular selection for data summarization in Python Arsh Survey

Team INDEX Title & Link Tags Our Slide
T3 Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints submodular, coreset, safety OurSlide
T6 Decision Boundary Analysis of Adversarial Examples adversarial-examples OurSlide
T8 Robustness may be at odds with accuracy robustness OurSlide
T18 Towards Reverse-Engineering Black-Box Neural Networks meta, model-as-sample, safety, privacy OurSlide
T23 The Odds are Odd: A Statistical Test for Detecting Adversarial Examples adversarial-examples OurSlide
T25 Learning how to explain neural networks: PatternNet and PatternAttribution Attribution, Interpretable OurSlide
T31 Detecting Statistical Interactions from Neural Network Weights Interpretable, Relational OurSlide


[2]: adversarial-loss

Table of readings


Presenter Papers Paper URL Our Slides
Chao Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification PDF PDF
Jack FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning PDF PDF
BasicMLC Multi-Label Classification: An Overview PDF  
SPEN Structured Prediction Energy Networks PDF  
InfNet Learning Approximate Inference Networks for Structured Prediction PDF  
SPENMLC Deep Value Networks PDF  
Adversarial Semantic Segmentation using Adversarial Networks PDF  
EmbedMLC StarSpace: Embed All The Things! PDF  
deepMLC CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 PDF  
deepMLC Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 PDF  


[3]: agent

Table of readings


In this session, our readings cover:

Required Readings:

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

  • Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
  • Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents’ capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.

More Readings:

Understanding the planning of LLM agents: A survey

  • https://arxiv.org/abs/2402.02716
  • As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention. This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. We provide a taxonomy of existing works on LLM-Agent planning, which can be categorized into Task Decomposition, Plan Selection, External Module, Reflection and Memory. Comprehensive analyses are conducted for each direction, and further challenges for the field of research are discussed.

LLM Agents can Autonomously Hack Websites

  • Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
  • In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves. As a result, these LLMs can now function autonomously as agents. With the rise in capabilities of these agents, recent work has speculated on how LLM agents would affect cybersecurity. However, not much is known about the offensive capabilities of LLM agents. In this work, we show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. Importantly, the agent does not need to know the vulnerability beforehand. This capability is uniquely enabled by frontier models that are highly capable of tool use and leveraging extended context. Namely, we show that GPT-4 is capable of such hacks, but existing open-source models are not. Finally, we show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild. Our findings raise questions about the widespread deployment of LLMs.

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

  • Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao
  • Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents. How to integrate agent ability into general LLMs becomes a crucial and urgent problem. This paper first delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations. Based on the above findings, we propose Agent-FLAN to effectively Fine-tune LANguage models for Agents. Through careful decomposition and redesign of the training corpus, Agent-FLAN enables Llama2-7B to outperform prior best works by 3.5\% across various agent evaluation datasets. With comprehensively constructed negative samples, Agent-FLAN greatly alleviates the hallucination issues based on our established evaluation benchmark. Besides, it consistently improves the agent capability of LLMs when scaling model sizes while slightly enhancing the general capability of LLMs. The code will be available at this https URL.

Humanoid Locomotion as Next Token Prediction

  • Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
  • We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This general formulation enables us to leverage data with missing modalities, like video trajectories without actions. We train our model on a collection of simulated trajectories coming from prior neural network policies, model-based controllers, motion capture data, and YouTube videos of humans. We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot. Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training like walking backward. These findings suggest a promising path toward learning challenging real-world control tasks by generative modeling of sensorimotor trajectories.

Required Readings:

A Survey on Large Language Model based Autonomous Agents

  • https://arxiv.org/abs/2308.11432
  • Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at this https URL.

More Readings:

Position Paper: Agent AI Towards a Holistic Intelligence

  • https://arxiv.org/abs/2403.00833
  • Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao
  • Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize developing Agent AI – an embodied system that integrates large foundation models into agent actions. The emerging field of Agent AI spans a wide range of existing embodied and agent-based multimodal interactions, including robotics, gaming, and healthcare systems, etc. In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model. On top of this idea, we discuss how agent AI exhibits remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Furthermore, we discuss the potential of Agent AI from an interdisciplinary perspective, underscoring AI cognition and consciousness within scientific discourse. We believe that those discussions serve as a basis for future research directions and encourage broader societal engagement.

Tool Use in LLMs

  • https://zorazrw.github.io/files/WhatAreToolsAnyway.pdf
  • an overview of tool use in LLMs, including a formal definition of the tool-use paradigm, scenarios where LLMs leverage tool usage, and for which tasks this approach works well; it also provides an analysis of complex tool usage and summarize testbeds and evaluation metrics across LM tooling works

Practices for Governing Agentic AI Systems

  • https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf
  • Agentic AI systems—AI systems that can pursue complex goals with limited direct supervision— are likely to be broadly useful if we can integrate them responsibly into our society. While such systems have substantial potential to help people more efficiently and effectively achieve their own goals, they also create risks of harm. In this white paper, we suggest a definition of agentic AI systems and the parties in the agentic AI system life-cycle, and highlight the importance of agreeing on a set of baseline responsibilities and safety best practices for each of these parties. As our primary contribution, we offer an initial set of practices for keeping agents’ operations safe and accountable, which we hope can serve as building blocks in the development of agreed baseline best practices. We enumerate the questions and uncertainties around operationalizing each of these practices that must be addressed before such practices can be codified. We then highlight categories of indirect impacts from the wide-scale adoption of agentic AI systems, which are likely to necessitate additional governance frameworks.

Emergent autonomous scientific research capabilities of large language models

  • https://arxiv.org/abs/2304.05332
  • Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent’s scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.

What Makes a Dialog Agent Useful?

  • https://huggingface.co/blog/dialog-agents


[4]: agi

Table of readings


Papers Paper URL Abstract
Training language models to follow instructions with human feedback URL “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences URL “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”

Decision Transformer: Reinforcement Learning via Sequence Modeling

  • Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
  • https://arxiv.org/abs/2106.01345
  • We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

Prompting Decision Transformer for Few-Shot Policy Generalization

  • Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
  • https://arxiv.org/abs/2206.13499
  • Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.

Papers Paper URL Abstract
A Generalist Agent URL Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
Why should we prefer offline reinforcement learning over behavioral cloning? ICLR 2022 URL natural to ask: when can an offline RL method outperform BC with an equal amount of expert data, even when BC is a natural choice?
Uni[MASK]: Unified Inference in Sequential Decision Problems URL show how sequential decision making tasks can be thought of in terms of corresponding input maskings, enabling the training of a single model to perform all tasks at once. applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns.


[5]: alignment

Table of readings


In this session, our readings cover:

Required Readings:

Recent Large Language Models Reshaping the Open-Source Arena

  • https://deci.ai/blog/list-of-large-language-models-in-open-source/
  • The release of Meta’s Llama model and the subsequent release of Llama 2 in 2023 kickstarted an explosion of open-source language models, with better and more innovative models being released on what seems like a daily basis. With new open-source models being released on a daily basis, here we dove into the ocean of open-source possibilities to curate a select list of the most intriguing and influential models making waves in recent months, inlcuding Qwen1.5/ Yi/ Smaug/ Mixtral-8x7B-v0.1/ DBRX/ SOLAR-10.7B-v1.0 / Tulu 2 / WizardLM/ Starling 7B/ OLMo-7B/ Gemma and DeciLM-7B.
  • Plus the newly avaiable DBRX model https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

Instruction Tuning for Large Language Models: A Survey

  • https://arxiv.org/abs/2308.10792
  • Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang
  • This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users’ objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research. Project page: this http URL

Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models

  • https://arxiv.org/abs/2203.06904
  • Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this paper. In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divide existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspective of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling and transferable properties of delta tuning.

More readings

Gemini: A Family of Highly Capable Multimodal Models

  • https://arxiv.org/abs/2312.11805
  • This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.

QLoRA: Efficient Finetuning of Quantized LLMs

  • Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimziers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.
  • https://arxiv.org/abs/2106.09685
  • An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL.

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models

  • https://arxiv.org/abs/2401.00788
  • Terry Yue Zhuo, Armel Zebaze, Nitchakarn Suppattarachai, Leandro von Werra, Harm de Vries, Qian Liu, Niklas Muennighoff
  • The high cost of full-parameter fine-tuning (FFT) of Large Language Models (LLMs) has led to a series of parameter-efficient fine-tuning (PEFT) methods. However, it remains unclear which methods provide the best cost-performance trade-off at different model scales. We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters. Through investigations across 5 tasks and 8 different datasets encompassing both code comprehension and code generation tasks, we find that FFT generally leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale. LoRA usually offers the most favorable trade-off between cost and performance. Further investigation into the effects of these methods on both model robustness and code security reveals that larger models tend to demonstrate reduced robustness and less security. At last, we explore the relationships among updated parameters, cross-entropy loss, and task performance. We find that the tuning effectiveness observed in small models generalizes well to larger models, and the validation loss in instruction tuning can be a reliable indicator of overall downstream performance.

In this session, our readings cover:

Required Readings:

Aligning Large Language Models with Human: A Survey

  • https://arxiv.org/abs/2307.12966
  • https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo
  • https://huggingface.co/blog/stackllama

More readings

Github Awesome-RLHF

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

  • https://arxiv.org/abs/2301.13688
  • We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at this https URL.

DPO Direct Preference Optimization: Your Language Model is Secretly a Reward Model

  • https://arxiv.org/abs/2305.18290
  • https://huggingface.co/blog/dpo-trl
  • While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss. The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.

Training language models to follow instructions with human feedback

  • https://arxiv.org/abs/2203.02155)
  • “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”

Deep reinforcement learning from human preferences

  • https://openreview.net/forum?id=GisHNaleWiA
  • “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”


[6]: alphago

Table of readings


Presenter Papers Paper URL Our Slides
Anant The Predictron: End-to-End Learning and Planning, ICLR17 1 PDF PDF
ChaoJiang Szepesvari - Theory of RL 2 RLSS.pdf + Video PDF
GaoJi Mastering the game of Go without human knowledge / Nature 2017 3 PDF PDF
  Thomas - Safe Reinforcement Learning RLSS17.pdf + video  
  Sutton - Temporal-Difference Learning RLSS17.pdf + Video  


[7]: amortized

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF


[8]: analysis

Table of readings



[9]: ape

Table of readings


In this session, our readings cover:

Required Readings:

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

  • https://arxiv.org/abs/2310.14735
  • Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu / This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.

More Readings:

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

  • This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One of the major causes of the high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose Skeleton-of-Thought (SoT), which first guides LLMs to generate the skeleton of the answer, and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-ups across 12 LLMs, but it can also potentially improve the answer quality on several question categories. SoT is an initial attempt at data-centric optimization for inference efficiency, and further underscores the potential of pushing LLMs to think more like a human for answer quality.

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts

  • The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models’ (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM’s capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and others parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques.


[10]: architecture-search

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi Neural Architecture Search with Reinforcement Learning, ICLR17 1 PDF PDF
Ceyer Learning to learn 2 DLSS17video PDF
Beilun Optimization as a Model for Few-Shot Learning, ICLR17 3 PDF + More PDF
Anant Neural Optimizer Search with Reinforcement Learning, ICML17 4 PDF PDF

Presenter Papers Paper URL Our Slides
Anant AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 1 PDF PDF
Shijia SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 2 PDF PDF
Jack Proximal Deep Structured Models, NIPS16 3 PDF PDF
  Optimal Architectures in a Solvable Model of Deep Networks, NIPS16 4 PDF  
Tianlu Large-Scale Evolution of Image Classifiers, ICML17 5 PDF PDF

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep Learning Transferable Architectures for Scalable Image Recognition PDF PDF
Arshdeep FractalNet: Ultra-Deep Neural Networks without Residuals PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi Forward and Reverse Gradient-Based Hyperparameter Optimization, ICML17 1 PDF PDF
Chaojiang Adaptive Neural Networks for Efficient Inference, ICML17 2 PDF PDF
Bargav Practical Gauss-Newton Optimisation for Deep Learning, ICML17 3 PDF PDF
Rita How to Escape Saddle Points Efficiently, ICML17 4 PDF PDF
  Batched High-dimensional Bayesian Optimization via Structural Kernel Learning PDF  


[11]: associative

Table of readings


Presenter Papers Paper URL Our Slides
Beilun Learning Deep Parsimonious Representations, NIPS16 1 PDF PDF
Jack Dense Associative Memory for Pattern Recognition, NIPS16 2 PDF + video PDF


[12]: attention

Table of readings


Index Papers Our Slides
0 A survey on Interpreting Deep Learning Models Eli Survey
  Interpretable Machine Learning: Definitions,Methods, Applications Arsh Survey
1 Explaining Explanations: Axiomatic Feature Interactions for Deep Networks Arsh Survey
2 Shapley Value review Arsh Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Bill Survey
  Consistent Individualized Feature Attribution for Tree Ensembles bill Survey
  Summary for A value for n-person games Pan Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Rishab Survey
3 Hierarchical Interpretations of Neural Network Predictions Arsh Survey
  Hierarchical Interpretations of Neural Network Predictions Rishab Survey
4 Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Arsh Survey
  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Rishab Survey
5 Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models Rishab Survey
    Sanchit Survey
  Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Sanchit Survey
6 This Looks Like That: Deep Learning for Interpretable Image Recognition Pan Survey
7 AllenNLP Interpret Rishab Survey
8 DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs Rishab Survey
9 How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations Rishab Survey
10 Attention is not Explanation Sanchit Survey
    Pan Survey
11 Axiomatic Attribution for Deep Networks Sanchit Survey
12 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Sanchit Survey
13 Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier Sanchit Survey
14 “Why Should I Trust You?”Explaining the Predictions of Any Classifier Yu Survey
15 INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE Pan Survey

Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  

Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF

Presenter Papers Paper URL Our Slides
Chao Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification PDF PDF
Jack FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning PDF PDF
BasicMLC Multi-Label Classification: An Overview PDF  
SPEN Structured Prediction Energy Networks PDF  
InfNet Learning Approximate Inference Networks for Structured Prediction PDF  
SPENMLC Deep Value Networks PDF  
Adversarial Semantic Segmentation using Adversarial Networks PDF  
EmbedMLC StarSpace: Embed All The Things! PDF  
deepMLC CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 PDF  
deepMLC Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 PDF  

Presenter Papers Paper URL Our Slides
Arshdeep Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 1 PDF PDF
Arshdeep Latent Alignment and Variational Attention 2 PDF PDF
Arshdeep Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio 3 PDF PDF

Presenter Papers Paper URL Our Slides
ChaoJiang Courville - Generative Models II DLSS17Slide + video PDF
GaoJi Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 1 PDF + talk PDF
Arshdeep Composing graphical models with neural networks for structured representations and fast inference, NIPS16 2 PDF PDF
  Johnson - Graphical Models and Deep Learning DLSSSlide + video  
  Parallel Multiscale Autoregressive Density Estimation, ICML17 3 PDF  
Beilun Conditional Image Generation with Pixel CNN Decoders, NIPS16 4 PDF PDF
Shijia Marrying Graphical Models & Deep Learning DLSS17 + Video PDF

Presenter Papers Paper URL Our Slides
Jack Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 1 PDF PDF
Arshdeep Bidirectional Attention Flow for Machine Comprehension, ICLR17 2 PDF + code PDF
Ceyer Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 PDF + code PDF
ChaoJiang Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 3 PDF PDF
  An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 PDF  

Presenter Papers Paper URL Our Slides
Rita Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 1 PDF PDF
Tianlu Dynamic Coattention Networks For Question Answering, ICLR17 2 PDF + code PDF
ChaoJiang Structured Attention Networks, ICLR17 3 PDF + code PDF

Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  

Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  


[13]: attribution

Table of readings


Team INDEX Title & Link Tags Our Slide
T3 Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints submodular, coreset, safety OurSlide
T6 Decision Boundary Analysis of Adversarial Examples adversarial-examples OurSlide
T8 Robustness may be at odds with accuracy robustness OurSlide
T18 Towards Reverse-Engineering Black-Box Neural Networks meta, model-as-sample, safety, privacy OurSlide
T23 The Odds are Odd: A Statistical Test for Detecting Adversarial Examples adversarial-examples OurSlide
T25 Learning how to explain neural networks: PatternNet and PatternAttribution Attribution, Interpretable OurSlide
T31 Detecting Statistical Interactions from Neural Network Weights Interpretable, Relational OurSlide

Presenter Papers Paper URL Our Slides
Jack A Unified Approach to Interpreting Model Predictions PDF PDF
Jack “Why Should I Trust You?”: Explaining the Predictions of Any Classifier PDF PDF
Jack Visual Feature Attribution using Wasserstein GANs PDF PDF
Jack GAN Dissection: Visualizing and Understanding Generative Adversarial Networks PDF PDF
GaoJi Recent Interpretable machine learning papers PDF PDF
Jennifer The Building Blocks of Interpretability PDF PDF

Presenter Papers Paper URL Our Slides
Rita Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 1 PDF PDF
Arshdeep Axiomatic Attribution for Deep Networks, ICML17 2 PDF PDF
  The Robustness of Estimator Composition, NIPS16 PDF  

Presenter Papers Paper URL Our Slides
Rita Learning Important Features Through Propagating Activation Differences, ICML17 1 PDF PDF
GaoJi Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 2 PDF PDF
Rita Learning Kernels with Random Features, Aman Sinha*; John Duchi, 3 PDF PDF


[14]: autoencoder

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Tkach Boundary-Seeking Generative Adversarial Networks PDF PDF
Tkach Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF PDF
Tkach Generating Sentences from a Continuous Space PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF


[15]: autoregressive

Table of readings


Presenter Papers Paper URL Our Slides
ChaoJiang Courville - Generative Models II DLSS17Slide + video PDF
GaoJi Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 1 PDF + talk PDF
Arshdeep Composing graphical models with neural networks for structured representations and fast inference, NIPS16 2 PDF PDF
  Johnson - Graphical Models and Deep Learning DLSSSlide + video  
  Parallel Multiscale Autoregressive Density Estimation, ICML17 3 PDF  
Beilun Conditional Image Generation with Pixel CNN Decoders, NIPS16 4 PDF PDF
Shijia Marrying Graphical Models & Deep Learning DLSS17 + Video PDF


[16]: auxiliary

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 1 PDF PDF
Beilun Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy 2 PDF PDF
Ji Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 3 PDF PDF
Xueying End-to-End Differentiable Adversarial Imitation Learning, ICML17 4 PDF PDF
  Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 PDF  
  FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 5 PDF  


[17]: backprop

Table of readings


Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  


[18]: basicllm

Table of readings


In this session, our readings cover:

Require Readings:

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

  • https://arxiv.org/abs/2312.15234
  • In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

  • https://arxiv.org/abs/2304.01373
  • How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \textit{Pythia}, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend \textit{Pythia} to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at \url{this https URL}.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

  • https://arxiv.org/abs/2403.09611
  • Multimodal LLM Pre-training - provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training; studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key for state-of-the-art performance; it also proposes a family of multimodal models up to 30B parameters that achieve SOTA in pre-training metrics and include properties such as enhanced in-context learning, multi-image reasoning, enabling few-shot chain-of-thought prompting.

More Readings:

Sparks of Large Audio Models: A Survey and Outlook

  • Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller
  • This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and a wide range of sources–from human voices to musical instruments and environmental sounds–poses challenges distinct from those found in traditional Natural Language Processing scenarios. Nevertheless, \textit{Large Audio Models}, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amount of data, these models have demonstrated prowess in a variety of audio tasks, spanning from Automatic Speech Recognition and Text-To-Speech to Music Generation, among others. Notably, recently these Foundational Audio Models, like SeamlessM4T, have started showing abilities to act as universal translators, supporting multiple speech tasks for up to 100 languages without any reliance on separate task-specific systems. This paper presents an in-depth analysis of state-of-the-art methodologies regarding \textit{Foundational Large Audio Models}, their performance benchmarks, and their applicability to real-world scenarios. We also highlight current limitations and provide insights into potential future research directions in the realm of \textit{Large Audio Models} with the intent to spark further discussion, thereby fostering innovation in the next generation of audio-processing systems. Furthermore, to cope with the rapid development in this area, we will consistently update the relevant repository with relevant recent articles and their open-source implementations at this https URL.

In this session, our readings cover:

Required Readings:

Mistral 7B

  • https://mistral.ai/news/announcing-mistral-7b/
  • We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

More Readings:

OLMo: Accelerating the Science of Language Models

  • https://arxiv.org/abs/2402.00838

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, this technical report details the first release of OLMo, a state-of-the-art, truly Open Language Model and its framework to build and study the science of language modeling. Unlike most prior efforts that have only released model weights and inference code, we release OLMo and the whole framework, including training data and training and evaluation code. We hope this release will empower and strengthen the open research community and inspire a new wave of innovation.

Mixtral of Experts

  • https://arxiv.org/abs/2401.04088
  • We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.

- Llama 2: Open Foundation and Fine-Tuned Chat Models

  • In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  • https://arxiv.org/abs/2101.00027
  • Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present \textit{the Pile}: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets – both existing and newly constructed – many of which derive from academic or professional sources. Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations. Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.

In this session, our readings cover:

Readings:

ChatGPT is not all you need. A State of the Art Review of large Generative AI models

  • Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan
  • https://arxiv.org/abs/2301.04655
  • During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as being a general question and answering system or automatically creating artistic images that are revolutionizing several sectors. Consequently, the implications that these generative models have in the industry and society are enormous, as several job positions may be transformed. For example, Generative AI is capable of transforming effectively and creatively texts to images, like the DALLE-2 model; text to 3D images, like the Dreamfusion model; images to text, like the Flamingo model; texts to video, like the Phenaki model; texts to audio, like the AudioLM model; texts to other texts, like ChatGPT; texts to code, like the Codex model; texts to scientific texts, like the Galactica model or even create algorithms like AlphaTensor. This work consists on an attempt to describe in a concise way the main models are sectors that are affected by generative AI and to provide a taxonomy of the main generative models published recently.

A Survey of Large Language Models

  • https://arxiv.org/abs/2303.18223
  • Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

On the Opportunities and Risks of Foundation Models

  • https://arxiv.org/abs/2108.07258
  • ” a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”

Required Readings:

Emergent Abilities of Large Language Models

  • URL
  • “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”

Language Models are Few-Shot Learners

  • URL
  • “GPT-3, 175B autoregerssive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”

Extra Readings:

A survey of Generative AI Applications

  • https://arxiv.org/abs/2306.02781
  • Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.

Generative AI: Perspectives from Stanford HAI

  • https://hai.stanford.edu/generative-ai-perspectives-stanford-hai

Readings:

Basics of ML and DL:

Basics of NLP

  • URL
  • Typical NLP tasks / Challenges / Pipeline
  • f() on natural language
    • Before Deep NLP (Pre 2012) • (BOW / LSI / Topic Modeling LDA )
    • Word2Vec (2013-2016) • (GloVe/ FastText)
    • Recurrent NN (2014-2016) • LSTM
    • Seq2Seq
    • Attention
    • Self-Attention (2016 – now )
    • Transformer (attention only Seq2Seq)
    • BERT / RoBERTa/ XLNet/ GPT / …
  • A good code walk through on transformer at URL


[19]: beam

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF


[20]: bert

Table of readings


Team INDEX Title & Link Tags Our Slide
T11 Parameter-Efficient Transfer Learning for NLP meta, BERT, text, Transfer OurSlide
T22 Deep Asymmetric Multi-task Feature Learning meta, regularization, Multi-task OurSlide

Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  


[21]: bias

Table of readings


In this session, our readings cover:

Required Readings:

Evaluating and Mitigating Discrimination in Language Model Decisions

  • https://arxiv.org/abs/2312.03689
  • As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array of potential prompts that decision-makers may input into an LM, spanning 70 diverse decision scenarios across society, and systematically vary the demographic information in each prompt. Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied. While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate. Our work enables developers and policymakers to anticipate, measure, and address discrimination as language model capabilities and applications continue to expand. We release our dataset and prompts at this https URL

More Readings:

Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

  • https://arxiv.org/abs/2310.11079

Machine Learning in development: Let’s talk about bias!

  • https://huggingface.co/blog/ethics-soc-2
  • https://huggingface.co/blog/evaluating-llm-bias

Exploring Social Bias in Chatbots using Stereotype Knowledge WNLP@ACL2019

Bias and Fairness in Large Language Models: A Survey

  • https://arxiv.org/abs/2309.00770
  • Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

A Survey on Fairness in Large Language Models

  • https://arxiv.org/abs/2308.10149
  • Large language models (LLMs) have shown powerful performance and development prospect and are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate the biases to downstream tasks. Unfair LLM systems have undesirable social impacts and potential harms. In this paper, we provide a comprehensive review of related research on fairness in LLMs. First, for medium-scale LLMs, we introduce evaluation metrics and debiasing methods from the perspectives of intrinsic bias and extrinsic bias, respectively. Then, for large-scale LLMs, we introduce recent fairness research, including fairness evaluation, reasons for bias, and debiasing methods. Finally, we discuss and provide insight on the challenges and future directions for the development of fairness in LLMs.

</div> </div> ---

Table of readings


Index Papers Our Slides
1 BIAS ALSO MATTERS: BIAS ATTRIBUTION FOR DEEP NEURAL NETWORK EXPLANATION Arsh Survey
2 Data Shapley: Equitable Valuation of Data for Machine Learning Arsh Survey
  What is your data worth? Equitable Valuation of Data Sanchit Survey
3 Neural Network Attributions: A Causal Perspective Zhe Survey
4 Defending Against Neural Fake News Eli Survey
5 Interpretation of Neural Networks is Fragile Eli Survey
  Interpretation of Neural Networks is Fragile Pan Survey
6 Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization Eli Survey
7 Retrofitting Word Vectors to Semantic Lexicons Morris Survey
8 On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models Morris Survey
9 Towards Deep Learning Models Resistant to Adversarial Attacks Pan Survey
10 Robust Attribution Regularization Pan Survey
11 Sanity Checks for Saliency Maps Sanchit Survey
12 Survey of data generation and evaluation in Interpreting DNN pipelines Sanchit Survey
13 Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions Sanchit Survey
14 Universal Adversarial Triggers for Attacking and Analyzing NLP Sanchit Survey
15 Apricot: Submodular selection for data summarization in Python Arsh Survey
---

[22]: bias-variance

Table of readings


Presenter Papers Paper URL Our Slides
NIPS16 Andrew Ng - Nuts and Bolts of Applying Deep Learning: 1 video    
DLSS17 Doina Precup - Machine Learning - Bayesian Views (56:50m to 1:04:45 slides) video + slide    
---

[23]: binarization

Table of readings


Team INDEX Title & Link Tags Our Slide
T33 The High-Dimensional Geometry of Binary Neural Networks Quantization, binarization, scalable OurSlide
T34 Modern Neural Networks Generalize on Small Data Sets small-data, analysis, ensemble OurSlide
T4 Cognitive Scheduler for Heterogeneous High Performance Computing System system-application OurSlide
---

[24]: binary

Table of readings


Presenter Papers Paper URL Our Slides
Edge MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications PDF  
Edge XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks URL Ryan PDF
Edge DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices Pdf Eamon PDF
Edge Loss-aware Binarization of Deep Networks, ICLR17 PDF Ryan PDF
Edge Espresso: Efficient Forward Propagation for Binary Deep Neural Networks Pdf Eamon PDF
Dynamic Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution PDF Weilin PDF
Dynamic Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems PDF  
Dynamic Cavs: An Efficient Runtime System for Dynamic Neural Networks Pdf  

Presenter Papers Paper URL Our Slides
Robust Adversarial Attacks on Graph Structured Data Pdf Faizan [PDF + GaoJi Pdf
Robust KDD’18 Adversarial Attacks on Neural Networks for Graph Data Pdf Faizan PDF + GaoJi Pdf
Robust Attacking Binarized Neural Networks Pdf Faizan PDF

Presenter Papers Paper URL Our Slides
Arshdeep Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction 1 PDF PDF
Arshdeep Decoupled Neural Interfaces Using Synthetic Gradients 2 PDF PDF
Arshdeep Diet Networks: Thin Parameters for Fat Genomics 3 PDF PDF
Arshdeep Metric Learning with Adaptive Density Discrimination 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 1 PDF PDF
Arshdeep Learning feed-forward one-shot learners 2 PDF PDF
Arshdeep Learning to Learn by gradient descent by gradient descent 3 PDF PDF
Arshdeep Dynamic Filter Networks 4 https://arxiv.org/abs/1605.09673 PDF PDF

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  

Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[25]: black-box

Table of readings


Index Papers Our Slides
0 A survey on Interpreting Deep Learning Models Eli Survey
  Interpretable Machine Learning: Definitions,Methods, Applications Arsh Survey
1 Explaining Explanations: Axiomatic Feature Interactions for Deep Networks Arsh Survey
2 Shapley Value review Arsh Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Bill Survey
  Consistent Individualized Feature Attribution for Tree Ensembles bill Survey
  Summary for A value for n-person games Pan Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Rishab Survey
3 Hierarchical Interpretations of Neural Network Predictions Arsh Survey
  Hierarchical Interpretations of Neural Network Predictions Rishab Survey
4 Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Arsh Survey
  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Rishab Survey
5 Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models Rishab Survey
    Sanchit Survey
  Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Sanchit Survey
6 This Looks Like That: Deep Learning for Interpretable Image Recognition Pan Survey
7 AllenNLP Interpret Rishab Survey
8 DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs Rishab Survey
9 How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations Rishab Survey
10 Attention is not Explanation Sanchit Survey
    Pan Survey
11 Axiomatic Attribution for Deep Networks Sanchit Survey
12 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Sanchit Survey
13 Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier Sanchit Survey
14 “Why Should I Trust You?”Explaining the Predictions of Any Classifier Yu Survey
15 INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE Pan Survey

Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  

Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  

Presenter Papers Paper URL Our Slides
SE Equivariance Through Parameter-Sharing, ICML17 1 PDF  
SE Why Deep Neural Networks for Function Approximation?, ICLR17 2 PDF  
SE Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 3ICML17 PDF  
  Sharp Minima Can Generalize For Deep Nets, ICML17 4 PDF  

Presenter Papers Paper URL Our Slides
Ceyer A Closer Look at Memorization in Deep Networks, ICML17 1 PDF PDF
  On the Expressive Efficiency of Overlapping Architectures of Deep Learning 2 DLSSpdf + video  
Mutual Information Opening the Black Box of Deep Neural Networks via Information 3 URL + video  
ChaoJiang Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 PDF PDF

Presenter Papers Paper URL Our Slides
Beilun Learning Deep Parsimonious Representations, NIPS16 1 PDF PDF
Jack Dense Associative Memory for Pattern Recognition, NIPS16 2 PDF + video PDF

Presenter Papers Paper URL Our Slides
Rita On the Expressive Power of Deep Neural Networks 1 PDF PDF
Arshdeep Understanding deep learning requires rethinking generalization, ICLR17 2 PDF PDF
Tianlu On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17 3 PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi A few useful things to know about machine learning PDF PDF
GaoJi A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions PDF PDF
GaoJi Automated White-box Testing of Deep Learning Systems 1 PDF PDF
GaoJi Testing and Validating Machine Learning Classifiers by Metamorphic Testing 2 PDF PDF
GaoJi Software testing: a research travelogue (2000–2014) PDF PDF
---

[26]: blocking

Table of readings


Presenter Papers Paper URL Our Slides
Shijia Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 1 PDF PDF
Ceyer Sequence Modeling via Segmentations, ICML17 2 PDF PDF
Arshdeep Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 3 PDF PDF
---

[27]: brain

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF

Presenter Papers Paper URL Our Slides
Arshdeep DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. PDF PDF
Arshdeep Solving the RNA design problem with reinforcement learning, PLOSCB 1 PDF PDF
Arshdeep Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk 2 PDF PDF
Arshdeep Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio 3 PDF PDF
Brandon Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics PDF PDF
Arshdeep Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions 2 PDF PDF

Ganguli - Theoretical Neuroscience and Deep Learning

Presenter Papers Paper URL Our Slides
DLSS16 video    
DLSS17 video + slide    
DLSS17 Deep learning in the brain DLSS17 + Video  
---

[28]: casual

Table of readings


Index Papers Our Slides
1 A Flexible Generative Framework for Graph-based Semi-supervised Learning Arsh Survey
2 Learning Discrete Structures for Graph Neural Networks Arsh Survey
4 Graph Markov Neural Nets Arsh Survey
  Graph Markov Neural Networks Jack Survey
5 GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations Arsh Survey
6 Subgraph Neural Networks Arsh Survey
7 Pointer Graph Networks Arsh Survey
8 Modeling Relational Data with Graph Convolutional Networks Arsh Survey
9 Graph Learning Zhe Survey
8 Neural Relational Inference Zhe Survey
---

Table of readings


Index Papers Our Slides
0 A survey on Interpreting Deep Learning Models Eli Survey
  Interpretable Machine Learning: Definitions,Methods, Applications Arsh Survey
1 Explaining Explanations: Axiomatic Feature Interactions for Deep Networks Arsh Survey
2 Shapley Value review Arsh Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Bill Survey
  Consistent Individualized Feature Attribution for Tree Ensembles bill Survey
  Summary for A value for n-person games Pan Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Rishab Survey
3 Hierarchical Interpretations of Neural Network Predictions Arsh Survey
  Hierarchical Interpretations of Neural Network Predictions Rishab Survey
4 Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Arsh Survey
  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Rishab Survey
5 Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models Rishab Survey
    Sanchit Survey
  Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Sanchit Survey
6 This Looks Like That: Deep Learning for Interpretable Image Recognition Pan Survey
7 AllenNLP Interpret Rishab Survey
8 DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs Rishab Survey
9 How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations Rishab Survey
10 Attention is not Explanation Sanchit Survey
    Pan Survey
11 Axiomatic Attribution for Deep Networks Sanchit Survey
12 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Sanchit Survey
13 Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier Sanchit Survey
14 “Why Should I Trust You?”Explaining the Predictions of Any Classifier Yu Survey
15 INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE Pan Survey

Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  
---

[29]: certified-defense

Table of readings


Presenter Papers Paper URL Our Slides
Bill Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples PDF PDF
Bill Adversarial Examples for Evaluating Reading Comprehension Systems, Robin Jia, Percy Liang PDF PDF
Bill Certified Defenses against Adversarial Examples, Aditi Raghunathan, Jacob Steinhardt, Percy Liang PDF PDF
Bill Provably Minimally-Distorted Adversarial Examples, Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill PDF PDF

Presenter Papers Paper URL Our Slides
AE Intriguing properties of neural networks / PDF  
AE Explaining and Harnessing Adversarial Examples PDF  
AE Towards Deep Learning Models Resistant to Adversarial Attacks PDF  
AE DeepFool: a simple and accurate method to fool deep neural networks PDF  
AE Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner PDF PDF
Data Basic Survey of ImageNet - LSVRC competition URL PDF
Understand Understanding Black-box Predictions via Influence Functions PDF  
Understand Deep inside convolutional networks: Visualising image classification models and saliency maps PDF  
Understand BeenKim, Interpretable Machine Learning, ICML17 Tutorial [^1] PDF  
provable Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter, URL  
---

[30]: chromatin

Table of readings


Index Papers Our Slides
1 Protein 3D Structure Computed from Evolutionary Sequence Variation Arsh Survey
3 Regulatory network inference on developmental and evolutionary lineages Arsh Survey
4 Deep learning in ultrasound image analysis Zhe Survey
5 Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind) Jack Survey
6 Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors Jack Survey
7 BindSpace decodes transcription factor binding signals by large-scale sequence embedding Jack Survey
8 FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Jack Survey
9 Query-Reduction Networks for Question Answering Bill Survey
---

[31]: cnn

Table of readings

---

[32]: composition

Table of readings


Presenter Papers Paper URL Our Slides
ChaoJiang Courville - Generative Models II DLSS17Slide + video PDF
GaoJi Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 1 PDF + talk PDF
Arshdeep Composing graphical models with neural networks for structured representations and fast inference, NIPS16 2 PDF PDF
  Johnson - Graphical Models and Deep Learning DLSSSlide + video  
  Parallel Multiscale Autoregressive Density Estimation, ICML17 3 PDF  
Beilun Conditional Image Generation with Pixel CNN Decoders, NIPS16 4 PDF PDF
Shijia Marrying Graphical Models & Deep Learning DLSS17 + Video PDF

Presenter Papers Paper URL Our Slides
Tianlu Robustness of classifiers: from adversarial to random noise, NIPS16 PDF 1 PDF
Anant Blind Attacks on Machine Learners, 2 NIPS16 PDF PDF
  Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17 3 pdf  
  The Robustness of Estimator Composition, NIPS16 4 PDF  

Presenter Papers Paper URL Our Slides
Rita Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 1 PDF PDF
Arshdeep Axiomatic Attribution for Deep Networks, ICML17 2 PDF PDF
  The Robustness of Estimator Composition, NIPS16 PDF  
---

[33]: compression

Table of readings


Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[34]: concept

Table of readings


Index Papers Our Slides
0 A survey on Interpreting Deep Learning Models Eli Survey
  Interpretable Machine Learning: Definitions,Methods, Applications Arsh Survey
1 Explaining Explanations: Axiomatic Feature Interactions for Deep Networks Arsh Survey
2 Shapley Value review Arsh Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Bill Survey
  Consistent Individualized Feature Attribution for Tree Ensembles bill Survey
  Summary for A value for n-person games Pan Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Rishab Survey
3 Hierarchical Interpretations of Neural Network Predictions Arsh Survey
  Hierarchical Interpretations of Neural Network Predictions Rishab Survey
4 Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Arsh Survey
  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Rishab Survey
5 Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models Rishab Survey
    Sanchit Survey
  Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Sanchit Survey
6 This Looks Like That: Deep Learning for Interpretable Image Recognition Pan Survey
7 AllenNLP Interpret Rishab Survey
8 DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs Rishab Survey
9 How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations Rishab Survey
10 Attention is not Explanation Sanchit Survey
    Pan Survey
11 Axiomatic Attribution for Deep Networks Sanchit Survey
12 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Sanchit Survey
13 Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier Sanchit Survey
14 “Why Should I Trust You?”Explaining the Predictions of Any Classifier Yu Survey
15 INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE Pan Survey
---

[35]: crispr

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF
---

[36]: cryptography

Table of readings


Presenter Papers Paper URL Our Slides
Tobin Summary of A few Papers on: Machine Learning and Cryptography, (e.g., learning to Protect Communications with Adversarial Neural Cryptography) 1 PDF PDF
Tobin Privacy Aware Learning (NIPS12) 2 PDF PDF
Tobin Can Machine Learning be Secure?(2006) PDF PDF
---

[37]: curriculum

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer An overview of gradient optimization algorithms, 1 PDF PDF
Shijia Osborne - Probabilistic numerics for deep learning 2 DLSS 2017 + Video PDF / PDF2
Jack Automated Curriculum Learning for Neural Networks, ICML17 3 PDF PDF
DLSS17 Johnson - Automatic Differentiation 4 slide + video  
---

Table of readings


Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  
---

[38]: data-valuation

Table of readings


Index Papers Our Slides
1 BIAS ALSO MATTERS: BIAS ATTRIBUTION FOR DEEP NEURAL NETWORK EXPLANATION Arsh Survey
2 Data Shapley: Equitable Valuation of Data for Machine Learning Arsh Survey
  What is your data worth? Equitable Valuation of Data Sanchit Survey
3 Neural Network Attributions: A Causal Perspective Zhe Survey
4 Defending Against Neural Fake News Eli Survey
5 Interpretation of Neural Networks is Fragile Eli Survey
  Interpretation of Neural Networks is Fragile Pan Survey
6 Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization Eli Survey
7 Retrofitting Word Vectors to Semantic Lexicons Morris Survey
8 On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models Morris Survey
9 Towards Deep Learning Models Resistant to Adversarial Attacks Pan Survey
10 Robust Attribution Regularization Pan Survey
11 Sanity Checks for Saliency Maps Sanchit Survey
12 Survey of data generation and evaluation in Interpreting DNN pipelines Sanchit Survey
13 Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions Sanchit Survey
14 Universal Adversarial Triggers for Attacking and Analyzing NLP Sanchit Survey
15 Apricot: Submodular selection for data summarization in Python Arsh Survey
---

[39]: denoising

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep Generalization and Equilibrium in Generative Adversarial Nets (ICML17) 1 PDF + video PDF
Arshdeep Mode Regularized Generative Adversarial Networks (ICLR17) 2 PDF PDF
Bargav Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17 3 PDF PDF
Anant Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17 4 PDF + code PDF
---

[40]: dialog

Table of readings


Presenter Papers Paper URL Our Slides
Jack Learning End-to-End Goal-Oriented Dialog, ICLR17 1 PDF PDF
Bargav Nonparametric Neural Networks, ICLR17 2 PDF PDF
Bargav Learning Structured Sparsity in Deep Neural Networks, NIPS16 3 PDF PDF
Arshdeep Learning the Number of Neurons in Deep Networks, NIPS16 4 PDF PDF
---

[41]: difference-analysis

Table of readings


Presenter Papers Paper URL Our Slides
Rita Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 1 PDF PDF
Arshdeep Axiomatic Attribution for Deep Networks, ICML17 2 PDF PDF
  The Robustness of Estimator Composition, NIPS16 PDF  

Presenter Papers Paper URL Our Slides
Rita Learning Important Features Through Propagating Activation Differences, ICML17 1 PDF PDF
GaoJi Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 2 PDF PDF
Rita Learning Kernels with Random Features, Aman Sinha*; John Duchi, 3 PDF PDF
---

[42]: differentiation

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer An overview of gradient optimization algorithms, 1 PDF PDF
Shijia Osborne - Probabilistic numerics for deep learning 2 DLSS 2017 + Video PDF / PDF2
Jack Automated Curriculum Learning for Neural Networks, ICML17 3 PDF PDF
DLSS17 Johnson - Automatic Differentiation 4 slide + video  
---

[43]: diffusion

Table of readings


Stable diffusion

  • URL
  • “High-Resolution Image Synthesis with Latent Diffusion Models”

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

  • URL
  • “personalization” of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. .”

LoRA: Low-Rank Adaptation of Large Language Models

  • URL
  • “propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.”

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

  • https://arxiv.org/abs/2208.01618
  • Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
  • Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks.
---

[44]: dimension-reduction

Table of readings


Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[45]: discrete

Table of readings


Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Tkach Boundary-Seeking Generative Adversarial Networks PDF PDF
Tkach Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF PDF
Tkach Generating Sentences from a Continuous Space PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF
---

[46]: distillation

Table of readings


Presenter Papers Paper URL Our Slides
Bill Adversarial Examples that Fool both Computer Vision and Time-Limited Humans PDF PDF
Bill Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Bill TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing PDF PDF
Bill Distilling the Knowledge in a Neural Network PDF PDF
Bill Defensive Distillation is Not Robust to Adversarial Examples PDF PDF
Bill Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow PDF PDF
---

[47]: distributed

Table of readings


Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  
---

Table of readings


Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[48]: dna

Table of readings


Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF

Presenter Papers Paper URL Our Slides
Arshdeep DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. PDF PDF
Arshdeep Solving the RNA design problem with reinforcement learning, PLOSCB 1 PDF PDF
Arshdeep Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk 2 PDF PDF
Arshdeep Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio 3 PDF PDF
Brandon Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics PDF PDF
Arshdeep Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions 2 PDF PDF

Presenter Papers Paper URL Our Slides
BrandonLiu Summary of Recent Generative Adversarial Networks (Classified)   PDF
Jack Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey PDF PDF
GaoJi More about basics of GAN   PDF
  McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535 PDF  
  Wasserstein GAN, ICML17 PDF  
  Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab PDF  

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  
---

[49]: domain-adaptation

Table of readings


Presenter Papers Paper URL Our Slides
Xueying Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, ICLR17 1 PDF PDF
Bargav Deep Learning with Differential Privacy, CCS16 2 PDF + video PDF
Bargav Privacy-Preserving Deep Learning, CCS15 3 PDF PDF
Xueying Domain Separation Networks, NIPS16 4 PDF PDF
---

[50]: domainadapt

Table of readings


In this session, our readings cover:

Required Readings:

Large Language Models for Software Engineering: A Systematic Literature Review

  • Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We collect and analyze 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study.

More Readings:

Large language models generate functional protein sequences across diverse families

  • https://pubmed.ncbi.nlm.nih.gov/36702895/
  • Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.

Large Language Models in Law: A Survey

  • https://arxiv.org/abs/2312.03718
  • The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.

ChemLLM: A Chemical Large Language Model

  • https://arxiv.org/abs/2402.06852
  • Large language models (LLMs) have made impressive progress in chemistry applications, including molecular property prediction, molecular generation, experimental protocol design, etc. However, the community lacks a dialogue-based model specifically designed for chemistry. The challenge arises from the fact that most chemical data and scientific knowledge are primarily stored in structured databases, and the direct use of these structured data compromises the model’s ability to maintain coherent dialogue. To tackle this issue, we develop a novel template-based instruction construction method that transforms structured knowledge into plain dialogue, making it suitable for language model traini…

FunSearch: Making new discoveries in mathematical sciences using Large Language Models

  • https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

Transforming the future of music creation

  • https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/

Segment Anything

  • https://arxiv.org/abs/2304.02643
  • We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at this https URL to foster research into foundation models for computer vision.

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

  • In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonsrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

  • Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun
  • Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model’s background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora’s development and investigate the underlying technologies used to build this “world simulator”. Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.

BloombergGPT: A Large Language Model for Finance

  • https://arxiv.org/abs/2303.17564
  • The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg’s extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

  • https://arxiv.org/abs/2311.10709
  • We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions–adjusted noise schedules for diffusion, and multi-stage training–that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work–81% vs. Google’s Imagen Video, 90% vs. Nvidia’s PYOCO, and 96% vs. Meta’s Make-A-Video. Our model outperforms commercial solutions such as RunwayML’s Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user’s text prompt, where our generations are preferred 96% over prior work.
---

[51]: dynamic

Table of readings


Presenter Papers Paper URL Our Slides
Edge MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications PDF  
Edge XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks URL Ryan PDF
Edge DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices Pdf Eamon PDF
Edge Loss-aware Binarization of Deep Networks, ICLR17 PDF Ryan PDF
Edge Espresso: Efficient Forward Propagation for Binary Deep Neural Networks Pdf Eamon PDF
Dynamic Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution PDF Weilin PDF
Dynamic Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems PDF  
Dynamic Cavs: An Efficient Runtime System for Dynamic Neural Networks Pdf  

Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
spherical Spherical CNNs Pdf Fuwen PDF + Arshdeep Pdf
dynamic Dynamic graph cnn for learning on point clouds, 2018 Pdf Fuwen PDF
basics Geometric Deep Learning (simple introduction video) URL  
matching All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks Pdf Fuwen PDF
completion Geometric matrix completion with recurrent multi-graph neural networks Pdf Fuwen PDF
Tutorial Geometric Deep Learning on Graphs and Manifolds URL Arsh PDF
matching Similarity Learning with Higher-Order Proximity for Brain Network Analysis   Arsh PDF
pairwise Pixel to Graph with Associative Embedding PDF Fuwen PDF
3D 3D steerable cnns: Learning rotationally equivariant features in volumetric data URL Fuwen PDF

Presenter Papers Paper URL Our Slides
Arshdeep Learning Transferable Architectures for Scalable Image Recognition PDF PDF
Arshdeep FractalNet: Ultra-Deep Neural Networks without Residuals PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi Forward and Reverse Gradient-Based Hyperparameter Optimization, ICML17 1 PDF PDF
Chaojiang Adaptive Neural Networks for Efficient Inference, ICML17 2 PDF PDF
Bargav Practical Gauss-Newton Optimisation for Deep Learning, ICML17 3 PDF PDF
Rita How to Escape Saddle Points Efficiently, ICML17 4 PDF PDF
  Batched High-dimensional Bayesian Optimization via Structural Kernel Learning PDF  

Presenter Papers Paper URL Our Slides
Anant AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 1 PDF PDF
Shijia SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 2 PDF PDF
Jack Proximal Deep Structured Models, NIPS16 3 PDF PDF
  Optimal Architectures in a Solvable Model of Deep Networks, NIPS16 4 PDF  
Tianlu Large-Scale Evolution of Image Classifiers, ICML17 5 PDF PDF

Presenter Papers Paper URL Our Slides
Tianlu Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML17 1 PDF + code PDF
Jack Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17 2 PDF PDF
Xueying State-Frequency Memory Recurrent Neural Networks, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Rita Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 1 PDF PDF
Tianlu Dynamic Coattention Networks For Question Answering, ICLR17 2 PDF + code PDF
ChaoJiang Structured Attention Networks, ICLR17 3 PDF + code PDF

Presenter Papers Paper URL Our Slides
Arshdeep Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction 1 PDF PDF
Arshdeep Decoupled Neural Interfaces Using Synthetic Gradients 2 PDF PDF
Arshdeep Diet Networks: Thin Parameters for Fat Genomics 3 PDF PDF
Arshdeep Metric Learning with Adaptive Density Discrimination 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 1 PDF PDF
Arshdeep Learning feed-forward one-shot learners 2 PDF PDF
Arshdeep Learning to Learn by gradient descent by gradient descent 3 PDF PDF
Arshdeep Dynamic Filter Networks 4 https://arxiv.org/abs/1605.09673 PDF PDF
---

[52]: efficiency

Table of readings


KV Caching in LLM:

  • Retentive Network: A Successor to Transformer for Large Language Models: https://arxiv.org/abs/2307.08621

  • https://arxiv.org/abs/2305.13048 RWKV: Reinventing RNNs for the Transformer Era

  • grouped query attention: https://arxiv.org/pdf/2305.13245.pdf
  • Paged attention https://arxiv.org/pdf/2309.06180.pdf https://openreview.net/pdf?id=uNrFpDPMyo

Retentive Network: A Successor to Transformer for Large Language Models

  • In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation… Show more

RWKV: Reinventing RNNs for the Transformer Era

  • Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transfor… Show more

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

  • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
  • The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at this https URL

Must know tools for training/finetuning/serving LLM’s -

  1. Torchtune - Build on top of Pytorch, for training and finetuning LLM’s. Uses yaml based configs for easily running experiments. Github -

  2. axolotl - Built on top on Huggigface peft and transformer library, supports fine-tuning a large number for models like Mistral, LLama etc. Provides support for techniques like RLHF, DPO, LORA, qLORA etc. Github

  3. LitGPT - Build on nanoGPT and Megatron, support pre-training and fine-tuning, has examples like Starcoder, TinyLlama etc. Github -

  4. Maxtext - Jax based library for training LLM’s on Google TPU’s with configs for models like Gemma, Mistral and LLama2 etc. Github

  5. Langchain- https://python.langchain.com/docs/get_started/introduction

  6. haystack.deepset.ai
    • https://github.com/deepset-ai/haystack
    • LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots.
  7. LlamaIndex
    • https://docs.llamaindex.ai/en/stable/ LlamaIndex supports Retrieval-Augmented Generation (RAG). Instead of asking LLM to generate an answer immediately, LlamaIndex: retrieves information from your data sources first, / adds it to your question as context, and / asks the LLM to answer based on the enriched prompt.
  8. Making Retrieval Augmented Generation Fast
    • https://www.pinecone.io/learn/fast-retrieval-augmented-generation/
  9. OpenMoE
    • https://github.com/XueFuzhao/OpenMoE

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

  • Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, Xia Hu
  • This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks.We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources of LLMs, regularly updated, .

  • https://github.com/Mooler0410/LLMsPracticalGuide

In this session, our readings cover:

Required Readings:

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

  • https://arxiv.org/abs/2311.12351
  • Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, and marking a stride towards achieving Artificial General Intelligence (AGI). However, current LLMs are predominantly pretrained on short text snippets, which compromises their effectiveness in processing the long-context prompts that are frequently encountered in practical scenarios. This article offers a comprehensive survey of the recent advancement in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs throughout the entire model lifecycle, from pre-training through to inference. We first delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. We then provide a taxonomy and the landscape of upgrades on Transformer architecture to solve these problems. Afterwards, we provide an investigation on wildly used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, frameworks, and compilers to boost the efficacy of LLMs across different stages in runtime. Finally, we discuss the challenges and potential avenues for future research. A curated repository of relevant literature, continuously updated, is available at this https URL.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

  • Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
  • Paper: https://arxiv.org/abs/2205.14135
  • Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware – accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3$\times$ speedup on GPT-2 (seq. length 1K), and 2.4$\times$ speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).

  • Related: blogpost FlashAttention — Techniques for Efficient Inference of LLMs (III/IV)

State Space Model for New-Generation Network Alternative to Transformers: A Survey

  • [Submitted on 15 Apr 2024]
  • Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang
  • In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of principles to help the readers quickly capture the key ideas of SSM. After that, we dive into the reviews of existing SSMs and their various applications, including natural language processing, computer vision, graph, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models and hope it helps the readers to understand the effectiveness of different structures on various tasks. Then, we propose possible research points in this direction to better promote the development of the theoretical model and application of SSM. More related works will be continuously updated on the following GitHub: this https URL.

Attention Mechanisms in Computer Vision: A Survey

  • Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu
  • https://arxiv.org/abs/2111.07624
  • Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention; a related repository this https URL is dedicated to collecting related work. We also suggest future directions for attention mechanism research.

More readings:

JAMBA

  • Introducing Jamba: AI21’s Groundbreaking SSM-Transformer Model Debuting the first production-grade Mamba-based model delivering best-in-class quality and performance.
  • March 28, 2024
  • https://www.ai21.com/blog/announcing-jamba
  • We are thrilled to announce Jamba, the world’s first production-grade Mamba based model. By enhancing Mamba Structured State Space model (SSM) technology with elements of the traditional Transformer architecture, Jamba compensates for the inherent limitations of a pure SSM model. Offering a 256K context window, it is already demonstrating remarkable gains in throughput and efficiency—just the beginning of what can be possible with this innovative hybrid architecture. Notably, Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  • Albert Gu, Tri Dao
  • https://arxiv.org/abs/2312.00752
  • Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers’ computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.

Efficient Memory Management for Large Language Model Serving with PagedAttention

  • Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
  • High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address this problem, we propose PagedAttention, an attention algorithm inspired by the classical virtual memory and paging techniques in operating systems. On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage. Our evaluations show that vLLM improves the throughput of popular LLMs by 2-4× with the same level of latency compared to the state-of-the-art systems, such as FasterTransformer and Orca. The improvement is more pronounced with longer sequences, larger models, and more complex decoding algorithms. vLLM’s source code is publicly available at this https URL

In this session, our readings cover:

Require Readings:

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

  • https://arxiv.org/abs/2312.15234
  • In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

  • https://arxiv.org/abs/2304.01373
  • How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \textit{Pythia}, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend \textit{Pythia} to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at \url{this https URL}.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

  • https://arxiv.org/abs/2403.09611
  • Multimodal LLM Pre-training - provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training; studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key for state-of-the-art performance; it also proposes a family of multimodal models up to 30B parameters that achieve SOTA in pre-training metrics and include properties such as enhanced in-context learning, multi-image reasoning, enabling few-shot chain-of-thought prompting.

More Readings:

Sparks of Large Audio Models: A Survey and Outlook

  • Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller
  • This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and a wide range of sources–from human voices to musical instruments and environmental sounds–poses challenges distinct from those found in traditional Natural Language Processing scenarios. Nevertheless, \textit{Large Audio Models}, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amount of data, these models have demonstrated prowess in a variety of audio tasks, spanning from Automatic Speech Recognition and Text-To-Speech to Music Generation, among others. Notably, recently these Foundational Audio Models, like SeamlessM4T, have started showing abilities to act as universal translators, supporting multiple speech tasks for up to 100 languages without any reliance on separate task-specific systems. This paper presents an in-depth analysis of state-of-the-art methodologies regarding \textit{Foundational Large Audio Models}, their performance benchmarks, and their applicability to real-world scenarios. We also highlight current limitations and provide insights into potential future research directions in the realm of \textit{Large Audio Models} with the intent to spark further discussion, thereby fostering innovation in the next generation of audio-processing systems. Furthermore, to cope with the rapid development in this area, we will consistently update the relevant repository with relevant recent articles and their open-source implementations at this https URL.

In this session, our readings cover:

Required Readings:

Scaling Laws for Neural Language Models

  • Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
  • We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.

  • https://github.com/RUCAIBox/LLMSurvey

Efficient Large Language Models: A Survey

  • https://arxiv.org/abs/2312.03863
  • https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey
  • Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency this http URL this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at this https URL, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

  • Recent research, such as BitNet [23], is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.

More Readings:

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

  • Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang
  • We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM like generating new tokens. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction dataset but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user’s perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building generalist LLM system via synergizing multiple expert LLMs.

LIMA: Less Is More for Alignment /

  • https://arxiv.org/abs/2305.11206
  • Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.

Stable diffusion

  • URL
  • “High-Resolution Image Synthesis with Latent Diffusion Models”

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

  • URL
  • “personalization” of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. .”

LoRA: Low-Rank Adaptation of Large Language Models

  • URL
  • “propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.”

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

  • https://arxiv.org/abs/2208.01618
  • Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
  • Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks.
---

[53]: ehr

Table of readings


Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Presenter Papers Paper URL Our Slides
Jennifer Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Jennifer Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning PDF PDF
Jennifer Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers PDF PDF
Jennifer CleverHans PDF PDF
Ji Ji-f18-New papers about adversarial attack   PDF
---

[54]: em

Table of readings


Presenter Papers Paper URL Our Slides
Muthu Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal 1 PDF PDF
Muthu Fast Training of Recurrent Networks Based on EM Algorithm (1998) 2 PDF PDF
Muthu FitNets: Hints for Thin Deep Nets, ICLR15 3 PDF PDF
Muthu Two NIPS 2015 Deep Learning Optimization Papers PDF PDF
Muthu Difference Target Propagation (2015) 4 PDF PDF
---

[55]: embedding

Table of readings


Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
Program Neural network-based graph embedding for cross-platform binary code similarity detection Pdf + Pdf Faizan PDF + GaoJi Pdf
Program Deep Program Reidentification: A Graph Neural Network Solution Pdf Weilin PDF
Program Heterogeneous Graph Neural Networks for Malicious Account Detection Pdf Weilin Pdf
Program Learning to represent programs with graphs Pdf 1  

Presenter Papers Paper URL Our Notes
Basics GraphSAGE: Large-scale Graph Representation Learning by Jure Leskovec Stanford University URL + PDF  
Basics Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering by Xavier Bresson URL + PDF Ryan Pdf
Basics Gated Graph Sequence Neural Networks by Microsoft Research URL + PDF Faizan Pdf
Basics DeepWalk - Turning Graphs into Features via Network Embeddings URL + PDF  
Basics Spectral Networks and Locally Connected Networks on Graphs 1 Pdf GaoJi slides + Bill Pdf
Basics A Comprehensive Survey on Graph Neural Networks/ Graph Neural Networks: A Review of Methods and Applications Pdf Jack Pdf
GCN Semi-Supervised Classification with Graph Convolutional Networks Pdf Jack Pdf

Presenter Papers Paper URL Our Slides
Derrick GloVe: Global Vectors for Word Representation PDF PDF
Derrick PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks. URL PDF
Derrick scalable nearest neighbor algorithms for high dimensional data (PAMI14) 1 PDF PDF
Derrick StarSpace: Embed All The Things! PDF PDF
Derrick Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes PDF PDF

Presenter Papers Paper URL Our Slides
Bill Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 1 PDF PDF
Bill Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio PDF PDF
Bill Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang 2 PDF PDF
Bill On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick PDF PDF

Presenter Papers Paper URL Our Slides
QA Learning to rank with (a lot of) word features PDF  
Relation A semantic matching energy function for learning with multi-relational data PDF  
Relation Translating embeddings for modeling multi-relational data PDF  
QA Reading wikipedia to answer open-domain questions PDF  
QA Question answering with subgraph embeddings PDF  

Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  
---

[56]: encoder-decoder

Table of readings


Team INDEX Title & Link Tags Our Slide
T14 CAN: Creative Adversarial Networks Generating “Art” GAN OurSlide
T26 Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation encoder-decoder, dialog, VAE, Interpretable OurSlide
T32 Which Training Methods for GANs do actually Converge convergence, optimization, GAN OurSlide
---

[57]: expressive

Table of readings


Presenter Papers Paper URL Our Slides
SE Equivariance Through Parameter-Sharing, ICML17 1 PDF  
SE Why Deep Neural Networks for Function Approximation?, ICLR17 2 PDF  
SE Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 3ICML17 PDF  
  Sharp Minima Can Generalize For Deep Nets, ICML17 4 PDF  

Presenter Papers Paper URL Our Slides
Ceyer A Closer Look at Memorization in Deep Networks, ICML17 1 PDF PDF
  On the Expressive Efficiency of Overlapping Architectures of Deep Learning 2 DLSSpdf + video  
Mutual Information Opening the Black Box of Deep Neural Networks via Information 3 URL + video  
ChaoJiang Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 PDF PDF

Presenter Papers Paper URL Our Slides
Rita On the Expressive Power of Deep Neural Networks 1 PDF PDF
Arshdeep Understanding deep learning requires rethinking generalization, ICLR17 2 PDF PDF
Tianlu On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17 3 PDF PDF
---

[58]: few-shot

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi Neural Architecture Search with Reinforcement Learning, ICLR17 1 PDF PDF
Ceyer Learning to learn 2 DLSS17video PDF
Beilun Optimization as a Model for Few-Shot Learning, ICLR17 3 PDF + More PDF
Anant Neural Optimizer Search with Reinforcement Learning, ICML17 4 PDF PDF

Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[59]: forcing

Table of readings


Presenter Papers Paper URL Our Slides
Shijia Professor Forcing: A New Algorithm for Training Recurrent Networks, 1 NIPS16 PDF + Video PDF
Beilun+Arshdeep Mollifying Networks, Bengio, ICLR17 2 PDF PDF / PDF2
---

[60]: forgetting

Table of readings

---

[61]: fuzzing

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  
---

[62]: gan

Table of readings


Team INDEX Title & Link Tags Our Slide
T14 CAN: Creative Adversarial Networks Generating “Art” GAN OurSlide
T26 Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation encoder-decoder, dialog, VAE, Interpretable OurSlide
T32 Which Training Methods for GANs do actually Converge convergence, optimization, GAN OurSlide

Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Tkach Boundary-Seeking Generative Adversarial Networks PDF PDF
Tkach Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF PDF
Tkach Generating Sentences from a Continuous Space PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Jack A Unified Approach to Interpreting Model Predictions PDF PDF
Jack “Why Should I Trust You?”: Explaining the Predictions of Any Classifier PDF PDF
Jack Visual Feature Attribution using Wasserstein GANs PDF PDF
Jack GAN Dissection: Visualizing and Understanding Generative Adversarial Networks PDF PDF
GaoJi Recent Interpretable machine learning papers PDF PDF
Jennifer The Building Blocks of Interpretability PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF

Presenter Papers Paper URL Our Slides
BrandonLiu Summary of Recent Generative Adversarial Networks (Classified)   PDF
Jack Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey PDF PDF
GaoJi More about basics of GAN   PDF
  McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535 PDF  
  Wasserstein GAN, ICML17 PDF  
  Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab PDF  

Presenter Papers Paper URL Our Slides
NIPS 2016 ganerative adversarial network tutorial (NIPS 2016) paper + video + code  
DLSS 2017 Generative Models I - DLSS 2017 slideraw + video + slide  

Presenter Papers Paper URL Our Slides
Tobin Energy-Based Generative Adversarial Network 1 PDF PDF
Jack Three Deep Generative Models PDF PDF
---

[63]: gcn

Table of readings


Index Papers Our Slides
1 Graph Convolutions: More than You Wanted to Know Derrick Survey
2 Spectral Graph Sparsification Derrick Survey
3 Complexity Analysis of Graph Convolutional Networks and in Attention based GNN Derrick Survey
4 PyTorch-BigGraph: A Large-Scale Graph Embedding System Derrick Survey
5 Scalable GNN Updates: More About PyTorch Geometric (PyG) Derrick Survey
6 Time and Space Complexity of Graph Convolutional Networks Derrick Survey
7 Large Scale GNN and Transformer Models and for Genomics Jack Survey
8 Long Range Attention and Visualizing BERT Jak Survey
9 Benchmarking Graph Neural Networks Sanchit Survey
---

[64]: gene-network

Table of readings


Index Papers Our Slides
1 Protein 3D Structure Computed from Evolutionary Sequence Variation Arsh Survey
3 Regulatory network inference on developmental and evolutionary lineages Arsh Survey
4 Deep learning in ultrasound image analysis Zhe Survey
5 Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind) Jack Survey
6 Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors Jack Survey
7 BindSpace decodes transcription factor binding signals by large-scale sequence embedding Jack Survey
8 FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Jack Survey
9 Query-Reduction Networks for Question Answering Bill Survey
---

[65]: generalization

Table of readings


Index Papers Our Slides
1 Invariant Risk Minimization Zhe Survey
2 Causal Machine Learning Zhe Survey
3 A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms Zhe Survey
3 Review on Optimization-Based Meta Learning Zhe Survey
4 Domain adaptation and counterfactual prediction Zhe Survey
5 Gaussian Processes Zhe Survey
6 A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data Zhe Survey
7 Few-shot domain adaptation by causal mechanism transfer Zhe Survey

Index Papers Our Slides
1 Actor-Critic Methods for Control Jake Survey
2 Generalization in Deep Reinforcement Learning Jake Survey
3 Sample Efficient RL (Part 1) Jake Survey
4 Sample Efficient RL (Part 2) Jake Survey
5 Model-Free Value Methods in Deep RL Jake Survey
6 Investigating Human Priors for Playing Video Games Arsh Survey
---

Table of readings


Team INDEX Title & Link Tags Our Slide
T2 Empirical Study of Example Forgetting During Deep Neural Network Learning Sample Selection, forgetting OurSlide
T29 Select Via Proxy: Efficient Data Selection For Training Deep Networks Sample Selection OurSlide
T9 How SGD Selects the Global Minima in over-parameterized Learning optimization OurSlide
T10 Escaping Saddles with Stochastic Gradients optimization OurSlide
T13 To What Extent Do Different Neural Networks Learn the Same Representation subspace OurSlide
T19 On the Information Bottleneck Theory of Deep Learning informax OurSlide
T20 Visualizing the Loss Landscape of Neural Nets normalization OurSlide
T21 Using Pre-Training Can Improve Model Robustness and Uncertainty training, analysis OurSlide
T24 Norm matters: efficient and accurate normalization schemes in deep networks normalization OurSlide

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF

Presenter Papers Paper URL Our Slides
BrandonLiu Summary of Recent Generative Adversarial Networks (Classified)   PDF
Jack Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey PDF PDF
GaoJi More about basics of GAN   PDF
  McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535 PDF  
  Wasserstein GAN, ICML17 PDF  
  Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab PDF  

Presenter Papers Paper URL Our Slides
Bill Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 1 PDF PDF
Bill Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio PDF PDF
Bill Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang 2 PDF PDF
Bill On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Generalization and Equilibrium in Generative Adversarial Nets (ICML17) 1 PDF + video PDF
Arshdeep Mode Regularized Generative Adversarial Networks (ICLR17) 2 PDF PDF
Bargav Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17 3 PDF PDF
Anant Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17 4 PDF + code PDF

Presenter Papers Paper URL Our Slides
SE Equivariance Through Parameter-Sharing, ICML17 1 PDF  
SE Why Deep Neural Networks for Function Approximation?, ICLR17 2 PDF  
SE Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 3ICML17 PDF  
  Sharp Minima Can Generalize For Deep Nets, ICML17 4 PDF  

Presenter Papers Paper URL Our Slides
Rita On the Expressive Power of Deep Neural Networks 1 PDF PDF
Arshdeep Understanding deep learning requires rethinking generalization, ICLR17 2 PDF PDF
Tianlu On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17 3 PDF PDF
---

[66]: generative

Table of readings


Index Papers Our Slides
1 Beta VAE, Ladder VAE, Causal VAE Arsh Survey
2 Learnt Prior VAE Arsh Survey
3 Multitask Graph Autoencoder Arsh Survey
4 Introduction to component analysi Zhe Survey
5 Normalizing flow Zhe Survey
6 Nonlinear ICA Zhe Survey
7 Deep Convolutional Inverse Graphics Network Zhe Survey
---

Table of readings


Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Tkach Boundary-Seeking Generative Adversarial Networks PDF PDF
Tkach Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF PDF
Tkach Generating Sentences from a Continuous Space PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. PDF PDF
Arshdeep Solving the RNA design problem with reinforcement learning, PLOSCB 1 PDF PDF
Arshdeep Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk 2 PDF PDF
Arshdeep Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio 3 PDF PDF
Brandon Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics PDF PDF
Arshdeep Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions 2 PDF PDF

Presenter Papers Paper URL Our Slides
Bill Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le 1 PDF PDF
Bill Adversarial Spheres 2 PDF PDF
Bill Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer 3 PDF PDF
Bill Thermometer encoding: one hot way to resist adversarial examples 4 PDF PDF
  Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow 5 PDF  

Presenter Papers Paper URL Our Slides
BrandonLiu Summary of Recent Generative Adversarial Networks (Classified)   PDF
Jack Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey PDF PDF
GaoJi More about basics of GAN   PDF
  McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535 PDF  
  Wasserstein GAN, ICML17 PDF  
  Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab PDF  

Presenter Papers Paper URL Our Slides
Bill Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 1 PDF PDF
Bill Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio PDF PDF
Bill Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang 2 PDF PDF
Bill On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Generalization and Equilibrium in Generative Adversarial Nets (ICML17) 1 PDF + video PDF
Arshdeep Mode Regularized Generative Adversarial Networks (ICLR17) 2 PDF PDF
Bargav Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17 3 PDF PDF
Anant Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17 4 PDF + code PDF

Presenter Papers Paper URL Our Slides
ChaoJiang Courville - Generative Models II DLSS17Slide + video PDF
GaoJi Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 1 PDF + talk PDF
Arshdeep Composing graphical models with neural networks for structured representations and fast inference, NIPS16 2 PDF PDF
  Johnson - Graphical Models and Deep Learning DLSSSlide + video  
  Parallel Multiscale Autoregressive Density Estimation, ICML17 3 PDF  
Beilun Conditional Image Generation with Pixel CNN Decoders, NIPS16 4 PDF PDF
Shijia Marrying Graphical Models & Deep Learning DLSS17 + Video PDF

Presenter Papers Paper URL Our Slides
Jack Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 1 PDF PDF
Arshdeep Bidirectional Attention Flow for Machine Comprehension, ICLR17 2 PDF + code PDF
Ceyer Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 PDF + code PDF
ChaoJiang Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 3 PDF PDF
  An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 PDF  

Presenter Papers Paper URL Our Slides
NIPS 2016 ganerative adversarial network tutorial (NIPS 2016) paper + video + code  
DLSS 2017 Generative Models I - DLSS 2017 slideraw + video + slide  

Presenter Papers Paper URL Our Slides
Tobin Energy-Based Generative Adversarial Network 1 PDF PDF
Jack Three Deep Generative Models PDF PDF
---

[67]: genomics

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF

Presenter Papers Paper URL Our Slides
Arshdeep DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. PDF PDF
Arshdeep Solving the RNA design problem with reinforcement learning, PLOSCB 1 PDF PDF
Arshdeep Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk 2 PDF PDF
Arshdeep Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio 3 PDF PDF
Brandon Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics PDF PDF
Arshdeep Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions 2 PDF PDF
---

[68]: geometric

Table of readings


Presenter Papers Paper URL Our Slides
spherical Spherical CNNs Pdf Fuwen PDF + Arshdeep Pdf
dynamic Dynamic graph cnn for learning on point clouds, 2018 Pdf Fuwen PDF
basics Geometric Deep Learning (simple introduction video) URL  
matching All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks Pdf Fuwen PDF
completion Geometric matrix completion with recurrent multi-graph neural networks Pdf Fuwen PDF
Tutorial Geometric Deep Learning on Graphs and Manifolds URL Arsh PDF
matching Similarity Learning with Higher-Order Proximity for Brain Network Analysis   Arsh PDF
pairwise Pixel to Graph with Associative Embedding PDF Fuwen PDF
3D 3D steerable cnns: Learning rotationally equivariant features in volumetric data URL Fuwen PDF

Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF
---

[69]: graph

Table of readings


Index Papers Our Slides
1 A Flexible Generative Framework for Graph-based Semi-supervised Learning Arsh Survey
2 Learning Discrete Structures for Graph Neural Networks Arsh Survey
4 Graph Markov Neural Nets Arsh Survey
  Graph Markov Neural Networks Jack Survey
5 GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations Arsh Survey
6 Subgraph Neural Networks Arsh Survey
7 Pointer Graph Networks Arsh Survey
8 Modeling Relational Data with Graph Convolutional Networks Arsh Survey
9 Graph Learning Zhe Survey
8 Neural Relational Inference Zhe Survey
---

Table of readings


Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF

Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Robust Adversarial Attacks on Graph Structured Data Pdf Faizan [PDF + GaoJi Pdf
Robust KDD’18 Adversarial Attacks on Neural Networks for Graph Data Pdf Faizan PDF + GaoJi Pdf
Robust Attacking Binarized Neural Networks Pdf Faizan PDF

Presenter Papers Paper URL Our Slides
spherical Spherical CNNs Pdf Fuwen PDF + Arshdeep Pdf
dynamic Dynamic graph cnn for learning on point clouds, 2018 Pdf Fuwen PDF
basics Geometric Deep Learning (simple introduction video) URL  
matching All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks Pdf Fuwen PDF
completion Geometric matrix completion with recurrent multi-graph neural networks Pdf Fuwen PDF
Tutorial Geometric Deep Learning on Graphs and Manifolds URL Arsh PDF
matching Similarity Learning with Higher-Order Proximity for Brain Network Analysis   Arsh PDF
pairwise Pixel to Graph with Associative Embedding PDF Fuwen PDF
3D 3D steerable cnns: Learning rotationally equivariant features in volumetric data URL Fuwen PDF

Presenter Papers Paper URL Our Slides
Matching Deep Learning of Graph Matching, PDF+ PDF Jack Pdf
Matching Graph Edit Distance Computation via Graph Neural Networks PDF Jack Pdf
Basics Link Prediction Based on Graph Neural Networks Pdf Jack Pdf
Basics Supervised Community Detection with Line Graph Neural Networks Pdf Jack Pdf
Basics Graph mining: Laws, generators, and algorithms Pdf Arshdeep PDF
pooling Hierarchical graph representation learning with differentiable pooling PDF Eamon PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Presenter Papers Paper URL Our Slides
Eric Modeling polypharmacy side effects with graph convolutional networks PDF PDF
Eric Protein Interface Prediction using Graph Convolutional Networks PDF PDF
Eric Structure biology meets data science: does anything change URL PDF
Eric DeepSite: protein-binding site predictor using 3D-convolutional neural networks URL PDF

Presenter Papers Paper URL Our Slides
QA Learning to rank with (a lot of) word features PDF  
Relation A semantic matching energy function for learning with multi-relational data PDF  
Relation Translating embeddings for modeling multi-relational data PDF  
QA Reading wikipedia to answer open-domain questions PDF  
QA Question answering with subgraph embeddings PDF  
---

[70]: graph-attention

Table of readings


Index Papers Our Slides
1 Graph Convolutions: More than You Wanted to Know Derrick Survey
2 Spectral Graph Sparsification Derrick Survey
3 Complexity Analysis of Graph Convolutional Networks and in Attention based GNN Derrick Survey
4 PyTorch-BigGraph: A Large-Scale Graph Embedding System Derrick Survey
5 Scalable GNN Updates: More About PyTorch Geometric (PyG) Derrick Survey
6 Time and Space Complexity of Graph Convolutional Networks Derrick Survey
7 Large Scale GNN and Transformer Models and for Genomics Jack Survey
8 Long Range Attention and Visualizing BERT Jak Survey
9 Benchmarking Graph Neural Networks Sanchit Survey
---

[71]: graphical-model

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
ChaoJiang Courville - Generative Models II DLSS17Slide + video PDF
GaoJi Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 1 PDF + talk PDF
Arshdeep Composing graphical models with neural networks for structured representations and fast inference, NIPS16 2 PDF PDF
  Johnson - Graphical Models and Deep Learning DLSSSlide + video  
  Parallel Multiscale Autoregressive Density Estimation, ICML17 3 PDF  
Beilun Conditional Image Generation with Pixel CNN Decoders, NIPS16 4 PDF PDF
Shijia Marrying Graphical Models & Deep Learning DLSS17 + Video PDF
---

[72]: hallucination

Table of readings


In this session, our readings cover:

Required Readings:

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

  • https://arxiv.org/abs/2311.05232
  • The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of LLMs in real-world scenarios, which attracts increasing attention to detect and mitigate these hallucinations. In this survey, we aim to provide a thorough and in-depth overview of recent advances in the field of LLM hallucinations. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks. Additionally, representative approaches designed to mitigate hallucinations are introduced accordingly. Finally, we analyze the challenges that highlight the current limitations and formulate open questions, aiming to delineate pathways for future research on hallucinations in LLMs.

More Readings:

LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

  • https://arxiv.org/abs/2305.14540
  • With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency detection compared to traditional non-LLM methods. However, a closer analysis reveals that most LLMs fail on more complex formulations of the task and exposes issues with existing evaluation benchmarks, affecting evaluation precision. To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. This new benchmark is 20 times more cost-effective per sample than previous benchmarks and highly reproducible, as we estimate inter-annotator agreement at about 0.9. Most LLMs struggle on SummEdits, with performance close to random chance. The best-performing model, GPT-4, is still 8\% below estimated human performance, highlighting the gaps in LLMs’ ability to reason about facts and detect inconsistencies when they occur.

Survey of Hallucination in Natural Language Generation

  • https://arxiv.org/abs/2202.03629
  • Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Ho Shu Chan, Wenliang Dai, Andrea Madotto, Pascale Fung
  • Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation; and (3) hallucinations in large language models (LLMs). This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.

Do Language Models Know When They’re Hallucinating References?

  • https://arxiv.org/abs/2305.18248

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment

  • https://arxiv.org/abs/2308.05374
---

[73]: hash

Table of readings


Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[74]: heterogeneous

Table of readings


Presenter Papers Paper URL Our Slides
Program Neural network-based graph embedding for cross-platform binary code similarity detection Pdf + Pdf Faizan PDF + GaoJi Pdf
Program Deep Program Reidentification: A Graph Neural Network Solution Pdf Weilin PDF
Program Heterogeneous Graph Neural Networks for Malicious Account Detection Pdf Weilin Pdf
Program Learning to represent programs with graphs Pdf 1  
---

[75]: hierarchical

Table of readings


Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
Ceyer Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 1 PDF PDF
Beilun Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy 2 PDF PDF
Ji Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 3 PDF PDF
Xueying End-to-End Differentiable Adversarial Imitation Learning, ICML17 4 PDF PDF
  Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 PDF  
  FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 5 PDF  

Presenter Papers Paper URL Our Slides
Jack Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17 1 PDF PDF
Arshdeep Making Neural Programming Architectures Generalize via Recursion, ICLR17 2 PDF PDF
Xueying Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17 3 PDF PDF
---

[76]: high-dimensional

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi Delving into Transferable Adversarial Examples and Black-box Attacks,ICLR17 1 pdf PDF
Shijia On Detecting Adversarial Perturbations, ICLR17 2 pdf PDF
Anant Parseval Networks: Improving Robustness to Adversarial Examples, ICML17 3 pdf PDF
Bargav Being Robust (in High Dimensions) Can Be Practical, ICML17 4 pdf PDF
---

[77]: human-alignment

Table of readings


Papers Paper URL Abstract
Training language models to follow instructions with human feedback URL “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences URL “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”
---

[78]: hyperparameter

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep Learning Transferable Architectures for Scalable Image Recognition PDF PDF
Arshdeep FractalNet: Ultra-Deep Neural Networks without Residuals PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi Forward and Reverse Gradient-Based Hyperparameter Optimization, ICML17 1 PDF PDF
Chaojiang Adaptive Neural Networks for Efficient Inference, ICML17 2 PDF PDF
Bargav Practical Gauss-Newton Optimisation for Deep Learning, ICML17 3 PDF PDF
Rita How to Escape Saddle Points Efficiently, ICML17 4 PDF PDF
  Batched High-dimensional Bayesian Optimization via Structural Kernel Learning PDF  
---

[79]: image-synthesis

Table of readings


Stable diffusion

  • URL
  • “High-Resolution Image Synthesis with Latent Diffusion Models”

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

  • URL
  • “personalization” of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. .”

LoRA: Low-Rank Adaptation of Large Language Models

  • URL
  • “propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.”

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

  • https://arxiv.org/abs/2208.01618
  • Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
  • Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks.
---

[80]: imitation-learning

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 1 PDF PDF
Beilun Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy 2 PDF PDF
Ji Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 3 PDF PDF
Xueying End-to-End Differentiable Adversarial Imitation Learning, ICML17 4 PDF PDF
  Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 PDF  
  FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 5 PDF  
---

[81]: imputation

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF
---

[82]: influence-functions

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi A few useful things to know about machine learning PDF PDF
GaoJi A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions PDF PDF
GaoJi Automated White-box Testing of Deep Learning Systems 1 PDF PDF
GaoJi Testing and Validating Machine Learning Classifiers by Metamorphic Testing 2 PDF PDF
GaoJi Software testing: a research travelogue (2000–2014) PDF PDF
---

[83]: infomax

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep Relational inductive biases, deep learning, and graph networks PDF PDF
Arshdeep Discriminative Embeddings of Latent Variable Models for Structured Data PDF PDF
Jack Deep Graph Infomax PDF PDF

Presenter Papers Paper URL Our Slides
Ceyer A Closer Look at Memorization in Deep Networks, ICML17 1 PDF PDF
  On the Expressive Efficiency of Overlapping Architectures of Deep Learning 2 DLSSpdf + video  
Mutual Information Opening the Black Box of Deep Neural Networks via Information 3 URL + video  
ChaoJiang Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 PDF PDF
---

Table of readings


Presenter Papers Paper URL Our Slides
Jack Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 1 PDF PDF
Arshdeep Bidirectional Attention Flow for Machine Comprehension, ICLR17 2 PDF + code PDF
Ceyer Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 PDF + code PDF
ChaoJiang Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 3 PDF PDF
  An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 PDF  
---

[84]: informax

Table of readings

---

[85]: interpretable

Table of readings


Index Papers Our Slides
0 A survey on Interpreting Deep Learning Models Eli Survey
  Interpretable Machine Learning: Definitions,Methods, Applications Arsh Survey
1 Explaining Explanations: Axiomatic Feature Interactions for Deep Networks Arsh Survey
2 Shapley Value review Arsh Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Bill Survey
  Consistent Individualized Feature Attribution for Tree Ensembles bill Survey
  Summary for A value for n-person games Pan Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Rishab Survey
3 Hierarchical Interpretations of Neural Network Predictions Arsh Survey
  Hierarchical Interpretations of Neural Network Predictions Rishab Survey
4 Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Arsh Survey
  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Rishab Survey
5 Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models Rishab Survey
    Sanchit Survey
  Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Sanchit Survey
6 This Looks Like That: Deep Learning for Interpretable Image Recognition Pan Survey
7 AllenNLP Interpret Rishab Survey
8 DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs Rishab Survey
9 How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations Rishab Survey
10 Attention is not Explanation Sanchit Survey
    Pan Survey
11 Axiomatic Attribution for Deep Networks Sanchit Survey
12 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Sanchit Survey
13 Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier Sanchit Survey
14 “Why Should I Trust You?”Explaining the Predictions of Any Classifier Yu Survey
15 INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE Pan Survey

Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  

Presenter Papers Paper URL Our Slides
Jennifer Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Jennifer Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning PDF PDF
Jennifer Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers PDF PDF
Jennifer CleverHans PDF PDF
Ji Ji-f18-New papers about adversarial attack   PDF

Presenter Papers Paper URL Our Slides
Bill Adversarial Examples that Fool both Computer Vision and Time-Limited Humans PDF PDF
Bill Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Bill TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing PDF PDF
Bill Distilling the Knowledge in a Neural Network PDF PDF
Bill Defensive Distillation is Not Robust to Adversarial Examples PDF PDF
Bill Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow PDF PDF

Presenter Papers Paper URL Our Slides
Bill Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le 1 PDF PDF
Bill Adversarial Spheres 2 PDF PDF
Bill Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer 3 PDF PDF
Bill Thermometer encoding: one hot way to resist adversarial examples 4 PDF PDF
  Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow 5 PDF  

Presenter Papers Paper URL Our Slides
Rita Learning Important Features Through Propagating Activation Differences, ICML17 1 PDF PDF
GaoJi Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 2 PDF PDF
Rita Learning Kernels with Random Features, Aman Sinha*; John Duchi, 3 PDF PDF

Presenter Papers Paper URL Our Slides
Shijia Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 1 PDF PDF
Ceyer Sequence Modeling via Segmentations, ICML17 2 PDF PDF
Arshdeep Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
AE Intriguing properties of neural networks / PDF  
AE Explaining and Harnessing Adversarial Examples PDF  
AE Towards Deep Learning Models Resistant to Adversarial Attacks PDF  
AE DeepFool: a simple and accurate method to fool deep neural networks PDF  
AE Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner PDF PDF
Data Basic Survey of ImageNet - LSVRC competition URL PDF
Understand Understanding Black-box Predictions via Influence Functions PDF  
Understand Deep inside convolutional networks: Visualising image classification models and saliency maps PDF  
Understand BeenKim, Interpretable Machine Learning, ICML17 Tutorial [^1] PDF  
provable Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter, URL  
---

Table of readings


Presenter Papers Paper URL Our Slides
Jack A Unified Approach to Interpreting Model Predictions PDF PDF
Jack “Why Should I Trust You?”: Explaining the Predictions of Any Classifier PDF PDF
Jack Visual Feature Attribution using Wasserstein GANs PDF PDF
Jack GAN Dissection: Visualizing and Understanding Generative Adversarial Networks PDF PDF
GaoJi Recent Interpretable machine learning papers PDF PDF
Jennifer The Building Blocks of Interpretability PDF PDF
---

[86]: interpretibility

Table of readings


Required Readings:

Rethinking interpretability in the era of large language models

  • Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
  • 2024/1/30
  • Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.

The Claude 3 Model Family: Opus, Sonnet, Haiku

  • https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
  • We introduce Claude 3, a new family of large multimodal models – Claude 3 Opus, our most capable offering, Claude 3 Sonnet, which provides a combination of skills and speed, and Claude 3 Haiku, our fastest and least expensive model. All new models have vision capabilities that enable them to process and analyze image data. The Claude 3 family demonstrates strong performance across benchmark evaluations and sets a new standard on measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results on evaluations like GPQA [1], MMLU [2], MMMU [3] and many more. Claude 3 Haiku performs as well or better than Claude 2 [4] on most pure-text tasks, while Sonnet and Opus significantly outperform it. Additionally, these models exhibit improved fluency in non-English languages, making them more versatile for a global audience. In this report, we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety, societal impacts, and the catastrophic risk assessments we committed to in our Responsible Scaling Policy [5].

More Readings:

Knowledge Conflicts for LLMs: A Survey

  • https://arxiv.org/abs/2403.08319
  • This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness

Transformer Debugger

  • https://github.com/openai/transformer-debugger
  • Transformer Debugger (TDB) is a tool developed by OpenAI’s Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders. TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, “Why does the model output token A instead of token B for this prompt?” or “Why does attention head H attend to token T for this prompt?” It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

  • https://transformer-circuits.pub/2023/monosemantic-features/index.html
  • In this paper, we use a weak dictionary learning algorithm called a sparse autoencoder to generate learned features from a trained model that offer a more monosemantic unit of analysis than the model’s neurons themselves. Our approach here builds on a significant amount of prior work, especially in using dictionary learning and related methods on neural network activations , and a more general allied literature on disentanglement. We also note interim reports which independently investigated the sparse autoencoder approach in response to Toy Models, culminating in the recent manuscript of Cunningham et al.
  • related post: Decomposing Language Models Into Understandable Components https://www.anthropic.com/news/decomposing-language-models-into-understandable-components

Tracing Model Outputs to the Training Data

  • https://www.anthropic.com/news/influence-functions
  • As large language models become more powerful and their risks become clearer, there is increasing value to figuring out what makes them tick. In our previous work, we have found that large language models change along many personality and behavioral dimensions as a function of both scale and the amount of fine-tuning. Understanding these changes requires seeing how models work, for instance to determine if a model’s outputs rely on memorization or more sophisticated processing. Understanding the inner workings of language models will have substantial implications for forecasting AI capabilities as well as for approaches to aligning AI systems with human preferences. Mechanistic interpretability takes a bottom-up approach to understanding ML models: understanding in detail the behavior of individual units or small-scale circuits such as induction heads. But we also see value in a top-down approach, starting with a model’s observable behaviors and generalization patterns and digging down to see what neurons and circuits are responsible. An advantage of working top-down is that we can directly study high-level cognitive phenomena of interest which only arise at a large scale, such as reasoning and role-playing. Eventually, the two approaches should meet in the middle.

Language models can explain neurons in language models

  • https://openai.com/research/language-models-can-explain-neurons-in-language-models
  • Language models have become more capable and more widely deployed, but we do not understand how they work. Recent work has made progress on understanding a small number of circuits and narrow behaviors,[1][2] but to fully understand a language model, we’ll need to analyze millions of neurons. This paper applies automation to the problem of scaling an interpretability technique to all the neurons in a large language model. Our hope is that building on this approach of automating interpretability [3][4][5] will enable us to comprehensively audit the safety of models before deployment.
---

[87]: invariant

Table of readings


Presenter Papers Paper URL Our Slides
spherical Spherical CNNs Pdf Fuwen PDF + Arshdeep Pdf
dynamic Dynamic graph cnn for learning on point clouds, 2018 Pdf Fuwen PDF
basics Geometric Deep Learning (simple introduction video) URL  
matching All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks Pdf Fuwen PDF
completion Geometric matrix completion with recurrent multi-graph neural networks Pdf Fuwen PDF
Tutorial Geometric Deep Learning on Graphs and Manifolds URL Arsh PDF
matching Similarity Learning with Higher-Order Proximity for Brain Network Analysis   Arsh PDF
pairwise Pixel to Graph with Associative Embedding PDF Fuwen PDF
3D 3D steerable cnns: Learning rotationally equivariant features in volumetric data URL Fuwen PDF

Presenter Papers Paper URL Our Notes
Basics GraphSAGE: Large-scale Graph Representation Learning by Jure Leskovec Stanford University URL + PDF  
Basics Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering by Xavier Bresson URL + PDF Ryan Pdf
Basics Gated Graph Sequence Neural Networks by Microsoft Research URL + PDF Faizan Pdf
Basics DeepWalk - Turning Graphs into Features via Network Embeddings URL + PDF  
Basics Spectral Networks and Locally Connected Networks on Graphs 1 Pdf GaoJi slides + Bill Pdf
Basics A Comprehensive Survey on Graph Neural Networks/ Graph Neural Networks: A Review of Methods and Applications Pdf Jack Pdf
GCN Semi-Supervised Classification with Graph Convolutional Networks Pdf Jack Pdf

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  
---

[88]: knowledge-graph

Table of readings


Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  

Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF
---

[89]: language-model

Table of readings


Papers Paper URL Abstract
Training language models to follow instructions with human feedback URL “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences URL “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”

Emergent Abilities of Large Language Models

  • URL
  • “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”

Language Models are Few-Shot Learners

  • URL
  • “GPT-3, 175B autoregerssive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”

On the Opportunities and Risks of Foundation Models

  • URL
  • ” a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”

The Power of Scale for Parameter-Efficient Prompt Tuning

  • https://arxiv.org/abs/2104.08691
  • Brian Lester, Rami Al-Rfou, Noah Constant
  • In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

Papers Paper URL Abstract
Evolutionary-scale prediction of atomic level protein structure with a language model URL “show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters,…”
DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking URL “Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space.”
---

[90]: language-processing

Table of readings


Index Papers Our Slides
1 Protein 3D Structure Computed from Evolutionary Sequence Variation Arsh Survey
3 Regulatory network inference on developmental and evolutionary lineages Arsh Survey
4 Deep learning in ultrasound image analysis Zhe Survey
5 Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind) Jack Survey
6 Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors Jack Survey
7 BindSpace decodes transcription factor binding signals by large-scale sequence embedding Jack Survey
8 FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Jack Survey
9 Query-Reduction Networks for Question Answering Bill Survey
---

[91]: learn2learn

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction 1 PDF PDF
Arshdeep Decoupled Neural Interfaces Using Synthetic Gradients 2 PDF PDF
Arshdeep Diet Networks: Thin Parameters for Fat Genomics 3 PDF PDF
Arshdeep Metric Learning with Adaptive Density Discrimination 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 1 PDF PDF
Arshdeep Learning feed-forward one-shot learners 2 PDF PDF
Arshdeep Learning to Learn by gradient descent by gradient descent 3 PDF PDF
Arshdeep Dynamic Filter Networks 4 https://arxiv.org/abs/1605.09673 PDF PDF
---

[92]: llmevaluate

Table of readings


In this session, our readings cover:

Required Readings:

Are Large Pre-Trained Language Models Leaking Your Personal Information?

  • https://arxiv.org/abs/2205.12628
  • Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang Are Large Pre-Trained Language Models Leaking Your Personal Information? In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner’s name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.

Privacy Risks of General-Purpose Language Models

  • https://ieeexplore.ieee.org/abstract/document/9152761
  • We find the text embeddings from general-purpose language models would capture much sensitive information from the plain text. Once being accessed by the adversary, the embeddings can be reverse-engineered to disclose sensitive information of the victims for further harassment. Although such a privacy risk can impose a real threat to the future leverage of these promising NLP tools, there are neither published attacks nor systematic evaluations by far for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show the adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, different…

More Readings:

Privacy in Large Language Models: Attacks, Defenses and Future Directions

  • https://arxiv.org/abs/2310.10383
  • The advancement of large language models (LLMs) has significantly enhanced the ability to effectively tackle various downstream NLP tasks and unify these tasks into generative pipelines. On the one hand, powerful language models, trained on massive textual data, have brought unparalleled accessibility and usability for both models and users. On the other hand, unrestricted access to these models can also introduce potential malicious and unintentional privacy risks. Despite ongoing efforts to address the safety and privacy concerns associated with LLMs, the problem remains unresolved. In this paper, we provide a comprehensive analysis of the current privacy attacks targeting LLMs and categorize them according to the adversary’s assumed capabilities to shed light on the potential vulnerabilities present in LLMs. Then, we present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks. Beyond existing works, we identify upcoming privacy concerns as LLMs evolve. Lastly, we point out several potential avenues for future exploration.

ProPILE: Probing Privacy Leakage in Large Language Models

  • https://arxiv.org/abs/2307.01881
  • Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.

In this session, our readings cover:

Required Readings:

Foundation Models and Fair Use

  • Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, Percy Liang
  • URL
  • Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.

Extracting Training Data from Diffusion Models

  • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
  • Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

  • https://arxiv.org/abs/2303.04226
  • Recently, ChatGPT, along with DALL-E-2 and Codex,has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind its impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by human, and generating the content according to its knowledge and the intent information. In recent years, large-scale models have become increasingly important in AIGC as they provide better intent extraction and thus, improved generation results. With the growth of data and the size of the models, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic and high-quality content generation. This survey provides a comprehensive review on the history of generative models, and basic components, recent advances in AIGC from unimodal interaction and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and relative models of text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.

More Readings:

Audio Deepfake Detection: A Survey

  • https://arxiv.org/abs/2308.14970
  • Audio deepfake detection is an emerging active topic. A growing number of literatures have aimed to study deepfake detection algorithms and achieved effective performance, the problem of which is far from being solved. Although there are some review literatures, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively. The survey shows that future research should address the lack of large scale datasets in the wild, poor generalization of existing detection methods to unknown fake attacks, as well as interpretability of detection results.
  • https://openreview.net/forum?id=pSf8rrn49H
  • The images generated by text-to-image models could be accused of the copyright infringement, which has aroused heated debate among AI developers, content creators, legislation department and judicature department. Especially, the state-of-the-art text-to-image models are capable of generating extremely high-quality works while at the same time lack the ability to attribute credits to the original creators, which brings anxiety to the artists’ community. In this paper, we propose a conceptual framework – copyright Plug-in Market – to address the tension between the users, the content creators and the generative models. We introduce three operations in the \copyright Plug-in Market: addition, extraction and combination to facilitate proper credit attribution in the text-to-image procedure and enable the digital copyright protection. For the addition operation, we train a \copyright plug-in for a specific copyrighted concept and add it to the generative model and then we are able to generate new images with the copyrighted concept, which abstract existing solutions of portable LoRAs. We further introduce the extraction operation to enable content creators to claim copyrighted concept from infringing generative models and the combination operation to enable users to combine different \copyright plug-ins to generate images with multiple copyrighted concepts. We believe these basic operations give good incentives to each participant in the market, and enable enough flexibility to thrive the market. Technically, we innovate an inverse LoRA’’ approach to instantiate the extraction operation and propose a data-ignorant layer-wise distillation’’ approach to combine the multiple extractions or additions easily. To showcase the diverse capabilities of copyright plug-ins, we conducted experiments in two domains: style transfer and cartoon IP recreation. The results demonstrate that copyright plug-ins can effectively accomplish copyright extraction and combination, providing a valuable copyright protection solution for the era of generative AIs.

Membership Inference Attacks against Language Models via Neighbourhood Comparison

https://aclanthology.org/2023.findings-acl.719/

Deepfake Taylor Swift event:

  • https://www.cbsnews.com/news/taylor-swift-artificial-intellignence-ai-4chan/

In this session, our readings cover:

Required Readings:

TrustLLM: Trustworthiness in Large Language Models

  • https://arxiv.org/abs/2401.05561
  • Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

  • Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, Research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs’ potential to both bolster and jeopardize cybersecurity
  • https://arxiv.org/abs/2312.02003

More Readings:

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

  • https://arxiv.org/abs/2212.14834
  • Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs, a subfield of trustworthy ML, combining the perspectives of Natural Language Processing and Security. Prior work has shown that even safety-aligned LLMs (via instruction tuning and reinforcement learning through human feedback) can be susceptible to adversarial attacks, which exploit weaknesses and mislead AI systems, as evidenced by the prevalence of `jailbreak’ attacks on models like ChatGPT a

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

  • https://arxiv.org/abs/2311.16119
  • Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.

Even More:

ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks

  • https://llm-vulnerability.github.io/

Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration

  • https://www.tandfonline.com/doi/full/10.1080/15228053.2023.2233814

  • https://huggingface.co/blog?tag=ethics

    • https://huggingface.co/blog/ethics-diffusers
    • https://huggingface.co/blog/model-cards
    • https://huggingface.co/blog/us-national-ai-research-resource

NIST AI RISK MANAGEMENT FRAMEWORK

  • https://www.nist.gov/itl/ai-risk-management-framework
  • https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
  • https://airc.nist.gov/AI_RMF_Knowledge_Base/Roadmap
  • EU AI Act / GDPR

In this session, our readings cover:

Required Readings:

Holistic Evaluation of Text-To-Image Models

  • https://arxiv.org/abs/2311.04287
  • The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at this https URL and the code at this https URL, which is integrated with the HELM codebase.

Holistic Evaluation of Language Models

  • https://arxiv.org/abs/2211.09110

More Readings:

Challenges in evaluating AI systems

  • https://www.anthropic.com/news/evaluating-ai-systems

Evaluating Large Language Models: A Comprehensive Survey

  • https://arxiv.org/abs/2310.19736
  • This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation and safety evaluation. In addition to the comprehensive review on the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs’ performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability.

Evaluating Large Language Models Trained on Code

  • https://arxiv.org/abs/2107.03374

chatbot-arena-leaderboard

  • https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

Leveraging Large Language Models for NLG Evaluation: A Survey

  • https://arxiv.org/abs/2401.07103
---

[93]: loss

Table of readings

---

[94]: low-rank

Table of readings


Index Papers Our Slides
1 Beta VAE, Ladder VAE, Causal VAE Arsh Survey
2 Learnt Prior VAE Arsh Survey
3 Multitask Graph Autoencoder Arsh Survey
4 Introduction to component analysi Zhe Survey
5 Normalizing flow Zhe Survey
6 Nonlinear ICA Zhe Survey
7 Deep Convolutional Inverse Graphics Network Zhe Survey

Presenter Papers Paper URL Our Slides
Arshdeep Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction 1 PDF PDF
Arshdeep Decoupled Neural Interfaces Using Synthetic Gradients 2 PDF PDF
Arshdeep Diet Networks: Thin Parameters for Fat Genomics 3 PDF PDF
Arshdeep Metric Learning with Adaptive Density Discrimination 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 1 PDF PDF
Arshdeep Learning feed-forward one-shot learners 2 PDF PDF
Arshdeep Learning to Learn by gradient descent by gradient descent 3 PDF PDF
Arshdeep Dynamic Filter Networks 4 https://arxiv.org/abs/1605.09673 PDF PDF

Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[95]: manifold

Table of readings


Presenter Papers Paper URL Our Slides
spherical Spherical CNNs Pdf Fuwen PDF + Arshdeep Pdf
dynamic Dynamic graph cnn for learning on point clouds, 2018 Pdf Fuwen PDF
basics Geometric Deep Learning (simple introduction video) URL  
matching All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks Pdf Fuwen PDF
completion Geometric matrix completion with recurrent multi-graph neural networks Pdf Fuwen PDF
Tutorial Geometric Deep Learning on Graphs and Manifolds URL Arsh PDF
matching Similarity Learning with Higher-Order Proximity for Brain Network Analysis   Arsh PDF
pairwise Pixel to Graph with Associative Embedding PDF Fuwen PDF
3D 3D steerable cnns: Learning rotationally equivariant features in volumetric data URL Fuwen PDF
---

[96]: markov

Table of readings


Index Papers Our Slides
1 A Flexible Generative Framework for Graph-based Semi-supervised Learning Arsh Survey
2 Learning Discrete Structures for Graph Neural Networks Arsh Survey
4 Graph Markov Neural Nets Arsh Survey
  Graph Markov Neural Networks Jack Survey
5 GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations Arsh Survey
6 Subgraph Neural Networks Arsh Survey
7 Pointer Graph Networks Arsh Survey
8 Modeling Relational Data with Graph Convolutional Networks Arsh Survey
9 Graph Learning Zhe Survey
8 Neural Relational Inference Zhe Survey
---

[97]: matching

Table of readings


Presenter Papers Paper URL Our Slides
spherical Spherical CNNs Pdf Fuwen PDF + Arshdeep Pdf
dynamic Dynamic graph cnn for learning on point clouds, 2018 Pdf Fuwen PDF
basics Geometric Deep Learning (simple introduction video) URL  
matching All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks Pdf Fuwen PDF
completion Geometric matrix completion with recurrent multi-graph neural networks Pdf Fuwen PDF
Tutorial Geometric Deep Learning on Graphs and Manifolds URL Arsh PDF
matching Similarity Learning with Higher-Order Proximity for Brain Network Analysis   Arsh PDF
pairwise Pixel to Graph with Associative Embedding PDF Fuwen PDF
3D 3D steerable cnns: Learning rotationally equivariant features in volumetric data URL Fuwen PDF

Presenter Papers Paper URL Our Slides
Matching Deep Learning of Graph Matching, PDF+ PDF Jack Pdf
Matching Graph Edit Distance Computation via Graph Neural Networks PDF Jack Pdf
Basics Link Prediction Based on Graph Neural Networks Pdf Jack Pdf
Basics Supervised Community Detection with Line Graph Neural Networks Pdf Jack Pdf
Basics Graph mining: Laws, generators, and algorithms Pdf Arshdeep PDF
pooling Hierarchical graph representation learning with differentiable pooling PDF Eamon PDF
---

[98]: matching-net

Table of readings


Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[99]: matrix-completion

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF
---

[100]: memorization

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer A Closer Look at Memorization in Deep Networks, ICML17 1 PDF PDF
  On the Expressive Efficiency of Overlapping Architectures of Deep Learning 2 DLSSpdf + video  
Mutual Information Opening the Black Box of Deep Neural Networks via Information 3 URL + video  
ChaoJiang Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 PDF PDF
---

[101]: memory

Table of readings


Presenter Papers Paper URL Our Slides
Tianlu Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML17 1 PDF + code PDF
Jack Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17 2 PDF PDF
Xueying State-Frequency Memory Recurrent Neural Networks, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Jack Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 1 PDF PDF
Arshdeep Bidirectional Attention Flow for Machine Comprehension, ICLR17 2 PDF + code PDF
Ceyer Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 PDF + code PDF
ChaoJiang Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 3 PDF PDF
  An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 PDF  

Presenter Papers Paper URL Our Slides
Beilun Learning Deep Parsimonious Representations, NIPS16 1 PDF PDF
Jack Dense Associative Memory for Pattern Recognition, NIPS16 2 PDF + video PDF

Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[102]: meta-learning

Table of readings


Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[103]: metamorphic

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi A few useful things to know about machine learning PDF PDF
GaoJi A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions PDF PDF
GaoJi Automated White-box Testing of Deep Learning Systems 1 PDF PDF
GaoJi Testing and Validating Machine Learning Classifiers by Metamorphic Testing 2 PDF PDF
GaoJi Software testing: a research travelogue (2000–2014) PDF PDF
---

[104]: metric-learning

Table of readings


Presenter Papers Paper URL Our Slides
Derrick GloVe: Global Vectors for Word Representation PDF PDF
Derrick PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks. URL PDF
Derrick scalable nearest neighbor algorithms for high dimensional data (PAMI14) 1 PDF PDF
Derrick StarSpace: Embed All The Things! PDF PDF
Derrick Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes PDF PDF

Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[105]: mimic

Table of readings


Presenter Papers Paper URL Our Slides
Muthu Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal 1 PDF PDF
Muthu Fast Training of Recurrent Networks Based on EM Algorithm (1998) 2 PDF PDF
Muthu FitNets: Hints for Thin Deep Nets, ICLR15 3 PDF PDF
Muthu Two NIPS 2015 Deep Learning Optimization Papers PDF PDF
Muthu Difference Target Propagation (2015) 4 PDF PDF
---

[106]: mitigate

Table of readings


In this session, our readings cover:

Required Readings:

Are Large Pre-Trained Language Models Leaking Your Personal Information?

  • https://arxiv.org/abs/2205.12628
  • Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang Are Large Pre-Trained Language Models Leaking Your Personal Information? In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner’s name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.

Privacy Risks of General-Purpose Language Models

  • https://ieeexplore.ieee.org/abstract/document/9152761
  • We find the text embeddings from general-purpose language models would capture much sensitive information from the plain text. Once being accessed by the adversary, the embeddings can be reverse-engineered to disclose sensitive information of the victims for further harassment. Although such a privacy risk can impose a real threat to the future leverage of these promising NLP tools, there are neither published attacks nor systematic evaluations by far for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show the adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, different…

More Readings:

Privacy in Large Language Models: Attacks, Defenses and Future Directions

  • https://arxiv.org/abs/2310.10383
  • The advancement of large language models (LLMs) has significantly enhanced the ability to effectively tackle various downstream NLP tasks and unify these tasks into generative pipelines. On the one hand, powerful language models, trained on massive textual data, have brought unparalleled accessibility and usability for both models and users. On the other hand, unrestricted access to these models can also introduce potential malicious and unintentional privacy risks. Despite ongoing efforts to address the safety and privacy concerns associated with LLMs, the problem remains unresolved. In this paper, we provide a comprehensive analysis of the current privacy attacks targeting LLMs and categorize them according to the adversary’s assumed capabilities to shed light on the potential vulnerabilities present in LLMs. Then, we present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks. Beyond existing works, we identify upcoming privacy concerns as LLMs evolve. Lastly, we point out several potential avenues for future exploration.

ProPILE: Probing Privacy Leakage in Large Language Models

  • https://arxiv.org/abs/2307.01881
  • Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.

In this session, our readings cover:

Required Readings:

Foundation Models and Fair Use

  • Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, Percy Liang
  • URL
  • Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.

Extracting Training Data from Diffusion Models

  • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
  • Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

  • https://arxiv.org/abs/2303.04226
  • Recently, ChatGPT, along with DALL-E-2 and Codex,has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind its impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by human, and generating the content according to its knowledge and the intent information. In recent years, large-scale models have become increasingly important in AIGC as they provide better intent extraction and thus, improved generation results. With the growth of data and the size of the models, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic and high-quality content generation. This survey provides a comprehensive review on the history of generative models, and basic components, recent advances in AIGC from unimodal interaction and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and relative models of text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.

More Readings:

Audio Deepfake Detection: A Survey

  • https://arxiv.org/abs/2308.14970
  • Audio deepfake detection is an emerging active topic. A growing number of literatures have aimed to study deepfake detection algorithms and achieved effective performance, the problem of which is far from being solved. Although there are some review literatures, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively. The survey shows that future research should address the lack of large scale datasets in the wild, poor generalization of existing detection methods to unknown fake attacks, as well as interpretability of detection results.
  • https://openreview.net/forum?id=pSf8rrn49H
  • The images generated by text-to-image models could be accused of the copyright infringement, which has aroused heated debate among AI developers, content creators, legislation department and judicature department. Especially, the state-of-the-art text-to-image models are capable of generating extremely high-quality works while at the same time lack the ability to attribute credits to the original creators, which brings anxiety to the artists’ community. In this paper, we propose a conceptual framework – copyright Plug-in Market – to address the tension between the users, the content creators and the generative models. We introduce three operations in the \copyright Plug-in Market: addition, extraction and combination to facilitate proper credit attribution in the text-to-image procedure and enable the digital copyright protection. For the addition operation, we train a \copyright plug-in for a specific copyrighted concept and add it to the generative model and then we are able to generate new images with the copyrighted concept, which abstract existing solutions of portable LoRAs. We further introduce the extraction operation to enable content creators to claim copyrighted concept from infringing generative models and the combination operation to enable users to combine different \copyright plug-ins to generate images with multiple copyrighted concepts. We believe these basic operations give good incentives to each participant in the market, and enable enough flexibility to thrive the market. Technically, we innovate an inverse LoRA’’ approach to instantiate the extraction operation and propose a data-ignorant layer-wise distillation’’ approach to combine the multiple extractions or additions easily. To showcase the diverse capabilities of copyright plug-ins, we conducted experiments in two domains: style transfer and cartoon IP recreation. The results demonstrate that copyright plug-ins can effectively accomplish copyright extraction and combination, providing a valuable copyright protection solution for the era of generative AIs.

Membership Inference Attacks against Language Models via Neighbourhood Comparison

https://aclanthology.org/2023.findings-acl.719/

Deepfake Taylor Swift event:

  • https://www.cbsnews.com/news/taylor-swift-artificial-intellignence-ai-4chan/

In this session, our readings cover:

Required Readings:

TrustLLM: Trustworthiness in Large Language Models

  • https://arxiv.org/abs/2401.05561
  • Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

  • Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, Research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs’ potential to both bolster and jeopardize cybersecurity
  • https://arxiv.org/abs/2312.02003

More Readings:

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

  • https://arxiv.org/abs/2212.14834
  • Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs, a subfield of trustworthy ML, combining the perspectives of Natural Language Processing and Security. Prior work has shown that even safety-aligned LLMs (via instruction tuning and reinforcement learning through human feedback) can be susceptible to adversarial attacks, which exploit weaknesses and mislead AI systems, as evidenced by the prevalence of `jailbreak’ attacks on models like ChatGPT a

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

  • https://arxiv.org/abs/2311.16119
  • Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.

Even More:

ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks

  • https://llm-vulnerability.github.io/

Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration

  • https://www.tandfonline.com/doi/full/10.1080/15228053.2023.2233814

  • https://huggingface.co/blog?tag=ethics

    • https://huggingface.co/blog/ethics-diffusers
    • https://huggingface.co/blog/model-cards
    • https://huggingface.co/blog/us-national-ai-research-resource

NIST AI RISK MANAGEMENT FRAMEWORK

  • https://www.nist.gov/itl/ai-risk-management-framework
  • https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
  • https://airc.nist.gov/AI_RMF_Knowledge_Base/Roadmap
  • EU AI Act / GDPR

In this session, our readings cover:

Required Readings:

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

  • https://arxiv.org/abs/2312.06674
  • We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to as response classification. For the purpose of both prompt and response classification, we have meticulously gathered a dataset of high quality. Llama Guard, a Llama2-7b model that is instruction-tuned on our collected dataset, albeit low in volume, demonstrates strong performance on existing benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat, where its performance matches or exceeds that of currently available content moderation tools. Llama Guard functions as a language model, carrying out multi-class classification and generating binary decision scores. Furthermore, the instruction fine-tuning of Llama Guard allows for the customization of tasks and the adaptation of output formats. This feature enhances the model’s capabilities, such as enabling the adjustment of taxonomy categories to align with specific use cases, and facilitating zero-shot or few-shot prompting with diverse taxonomies at the input. We are making Llama Guard model weights available and we encourage researchers to further develop and adapt them to meet the evolving needs of the community for AI safety.

More Readings:

Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

  • [Submitted on 23 Feb 2023 (v1), last revised 5 May 2023 (this version, v2)]
  • https://arxiv.org/abs/2302.12173
  • Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
  • Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks’ practical viability against both real-world systems, such as Bing’s GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application’s functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

  • https://github.com/neelsjain/baseline-defenses
---

[107]: mobile

Table of readings


Presenter Papers Paper URL Our Slides
Edge MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications PDF  
Edge XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks URL Ryan PDF
Edge DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices Pdf Eamon PDF
Edge Loss-aware Binarization of Deep Networks, ICLR17 PDF Ryan PDF
Edge Espresso: Efficient Forward Propagation for Binary Deep Neural Networks Pdf Eamon PDF
Dynamic Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution PDF Weilin PDF
Dynamic Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems PDF  
Dynamic Cavs: An Efficient Runtime System for Dynamic Neural Networks Pdf  
---

[108]: model-as-sample

Table of readings

---

[109]: model-criticism

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep Generalization and Equilibrium in Generative Adversarial Nets (ICML17) 1 PDF + video PDF
Arshdeep Mode Regularized Generative Adversarial Networks (ICLR17) 2 PDF PDF
Bargav Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17 3 PDF PDF
Anant Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17 4 PDF + code PDF

Presenter Papers Paper URL Our Slides
Rita Learning Important Features Through Propagating Activation Differences, ICML17 1 PDF PDF
GaoJi Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 2 PDF PDF
Rita Learning Kernels with Random Features, Aman Sinha*; John Duchi, 3 PDF PDF
---

[110]: modeledit

Table of readings


In this session, our readings cover:

Required Readings:

Editing Large Language Models: Problems, Methods, and Opportunities

  • https://arxiv.org/abs/2305.13172
  • Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang Despite the ability to train capable LLMs, the methodology for maintaining their relevancy and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to efficiently alter the behavior of LLMs within a specific domain without negatively impacting performance across other inputs. This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. We also build a new benchmark dataset to facilitate a more robust evaluation and pinpoint enduring issues intrinsic to existing techniques. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context. Code and datasets are available at this https URL. Comments: EMNLP 2023. Updated with new experiments

More Readings:

Tuning Language Models by Proxy

  • Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
  • Submitted on 16 Jan 2024]
  • Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors. However, tuning these models has become increasingly resource-intensive, or impossible when model weights are private. We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the result of directly tuning the model, but by accessing only its prediction over the output vocabulary. Our method instead tunes a smaller LM, then applies the difference between the predictions of the small tuned and untuned LMs to shift the original predictions of the base model in the direction of tuning, while retaining the benefits of larger scale pretraining. In experiments, when we apply proxy-tuning to Llama2-70B using proxies of only 7B size, we can close 88% of the gap between Llama2-70B and its truly-tuned chat version, when evaluated across knowledge, reasoning, and safety benchmarks. Interestingly, when tested on TruthfulQA, proxy-tuned models are actually more truthful than directly tuned models, possibly because decoding-time guidance better retains the model’s factual knowledge. We then demonstrate the generality of proxy-tuning by applying it for domain adaptation on code, and task-specific finetuning on question-answering and math problems. Our work demonstrates the promise of using small tuned LMs to efficiently customize large, potentially proprietary LMs through decoding-time guidance.

A Survey of Machine Unlearning

  • https://arxiv.org/abs/2209.02299
  • Today, computer systems hold large amounts of personal data. Yet while such an abundance of data allows breakthroughs in artificial intelligence, and especially machine learning (ML), its existence can be a threat to user privacy, and it can weaken the bonds of trust between humans and AI. Recent regulations now require that, on request, private information about a user must be removed from both computer systems and from ML models, i.e. ``the right to be forgotten’’). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context as ML models often `remember’ the old data. Contemporary adversarial attacks on trained models have proven that we can learn whether an instance or an attribute belonged to the training data. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. It turns out that recent works on machine unlearning have not been able to completely solve the problem due to the lack of common frameworks and resources. Therefore, this paper aspires to present a comprehensive examination of machine unlearning’s concepts, scenarios, methods, and applications. Specifically, as a category collection of cutting-edge studies, the intention behind this article is to serve as a comprehensive resource for researchers and practitioners seeking an introduction to machine unlearning and its formulations, design criteria, removal requests, algorithms, and applications. In addition, we aim to highlight the key findings, current trends, and new research areas that have not yet featured the use of machine unlearning but could benefit greatly from it. We hope this survey serves as a valuable resource for ML researchers and those seeking to innovate privacy technologies. Our resources are publicly available at this https URL.

AI Model Disgorgement: Methods and Choices

  • https://arxiv.org/abs/2304.03545
  • Alessandro Achille, Michael Kearns, Carson Klingenberg, Stefano Soatto Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, and document their provenance. They must also make sure to respect intellectual property rights, preserve individual privacy, and use data in an ethical way. Over the past few years, ML models have significantly increased in size and complexity. These models require a very large amount of data and compute capacity to train, to the extent that any defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and a significant amount of effort dedicated to ensuring that training corpora are properly composed, the sheer volume of data required for the models makes it challenging to manually inspect each datum comprising a training corpus. One potential fix for training corpus data defects is model disgorgement – the elimination of not just the improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible usage of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods that are applicable to modern ML systems. In particular, we investigate the meaning of “removing the effects” of data in the trained model in a way that does not require retraining from scratch.
---

[111]: molecule

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Eric Modeling polypharmacy side effects with graph convolutional networks PDF PDF
Eric Protein Interface Prediction using Graph Convolutional Networks PDF PDF
Eric Structure biology meets data science: does anything change URL PDF
Eric DeepSite: protein-binding site predictor using 3D-convolutional neural networks URL PDF
---

[112]: multi-label

Table of readings


Presenter Papers Paper URL Our Slides
Chao Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification PDF PDF
Jack FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning PDF PDF
BasicMLC Multi-Label Classification: An Overview PDF  
SPEN Structured Prediction Energy Networks PDF  
InfNet Learning Approximate Inference Networks for Structured Prediction PDF  
SPENMLC Deep Value Networks PDF  
Adversarial Semantic Segmentation using Adversarial Networks PDF  
EmbedMLC StarSpace: Embed All The Things! PDF  
deepMLC CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 PDF  
deepMLC Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 PDF  
---

[113]: multi-task

Table of readings


Index Papers Our Slides
1 Invariant Risk Minimization Zhe Survey
2 Causal Machine Learning Zhe Survey
3 A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms Zhe Survey
3 Review on Optimization-Based Meta Learning Zhe Survey
4 Domain adaptation and counterfactual prediction Zhe Survey
5 Gaussian Processes Zhe Survey
6 A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data Zhe Survey
7 Few-shot domain adaptation by causal mechanism transfer Zhe Survey

Presenter Papers Paper URL Our Slides
Jack Hasselt - Deep Reinforcement Learning RLSS17.pdf + video PDF
Tianlu Roux - RL in the Industry RLSS17.pdf + video PDF / PDF-Bandit
Xueying Singh - Steps Towards Continual Learning pdf + video PDF
GaoJi Distral: Robust Multitask Reinforcement Learning 1 PDF PDF
---

Table of readings


Team INDEX Title & Link Tags Our Slide
T11 Parameter-Efficient Transfer Learning for NLP meta, BERT, text, Transfer OurSlide
T22 Deep Asymmetric Multi-task Feature Learning meta, regularization, Multi-task OurSlide
---

[114]: mutual-information

Table of readings


Index Papers Our Slides
1 Review on Semi-Supervised Learning Zhe Survey
2 Review on Generative Adversarial Networks Zhe Survey
3 Information theory in deep learning Zhe Survey
4 Lagrange Optimization Zhe Survey
5 Deep Learning and Information Theory, and Graph Neural Network Derrick Survey
6 Loss Functions for Deep Structured Models Jack Survey
7 Group Sparsity and Optimization Zhe Survey
---

[115]: neural-programming

Table of readings


Presenter Papers Paper URL Our Slides
Jack Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17 1 PDF PDF
Arshdeep Making Neural Programming Architectures Generalize via Recursion, ICLR17 2 PDF PDF
Xueying Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17 3 PDF PDF
---

[116]: neuroscience

Table of readings


Ganguli - Theoretical Neuroscience and Deep Learning

Presenter Papers Paper URL Our Slides
DLSS16 video    
DLSS17 video + slide    
DLSS17 Deep learning in the brain DLSS17 + Video  
---

[117]: nlp

Table of readings


Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF

Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Jennifer Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Jennifer Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning PDF PDF
Jennifer Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers PDF PDF
Jennifer CleverHans PDF PDF
Ji Ji-f18-New papers about adversarial attack   PDF

Presenter Papers Paper URL Our Slides
Bill Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 1 PDF PDF
Bill Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio PDF PDF
Bill Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang 2 PDF PDF
Bill On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick PDF PDF

Presenter Papers Paper URL Our Slides
Bill Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 1 PDF PDF
Bill Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio PDF PDF
Bill Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang 2 PDF PDF
Bill On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick PDF PDF

Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  
---

[118]: noise

Table of readings


Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  

Presenter Papers Paper URL Our Slides
Tianlu Robustness of classifiers: from adversarial to random noise, NIPS16 PDF 1 PDF
Anant Blind Attacks on Machine Learners, 2 NIPS16 PDF PDF
  Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17 3 pdf  
  The Robustness of Estimator Composition, NIPS16 4 PDF  

Presenter Papers Paper URL Our Slides
Jack Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17 1 PDF PDF
Arshdeep Making Neural Programming Architectures Generalize via Recursion, ICLR17 2 PDF PDF
Xueying Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17 3 PDF PDF
---

[119]: nonparametric

Table of readings


Presenter Papers Paper URL Our Slides
Shijia Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 1 PDF PDF
Ceyer Sequence Modeling via Segmentations, ICML17 2 PDF PDF
Arshdeep Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Jack Learning End-to-End Goal-Oriented Dialog, ICLR17 1 PDF PDF
Bargav Nonparametric Neural Networks, ICLR17 2 PDF PDF
Bargav Learning Structured Sparsity in Deep Neural Networks, NIPS16 3 PDF PDF
Arshdeep Learning the Number of Neurons in Deep Networks, NIPS16 4 PDF PDF
---

[120]: normalization

Table of readings

---

[121]: ntm

Table of readings


Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[122]: optimization

Table of readings


Presenter Papers Paper URL Our Slides
Shijia Professor Forcing: A New Algorithm for Training Recurrent Networks, 1 NIPS16 PDF + Video PDF
Beilun+Arshdeep Mollifying Networks, Bengio, ICLR17 2 PDF PDF / PDF2

Presenter Papers Paper URL Our Slides
GaoJi Forward and Reverse Gradient-Based Hyperparameter Optimization, ICML17 1 PDF PDF
Chaojiang Adaptive Neural Networks for Efficient Inference, ICML17 2 PDF PDF
Bargav Practical Gauss-Newton Optimisation for Deep Learning, ICML17 3 PDF PDF
Rita How to Escape Saddle Points Efficiently, ICML17 4 PDF PDF
  Batched High-dimensional Bayesian Optimization via Structural Kernel Learning PDF  

Presenter Papers Paper URL Our Slides
GaoJi Neural Architecture Search with Reinforcement Learning, ICLR17 1 PDF PDF
Ceyer Learning to learn 2 DLSS17video PDF
Beilun Optimization as a Model for Few-Shot Learning, ICLR17 3 PDF + More PDF
Anant Neural Optimizer Search with Reinforcement Learning, ICML17 4 PDF PDF
---

Table of readings


Index Papers Our Slides
1 Review on Semi-Supervised Learning Zhe Survey
2 Review on Generative Adversarial Networks Zhe Survey
3 Information theory in deep learning Zhe Survey
4 Lagrange Optimization Zhe Survey
5 Deep Learning and Information Theory, and Graph Neural Network Derrick Survey
6 Loss Functions for Deep Structured Models Jack Survey
7 Group Sparsity and Optimization Zhe Survey

Team INDEX Title & Link Tags Our Slide
T2 Empirical Study of Example Forgetting During Deep Neural Network Learning Sample Selection, forgetting OurSlide
T29 Select Via Proxy: Efficient Data Selection For Training Deep Networks Sample Selection OurSlide
T9 How SGD Selects the Global Minima in over-parameterized Learning optimization OurSlide
T10 Escaping Saddles with Stochastic Gradients optimization OurSlide
T13 To What Extent Do Different Neural Networks Learn the Same Representation subspace OurSlide
T19 On the Information Bottleneck Theory of Deep Learning informax OurSlide
T20 Visualizing the Loss Landscape of Neural Nets normalization OurSlide
T21 Using Pre-Training Can Improve Model Robustness and Uncertainty training, analysis OurSlide
T24 Norm matters: efficient and accurate normalization schemes in deep networks normalization OurSlide

Presenter Papers Paper URL Our Slides
Ceyer An overview of gradient optimization algorithms, 1 PDF PDF
Shijia Osborne - Probabilistic numerics for deep learning 2 DLSS 2017 + Video PDF / PDF2
Jack Automated Curriculum Learning for Neural Networks, ICML17 3 PDF PDF
DLSS17 Johnson - Automatic Differentiation 4 slide + video  

Presenter Papers Paper URL Our Slides
Arshdeep Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction 1 PDF PDF
Arshdeep Decoupled Neural Interfaces Using Synthetic Gradients 2 PDF PDF
Arshdeep Diet Networks: Thin Parameters for Fat Genomics 3 PDF PDF
Arshdeep Metric Learning with Adaptive Density Discrimination 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 1 PDF PDF
Arshdeep Learning feed-forward one-shot learners 2 PDF PDF
Arshdeep Learning to Learn by gradient descent by gradient descent 3 PDF PDF
Arshdeep Dynamic Filter Networks 4 https://arxiv.org/abs/1605.09673 PDF PDF

Presenter Papers Paper URL Our Slides
Muthu Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal 1 PDF PDF
Muthu Fast Training of Recurrent Networks Based on EM Algorithm (1998) 2 PDF PDF
Muthu FitNets: Hints for Thin Deep Nets, ICLR15 3 PDF PDF
Muthu Two NIPS 2015 Deep Learning Optimization Papers PDF PDF
Muthu Difference Target Propagation (2015) 4 PDF PDF
---

[123]: parallel

Table of readings


Presenter Papers Paper URL Our Slides
Scalable FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling Pdf Ryan PDF + Arshdeep Pdf
Scalable MILE: A Multi-Level Framework for Scalable Graph Embedding Pdf Ryan PDF
Scalable LanczosNet: Multi-Scale Deep Graph Convolutional Networks Pdf Ryan PDF
Scalable Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis Pdf Derrick PDF
Scalable Towards Federated learning at Scale: System Design URL Derrick PDF
Scalable DNN Dataflow Choice Is Overrated PDF Derrick PDF
Scalable Towards Efficient Large-Scale Graph Neural Network Computing Pdf Derrick PDF
Scalable PyTorch Geometric URL  
Scalable PyTorch BigGraph URL  
Scalable Simplifying Graph Convolutional Networks Pdf  
Scalable Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Pdf  

Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[124]: parsimonious

Table of readings


Presenter Papers Paper URL Our Slides
Beilun Learning Deep Parsimonious Representations, NIPS16 1 PDF PDF
Jack Dense Associative Memory for Pattern Recognition, NIPS16 2 PDF + video PDF
---

[125]: planning

Table of readings


Presenter Papers Paper URL Our Slides
Anant The Predictron: End-to-End Learning and Planning, ICLR17 1 PDF PDF
ChaoJiang Szepesvari - Theory of RL 2 RLSS.pdf + Video PDF
GaoJi Mastering the game of Go without human knowledge / Nature 2017 3 PDF PDF
  Thomas - Safe Reinforcement Learning RLSS17.pdf + video  
  Sutton - Temporal-Difference Learning RLSS17.pdf + Video  
---

[126]: pointer

Table of readings


Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[127]: privacy

Table of readings


Presenter Papers Paper URL Our Slides
Xueying Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, ICLR17 1 PDF PDF
Bargav Deep Learning with Differential Privacy, CCS16 2 PDF + video PDF
Bargav Privacy-Preserving Deep Learning, CCS15 3 PDF PDF
Xueying Domain Separation Networks, NIPS16 4 PDF PDF

Presenter Papers Paper URL Our Slides
Tobin Summary of A few Papers on: Machine Learning and Cryptography, (e.g., learning to Protect Communications with Adversarial Neural Cryptography) 1 PDF PDF
Tobin Privacy Aware Learning (NIPS12) 2 PDF PDF
Tobin Can Machine Learning be Secure?(2006) PDF PDF
---

Table of readings

---

[128]: program

Table of readings


Presenter Papers Paper URL Our Slides
Program Neural network-based graph embedding for cross-platform binary code similarity detection Pdf + Pdf Faizan PDF + GaoJi Pdf
Program Deep Program Reidentification: A Graph Neural Network Solution Pdf Weilin PDF
Program Heterogeneous Graph Neural Networks for Malicious Account Detection Pdf Weilin Pdf
Program Learning to represent programs with graphs Pdf 1  

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF
---

[129]: propagation

Table of readings


Presenter Papers Paper URL Our Slides
Muthu Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal 1 PDF PDF
Muthu Fast Training of Recurrent Networks Based on EM Algorithm (1998) 2 PDF PDF
Muthu FitNets: Hints for Thin Deep Nets, ICLR15 3 PDF PDF
Muthu Two NIPS 2015 Deep Learning Optimization Papers PDF PDF
Muthu Difference Target Propagation (2015) 4 PDF PDF
---

[130]: protein

Table of readings


Papers Paper URL Abstract
Evolutionary-scale prediction of atomic level protein structure with a language model URL “show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters,…”
DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking URL “Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space.”

Index Papers Our Slides
1 Protein 3D Structure Computed from Evolutionary Sequence Variation Arsh Survey
3 Regulatory network inference on developmental and evolutionary lineages Arsh Survey
4 Deep learning in ultrasound image analysis Zhe Survey
5 Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind) Jack Survey
6 Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors Jack Survey
7 BindSpace decodes transcription factor binding signals by large-scale sequence embedding Jack Survey
8 FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Jack Survey
9 Query-Reduction Networks for Question Answering Bill Survey
---

Table of readings


Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF

Presenter Papers Paper URL Our Slides
Eric Modeling polypharmacy side effects with graph convolutional networks PDF PDF
Eric Protein Interface Prediction using Graph Convolutional Networks PDF PDF
Eric Structure biology meets data science: does anything change URL PDF
Eric DeepSite: protein-binding site predictor using 3D-convolutional neural networks URL PDF

Presenter Papers Paper URL Our Slides
Arshdeep deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 PDF PDF
Arshdeep The CRISPR tool kit for genome editing and beyond, Mazhar Adli PDF PDF
Eric Intro of Genetic Engineering PDF PDF
Eric Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs PDF PDF
Brandon Generative Modeling for Protein Structure URL PDF

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  
---

[131]: pruning

Table of readings


Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[132]: qa

Table of readings


Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF

Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Presenter Papers Paper URL Our Slides
Derrick GloVe: Global Vectors for Word Representation PDF PDF
Derrick PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks. URL PDF
Derrick scalable nearest neighbor algorithms for high dimensional data (PAMI14) 1 PDF PDF
Derrick StarSpace: Embed All The Things! PDF PDF
Derrick Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes PDF PDF

Presenter Papers Paper URL Our Slides
Jack Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17 1 PDF PDF
Arshdeep Making Neural Programming Architectures Generalize via Recursion, ICLR17 2 PDF PDF
Xueying Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Tianlu Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML17 1 PDF + code PDF
Jack Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17 2 PDF PDF
Xueying State-Frequency Memory Recurrent Neural Networks, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Rita Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 1 PDF PDF
Tianlu Dynamic Coattention Networks For Question Answering, ICLR17 2 PDF + code PDF
ChaoJiang Structured Attention Networks, ICLR17 3 PDF + code PDF

Presenter Papers Paper URL Our Slides
Shijia Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 1 PDF PDF
Ceyer Sequence Modeling via Segmentations, ICML17 2 PDF PDF
Arshdeep Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Jack Learning End-to-End Goal-Oriented Dialog, ICLR17 1 PDF PDF
Bargav Nonparametric Neural Networks, ICLR17 2 PDF PDF
Bargav Learning Structured Sparsity in Deep Neural Networks, NIPS16 3 PDF PDF
Arshdeep Learning the Number of Neurons in Deep Networks, NIPS16 4 PDF PDF

Presenter Papers Paper URL Our Slides
QA Learning to rank with (a lot of) word features PDF  
Relation A semantic matching energy function for learning with multi-relational data PDF  
Relation Translating embeddings for modeling multi-relational data PDF  
QA Reading wikipedia to answer open-domain questions PDF  
QA Question answering with subgraph embeddings PDF  
---

[133]: quantization

Table of readings


Team INDEX Title & Link Tags Our Slide
T33 The High-Dimensional Geometry of Binary Neural Networks Quantization, binarization, scalable OurSlide
T34 Modern Neural Networks Generalize on Small Data Sets small-data, analysis, ensemble OurSlide
T4 Cognitive Scheduler for Heterogeneous High Performance Computing System system-application OurSlide
---

[134]: rag

Table of readings


In this session, our readings cover:

Required Readings:

Retrieval-Augmented Generation for AI-Generated Content: A Survey

  • https://arxiv.org/abs/2402.19473v1
  • The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator. We distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancements methods for RAG, facilitating effective engineering and implementation of RAG systems. Then from another view, we survey on practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Project: this https URL

Retrieval-Augmented Generation for Large Language Models: A Survey

  • https://arxiv.org/abs/2312.10997
  • Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy, reduce model hallucination, particularly for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG.

More Readings:

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

  • Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun
  • Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model’s background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora’s development and investigate the underlying technologies used to build this “world simulator”. Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.

A Comprehensive Study of Knowledge Editing for Large Language Models

  • https://arxiv.org/abs/2401.01286
  • Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs’ behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.

Even More

A Survey of Table Reasoning with Large Language Models

  • Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che
  • https://arxiv.org/abs/2402.08259
  • Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improving the efficiency of obtaining information. Recently, using Large Language Models (LLMs) has become the mainstream method for table reasoning, because it not only significantly reduces the annotation cost but also exceeds the performance of previous methods. However, existing research still lacks a summary of LLM-based table reasoning works. Due to the existing lack of research, questions about which techniques can improve table reasoning performance in the era of LLMs, why LLMs excel at table reasoning, and how to enhance table reasoning abilities in the future, remain largely unexplored. This gap significantly limits progress in research. To answer the above questions and advance table reasoning research with LLMs, we present this survey to analyze existing research, inspiring future work. In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning. We provide research directions from both the improvement of existing methods and the expansion of practical applications to inspire future research.
---

[135]: random

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Rita Learning Important Features Through Propagating Activation Differences, ICML17 1 PDF PDF
GaoJi Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 2 PDF PDF
Rita Learning Kernels with Random Features, Aman Sinha*; John Duchi, 3 PDF PDF

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  

Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[136]: reasoning

Table of readings


In this session, our readings cover:

Required Readings:

Augmented Language Models: a Survey

  • Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom
  • This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing tokens prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advance in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability,

Self-Consistency Improves Chain of Thought Reasoning in Language Models

  • https://arxiv.org/abs/2203.11171
  • Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

  • https://arxiv.org/abs/2401.00812
  • Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, Sha Li, Zixuan Huang, Xu Cao, Xingyao Wang, Yiquan Wang, Heng Ji, Chengxiang Zhai
  • The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs’ training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their applications to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to their success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.

More Readings:

ReAct: Synergizing Reasoning and Acting in Language Models

  • Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
  • While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: this https URL
  • Comments: v3 is the ICLR camera ready version with some typos fixed. Project site with code: this https URL

Towards Reasoning in Large Language Models: A Survey

  • Jie Huang, Kevin Chen-Chuan Chang
  • Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and there is observation that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions on future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work. Comments: ACL 2023 Findings, 15 pages

Large Language Models Can Self-Improve

  • Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han / Large Language Models (LLMs) have achieved excellent performances in various tasks. However, fine-tuning an LLM requires extensive supervision. Human, on the other hand, may improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate “high-confidence” rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%->82.1% on GSM8K, 78.2%->83.0% on DROP, 90.0%->94.4% on OpenBookQA, and 63.4%->67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label. We conduct ablation studies and show that fine-tuning on reasoning is critical for self-improvement.
  • https://arxiv.org/abs/2210.11610

Orca 2: Teaching Small Language Models How to Reason /

  • https://arxiv.org/abs/2311.11045
  • Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs’ reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar or better to those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. make Orca 2 weights publicly available at this http URL to support research on the development, evaluation, and alignment of smaller LMs
---

[137]: recommendation

Table of readings


Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Presenter Papers Paper URL Our Slides
QA Learning to rank with (a lot of) word features PDF  
Relation A semantic matching energy function for learning with multi-relational data PDF  
Relation Translating embeddings for modeling multi-relational data PDF  
QA Reading wikipedia to answer open-domain questions PDF  
QA Question answering with subgraph embeddings PDF  
---

[138]: regularization

Table of readings


Index Papers Our Slides
1 BIAS ALSO MATTERS: BIAS ATTRIBUTION FOR DEEP NEURAL NETWORK EXPLANATION Arsh Survey
2 Data Shapley: Equitable Valuation of Data for Machine Learning Arsh Survey
  What is your data worth? Equitable Valuation of Data Sanchit Survey
3 Neural Network Attributions: A Causal Perspective Zhe Survey
4 Defending Against Neural Fake News Eli Survey
5 Interpretation of Neural Networks is Fragile Eli Survey
  Interpretation of Neural Networks is Fragile Pan Survey
6 Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization Eli Survey
7 Retrofitting Word Vectors to Semantic Lexicons Morris Survey
8 On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models Morris Survey
9 Towards Deep Learning Models Resistant to Adversarial Attacks Pan Survey
10 Robust Attribution Regularization Pan Survey
11 Sanity Checks for Saliency Maps Sanchit Survey
12 Survey of data generation and evaluation in Interpreting DNN pipelines Sanchit Survey
13 Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions Sanchit Survey
14 Universal Adversarial Triggers for Attacking and Analyzing NLP Sanchit Survey
15 Apricot: Submodular selection for data summarization in Python Arsh Survey

Team INDEX Title & Link Tags Our Slide
T11 Parameter-Efficient Transfer Learning for NLP meta, BERT, text, Transfer OurSlide
T22 Deep Asymmetric Multi-task Feature Learning meta, regularization, Multi-task OurSlide
---

[139]: relational

Table of readings


Index Papers Our Slides
1 A Flexible Generative Framework for Graph-based Semi-supervised Learning Arsh Survey
2 Learning Discrete Structures for Graph Neural Networks Arsh Survey
4 Graph Markov Neural Nets Arsh Survey
  Graph Markov Neural Networks Jack Survey
5 GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations Arsh Survey
6 Subgraph Neural Networks Arsh Survey
7 Pointer Graph Networks Arsh Survey
8 Modeling Relational Data with Graph Convolutional Networks Arsh Survey
9 Graph Learning Zhe Survey
8 Neural Relational Inference Zhe Survey

Team INDEX Title & Link Tags Our Slide
T3 Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints submodular, coreset, safety OurSlide
T6 Decision Boundary Analysis of Adversarial Examples adversarial-examples OurSlide
T8 Robustness may be at odds with accuracy robustness OurSlide
T18 Towards Reverse-Engineering Black-Box Neural Networks meta, model-as-sample, safety, privacy OurSlide
T23 The Odds are Odd: A Statistical Test for Detecting Adversarial Examples adversarial-examples OurSlide
T25 Learning how to explain neural networks: PatternNet and PatternAttribution Attribution, Interpretable OurSlide
T31 Detecting Statistical Interactions from Neural Network Weights Interpretable, Relational OurSlide
---

Table of readings


Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF

Presenter Papers Paper URL Our Slides
Matching Deep Learning of Graph Matching, PDF+ PDF Jack Pdf
Matching Graph Edit Distance Computation via Graph Neural Networks PDF Jack Pdf
Basics Link Prediction Based on Graph Neural Networks Pdf Jack Pdf
Basics Supervised Community Detection with Line Graph Neural Networks Pdf Jack Pdf
Basics Graph mining: Laws, generators, and algorithms Pdf Arshdeep PDF
pooling Hierarchical graph representation learning with differentiable pooling PDF Eamon PDF

Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Relational inductive biases, deep learning, and graph networks PDF PDF
Arshdeep Discriminative Embeddings of Latent Variable Models for Structured Data PDF PDF
Jack Deep Graph Infomax PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 1 PDF PDF
Arshdeep Latent Alignment and Variational Attention 2 PDF PDF
Arshdeep Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio 3 PDF PDF

Presenter Papers Paper URL Our Slides
Jack Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 1 PDF PDF
Arshdeep Bidirectional Attention Flow for Machine Comprehension, ICLR17 2 PDF + code PDF
Ceyer Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 PDF + code PDF
ChaoJiang Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 3 PDF PDF
  An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 PDF  

Presenter Papers Paper URL Our Slides
Rita Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 1 PDF PDF
Tianlu Dynamic Coattention Networks For Question Answering, ICLR17 2 PDF + code PDF
ChaoJiang Structured Attention Networks, ICLR17 3 PDF + code PDF

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  

Presenter Papers Paper URL Our Slides
QA Learning to rank with (a lot of) word features PDF  
Relation A semantic matching energy function for learning with multi-relational data PDF  
Relation Translating embeddings for modeling multi-relational data PDF  
QA Reading wikipedia to answer open-domain questions PDF  
QA Question answering with subgraph embeddings PDF  

Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  
---

[140]: rl

Table of readings


Papers Paper URL Abstract
Training language models to follow instructions with human feedback URL “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences URL “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”

Decision Transformer: Reinforcement Learning via Sequence Modeling

  • Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
  • https://arxiv.org/abs/2106.01345
  • We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

Prompting Decision Transformer for Few-Shot Policy Generalization

  • Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
  • https://arxiv.org/abs/2206.13499
  • Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.

Papers Paper URL Abstract
A Generalist Agent URL Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
Why should we prefer offline reinforcement learning over behavioral cloning? ICLR 2022 URL natural to ask: when can an offline RL method outperform BC with an equal amount of expert data, even when BC is a natural choice?
Uni[MASK]: Unified Inference in Sequential Decision Problems URL show how sequential decision making tasks can be thought of in terms of corresponding input maskings, enabling the training of a single model to perform all tasks at once. applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns.

Index Papers Our Slides
1 Actor-Critic Methods for Control Jake Survey
2 Generalization in Deep Reinforcement Learning Jake Survey
3 Sample Efficient RL (Part 1) Jake Survey
4 Sample Efficient RL (Part 2) Jake Survey
5 Model-Free Value Methods in Deep RL Jake Survey
6 Investigating Human Priors for Playing Video Games Arsh Survey

Team INDEX Title & Link Tags Our Slide
T1 Safe Reinforcement Learning via Shielding RL, safety, verification OurSlide

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  

Presenter Papers Paper URL Our Slides
Jack Hasselt - Deep Reinforcement Learning RLSS17.pdf + video PDF
Tianlu Roux - RL in the Industry RLSS17.pdf + video PDF / PDF-Bandit
Xueying Singh - Steps Towards Continual Learning pdf + video PDF
GaoJi Distral: Robust Multitask Reinforcement Learning 1 PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi Neural Architecture Search with Reinforcement Learning, ICLR17 1 PDF PDF
Ceyer Learning to learn 2 DLSS17video PDF
Beilun Optimization as a Model for Few-Shot Learning, ICLR17 3 PDF + More PDF
Anant Neural Optimizer Search with Reinforcement Learning, ICML17 4 PDF PDF

Pineau - RL Basic Concepts

Presenter Papers Paper URL Our Slides
DLSS16 video    
RLSS17 slideRaw + video+ slide    
---

[141]: rna

Table of readings


Presenter Papers Paper URL Our Slides
Arshdeep DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. PDF PDF
Arshdeep Solving the RNA design problem with reinforcement learning, PLOSCB 1 PDF PDF
Arshdeep Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk 2 PDF PDF
Arshdeep Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio 3 PDF PDF
Brandon Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics PDF PDF
Arshdeep Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions 2 PDF PDF

Presenter Papers Paper URL Our Slides
DeepBind Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning PDF  
DeepSEA Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk PDF  
DeepSEA Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014    
BioBasics A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13    
BioBasics Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011    
BioBasics Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004    
BioBasics NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels    
MedSignal Segmenting Time Series: A Survey and Novel Approach, PDF  
---

[142]: rnn

Table of readings


Team INDEX Title & Link Tags Our Slide  
T5 Deep Structured Prediction with Nonlinear Output Transformations   structured OurSlide
T12 Large Margin Deep Networks for Classification OurSlide large-margin  
T15 Wide Activation for Efficient and Accurate Image Super-Resolution CNN OurSlide  
T17 Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks RNN OurSlide  
T28 Processing of missing data by neural networks imputation OurSlide  
T27 Implicit Acceleration by Overparameterization analysis OurSlide  

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Chao Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification PDF PDF
Jack FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning PDF PDF
BasicMLC Multi-Label Classification: An Overview PDF  
SPEN Structured Prediction Energy Networks PDF  
InfNet Learning Approximate Inference Networks for Structured Prediction PDF  
SPENMLC Deep Value Networks PDF  
Adversarial Semantic Segmentation using Adversarial Networks PDF  
EmbedMLC StarSpace: Embed All The Things! PDF  
deepMLC CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 PDF  
deepMLC Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 PDF  
---

[143]: robustness

Table of readings


Index Papers Our Slides
1 BIAS ALSO MATTERS: BIAS ATTRIBUTION FOR DEEP NEURAL NETWORK EXPLANATION Arsh Survey
2 Data Shapley: Equitable Valuation of Data for Machine Learning Arsh Survey
  What is your data worth? Equitable Valuation of Data Sanchit Survey
3 Neural Network Attributions: A Causal Perspective Zhe Survey
4 Defending Against Neural Fake News Eli Survey
5 Interpretation of Neural Networks is Fragile Eli Survey
  Interpretation of Neural Networks is Fragile Pan Survey
6 Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization Eli Survey
7 Retrofitting Word Vectors to Semantic Lexicons Morris Survey
8 On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models Morris Survey
9 Towards Deep Learning Models Resistant to Adversarial Attacks Pan Survey
10 Robust Attribution Regularization Pan Survey
11 Sanity Checks for Saliency Maps Sanchit Survey
12 Survey of data generation and evaluation in Interpreting DNN pipelines Sanchit Survey
13 Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions Sanchit Survey
14 Universal Adversarial Triggers for Attacking and Analyzing NLP Sanchit Survey
15 Apricot: Submodular selection for data summarization in Python Arsh Survey

Team INDEX Title & Link Tags Our Slide
T3 Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints submodular, coreset, safety OurSlide
T6 Decision Boundary Analysis of Adversarial Examples adversarial-examples OurSlide
T8 Robustness may be at odds with accuracy robustness OurSlide
T18 Towards Reverse-Engineering Black-Box Neural Networks meta, model-as-sample, safety, privacy OurSlide
T23 The Odds are Odd: A Statistical Test for Detecting Adversarial Examples adversarial-examples OurSlide
T25 Learning how to explain neural networks: PatternNet and PatternAttribution Attribution, Interpretable OurSlide
T31 Detecting Statistical Interactions from Neural Network Weights Interpretable, Relational OurSlide

Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Tianlu Robustness of classifiers: from adversarial to random noise, NIPS16 PDF 1 PDF
Anant Blind Attacks on Machine Learners, 2 NIPS16 PDF PDF
  Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17 3 pdf  
  The Robustness of Estimator Composition, NIPS16 4 PDF  

Presenter Papers Paper URL Our Slides
GaoJi Delving into Transferable Adversarial Examples and Black-box Attacks,ICLR17 1 pdf PDF
Shijia On Detecting Adversarial Perturbations, ICLR17 2 pdf PDF
Anant Parseval Networks: Improving Robustness to Adversarial Examples, ICML17 3 pdf PDF
Bargav Being Robust (in High Dimensions) Can Be Practical, ICML17 4 pdf PDF

Presenter Papers Paper URL Our Slides
GaoJi A few useful things to know about machine learning PDF PDF
GaoJi A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions PDF PDF
GaoJi Automated White-box Testing of Deep Learning Systems 1 PDF PDF
GaoJi Testing and Validating Machine Learning Classifiers by Metamorphic Testing 2 PDF PDF
GaoJi Software testing: a research travelogue (2000–2014) PDF PDF

Presenter Papers Paper URL Our Slides
AE Intriguing properties of neural networks / PDF  
AE Explaining and Harnessing Adversarial Examples PDF  
AE Towards Deep Learning Models Resistant to Adversarial Attacks PDF  
AE DeepFool: a simple and accurate method to fool deep neural networks PDF  
AE Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner PDF PDF
Data Basic Survey of ImageNet - LSVRC competition URL PDF
Understand Understanding Black-box Predictions via Influence Functions PDF  
Understand Deep inside convolutional networks: Visualising image classification models and saliency maps PDF  
Understand BeenKim, Interpretable Machine Learning, ICML17 Tutorial [^1] PDF  
provable Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter, URL  
---

[144]: safety

Table of readings


KV Caching in LLM:

  • Retentive Network: A Successor to Transformer for Large Language Models: https://arxiv.org/abs/2307.08621

  • https://arxiv.org/abs/2305.13048 RWKV: Reinventing RNNs for the Transformer Era

  • grouped query attention: https://arxiv.org/pdf/2305.13245.pdf
  • Paged attention https://arxiv.org/pdf/2309.06180.pdf https://openreview.net/pdf?id=uNrFpDPMyo

Retentive Network: A Successor to Transformer for Large Language Models

  • In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation… Show more

RWKV: Reinventing RNNs for the Transformer Era

  • Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transfor… Show more

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

  • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
  • The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at this https URL

Must know tools for training/finetuning/serving LLM’s -

  1. Torchtune - Build on top of Pytorch, for training and finetuning LLM’s. Uses yaml based configs for easily running experiments. Github -

  2. axolotl - Built on top on Huggigface peft and transformer library, supports fine-tuning a large number for models like Mistral, LLama etc. Provides support for techniques like RLHF, DPO, LORA, qLORA etc. Github

  3. LitGPT - Build on nanoGPT and Megatron, support pre-training and fine-tuning, has examples like Starcoder, TinyLlama etc. Github -

  4. Maxtext - Jax based library for training LLM’s on Google TPU’s with configs for models like Gemma, Mistral and LLama2 etc. Github

  5. Langchain- https://python.langchain.com/docs/get_started/introduction

  6. haystack.deepset.ai
    • https://github.com/deepset-ai/haystack
    • LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots.
  7. LlamaIndex
    • https://docs.llamaindex.ai/en/stable/ LlamaIndex supports Retrieval-Augmented Generation (RAG). Instead of asking LLM to generate an answer immediately, LlamaIndex: retrieves information from your data sources first, / adds it to your question as context, and / asks the LLM to answer based on the enriched prompt.
  8. Making Retrieval Augmented Generation Fast
    • https://www.pinecone.io/learn/fast-retrieval-augmented-generation/
  9. OpenMoE
    • https://github.com/XueFuzhao/OpenMoE

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

  • Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, Xia Hu
  • This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks.We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources of LLMs, regularly updated, .

  • https://github.com/Mooler0410/LLMsPracticalGuide

In this session, our readings cover:

Required Readings:

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

  • https://dl.acm.org/doi/10.1145/3442188.3445922
  • The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.

More Readings:

Low-Resource Languages Jailbreak GPT-4

  • AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4’s safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpassing state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have significantly lower attack success rate, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affects speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLMs users. Publicly available translation APIs enable anyone to exploit LLMs’ safety vulnerabilities. Therefore, our work calls for a more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

  • https://arxiv.org/abs/2305.11391
  • Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

Even More

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation / EMNLP2023

  • Despite remarkable advances that large language models have achieved in chatbots nowadays, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media contents, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark constructed based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference when compared to social media contents. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

OpenAI on LLM generated bio-x-risk

  • Building an early warning system for LLM-aided biological threat creation
  • https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation

A misleading open letter about sci-fi AI dangers ignores the real risks

https://www.aisnakeoil.com/p/a-misleading-open-letter-about-sci

Evaluating social and ethical risks from generative AI

  • https://deepmind.google/discover/blog/evaluating-social-and-ethical-risks-from-generative-ai/

Managing Existential Risk from AI without Undercutting Innovation

  • https://www.csis.org/analysis/managing-existential-risk-ai-without-undercutting-innovation

In this session, our readings cover:

Required Readings:

Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors

  • Dingcheng Yang, Yang Bai, Xiaojun Jia, Yang Liu, Xiaochun Cao, Wenjian Yu
  • Diffusion models have been widely deployed in various image generation tasks, demonstrating an extraordinary connection between image and text modalities. However, they face challenges of being maliciously exploited to generate harmful or sensitive images by appending a specific suffix to the original prompt. Existing works mainly focus on using single-modal information to conduct attacks, which fails to utilize multi-modal features and results in less than satisfactory performance. Integrating multi-modal priors (MMP), i.e. both text and image features, we propose a targeted attack method named MMP-Attack in this work. Specifically, the goal of MMP-Attack is to add a target object into the image content while simultaneously removing the original object. The MMP-Attack shows a notable advantage over existing works with superior universality and transferability, which can effectively attack commercial text-to-image (T2I) models such as DALL-E 3. To the best of our knowledge, this marks the first successful attempt of transfer-based attack to commercial T2I models. Our code is publicly available at ….

A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion

  • https://ieeexplore.ieee.org/document/10208563
  • Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem ‘query-free attack generation’. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that only a five-character perturbation to the text prompt is able to cause the significant content shift of synthesized images using Stable Diffusion. Moreover, we show that the proposed target attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content.

More Readings:

Visual Instruction Tuning

  • Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
  • Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.Our early experiments show that LLaVA demonstrates impressive multimodel chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields a 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.

GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

  • https://arxiv.org/abs/2401.01523

Misusing Tools in Large Language Models With Visual Adversarial Examples

  • https://arxiv.org/abs/2310.03185

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

  • https://arxiv.org/abs/2209.07858

In this session, our readings cover:

Required Readings:

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

  • https://arxiv.org/abs/2402.04249
  • Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at this https URL.

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

  • https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training
  • Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

More Readings:

SafeText: A Benchmark for Exploring Physical Safety in Language Models

  • https://arxiv.org/abs/2210.10045
  • Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One such type of safety that has been scarcely studied is commonsense physical safety, i.e. text that is not explicitly violent and requires additional commonsense knowledge to comprehend that it leads to physical harm. We create the first benchmark dataset, SafeText, comprising real-life scenarios with paired safe and physically unsafe pieces of advice. We utilize SafeText to empirically study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks. We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice. As a result, we argue for further studies of safety and the assessment of commonsense physical safety in models before release.

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

  • https://arxiv.org/abs/2310.03693

Lessons learned on language model safety and misuse

  • https://openai.com/research/language-model-safety-and-misuse

Planning red teaming for large language models (LLMs) and their applications

https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/red-teaming

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

  • https://arxiv.org/abs/2310.09624
---

Table of readings

---

[145]: sample-selection

Table of readings

---

[146]: sampling

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 1 PDF PDF
Beilun Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy 2 PDF PDF
Ji Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 3 PDF PDF
Xueying End-to-End Differentiable Adversarial Imitation Learning, ICML17 4 PDF PDF
  Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 PDF  
  FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 5 PDF  
---

[147]: scalable

Table of readings


Presenter Papers Paper URL Our Notes
Basics GraphSAGE: Large-scale Graph Representation Learning by Jure Leskovec Stanford University URL + PDF  
Basics Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering by Xavier Bresson URL + PDF Ryan Pdf
Basics Gated Graph Sequence Neural Networks by Microsoft Research URL + PDF Faizan Pdf
Basics DeepWalk - Turning Graphs into Features via Network Embeddings URL + PDF  
Basics Spectral Networks and Locally Connected Networks on Graphs 1 Pdf GaoJi slides + Bill Pdf
Basics A Comprehensive Survey on Graph Neural Networks/ Graph Neural Networks: A Review of Methods and Applications Pdf Jack Pdf
GCN Semi-Supervised Classification with Graph Convolutional Networks Pdf Jack Pdf

Presenter Papers Paper URL Our Slides
Muthu Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal 1 PDF PDF
Muthu Fast Training of Recurrent Networks Based on EM Algorithm (1998) 2 PDF PDF
Muthu FitNets: Hints for Thin Deep Nets, ICLR15 3 PDF PDF
Muthu Two NIPS 2015 Deep Learning Optimization Papers PDF PDF
Muthu Difference Target Propagation (2015) 4 PDF PDF

Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[148]: secure

Table of readings


Presenter Papers Paper URL Our Slides
Tobin Summary of A few Papers on: Machine Learning and Cryptography, (e.g., learning to Protect Communications with Adversarial Neural Cryptography) 1 PDF PDF
Tobin Privacy Aware Learning (NIPS12) 2 PDF PDF
Tobin Can Machine Learning be Secure?(2006) PDF PDF
---

[149]: semi-supervised

Table of readings


Presenter Papers Paper URL Our Slides
Matching Deep Learning of Graph Matching, PDF+ PDF Jack Pdf
Matching Graph Edit Distance Computation via Graph Neural Networks PDF Jack Pdf
Basics Link Prediction Based on Graph Neural Networks Pdf Jack Pdf
Basics Supervised Community Detection with Line Graph Neural Networks Pdf Jack Pdf
Basics Graph mining: Laws, generators, and algorithms Pdf Arshdeep PDF
pooling Hierarchical graph representation learning with differentiable pooling PDF Eamon PDF

Presenter Papers Paper URL Our Slides
Xueying Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, ICLR17 1 PDF PDF
Bargav Deep Learning with Differential Privacy, CCS16 2 PDF + video PDF
Bargav Privacy-Preserving Deep Learning, CCS15 3 PDF PDF
Xueying Domain Separation Networks, NIPS16 4 PDF PDF
---

[150]: seq2seq

Table of readings


Presenter Papers Paper URL Our Slides
Understand Faithful and Customizable Explanations of Black Box Models Pdf Derrick PDF
Understand A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 Pdf GaoJi PDF + Bill Pdf
Understand How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Pdf + Pdf GaoJi PDF
Understand Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks Pdf + PDF GaoJi PDF
Understand Attention is not Explanation, 2019 PDF  
Understand Understanding attention in graph neural networks, 2019 PDF  

Presenter Papers Paper URL Our Slides
Bill Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) PDF PDF
Chao Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) PDF PDF
Derrick Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) PDF PDF
Chao Reading Wikipedia to Answer Open-Domain Questions PDF PDF
Jennifer Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text PDF PDF

Presenter Papers Paper URL Our Slides
Bill Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples PDF PDF
Bill Adversarial Examples for Evaluating Reading Comprehension Systems, Robin Jia, Percy Liang PDF PDF
Bill Certified Defenses against Adversarial Examples, Aditi Raghunathan, Jacob Steinhardt, Percy Liang PDF PDF
Bill Provably Minimally-Distorted Adversarial Examples, Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill PDF PDF

Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  

Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  
---

[151]: set

Table of readings


Presenter Papers Paper URL Our Slides
seq2seq Sequence to Sequence Learning with Neural Networks PDF  
Set Pointer Networks PDF  
Set Order Matters: Sequence to Sequence for Sets PDF  
Point Attention Multiple Object Recognition with Visual Attention PDF  
Memory End-To-End Memory Networks PDF Jack Survey
Memory Neural Turing Machines PDF  
Memory Hybrid computing using a neural network with dynamic external memory PDF  
Muthu Matching Networks for One Shot Learning (NIPS16) 1 PDF PDF
Jack Meta-Learning with Memory-Augmented Neural Networks (ICML16) 2 PDF PDF
Metric ICML07 Best Paper - Information-Theoretic Metric Learning PDF  
---

[152]: shapley

Table of readings


Index Papers Our Slides
0 A survey on Interpreting Deep Learning Models Eli Survey
  Interpretable Machine Learning: Definitions,Methods, Applications Arsh Survey
1 Explaining Explanations: Axiomatic Feature Interactions for Deep Networks Arsh Survey
2 Shapley Value review Arsh Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Bill Survey
  Consistent Individualized Feature Attribution for Tree Ensembles bill Survey
  Summary for A value for n-person games Pan Survey
  L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Rishab Survey
3 Hierarchical Interpretations of Neural Network Predictions Arsh Survey
  Hierarchical Interpretations of Neural Network Predictions Rishab Survey
4 Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Arsh Survey
  Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs Rishab Survey
5 Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models Rishab Survey
    Sanchit Survey
  Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Sanchit Survey
6 This Looks Like That: Deep Learning for Interpretable Image Recognition Pan Survey
7 AllenNLP Interpret Rishab Survey
8 DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs Rishab Survey
9 How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations Rishab Survey
10 Attention is not Explanation Sanchit Survey
    Pan Survey
11 Axiomatic Attribution for Deep Networks Sanchit Survey
12 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Sanchit Survey
13 Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier Sanchit Survey
14 “Why Should I Trust You?”Explaining the Predictions of Any Classifier Yu Survey
15 INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE Pan Survey
---

[153]: sketch

Table of readings


Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[154]: small-data

Table of readings


Team INDEX Title & Link Tags Our Slide
T33 The High-Dimensional Geometry of Binary Neural Networks Quantization, binarization, scalable OurSlide
T34 Modern Neural Networks Generalize on Small Data Sets small-data, analysis, ensemble OurSlide
T4 Cognitive Scheduler for Heterogeneous High Performance Computing System system-application OurSlide
---

[155]: software-testing

Table of readings


Presenter Papers Paper URL Our Slides
Bill Adversarial Examples that Fool both Computer Vision and Time-Limited Humans PDF PDF
Bill Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Bill TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing PDF PDF
Bill Distilling the Knowledge in a Neural Network PDF PDF
Bill Defensive Distillation is Not Robust to Adversarial Examples PDF PDF
Bill Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow PDF PDF

Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  

Presenter Papers Paper URL Our Slides
GaoJi A few useful things to know about machine learning PDF PDF
GaoJi A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions PDF PDF
GaoJi Automated White-box Testing of Deep Learning Systems 1 PDF PDF
GaoJi Testing and Validating Machine Learning Classifiers by Metamorphic Testing 2 PDF PDF
GaoJi Software testing: a research travelogue (2000–2014) PDF PDF
---

[156]: sparsity

Table of readings


Presenter Papers Paper URL Our Slides
Shijia Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 1 PDF PDF
Ceyer Sequence Modeling via Segmentations, ICML17 2 PDF PDF
Arshdeep Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Jack Learning End-to-End Goal-Oriented Dialog, ICLR17 1 PDF PDF
Bargav Nonparametric Neural Networks, ICLR17 2 PDF PDF
Bargav Learning Structured Sparsity in Deep Neural Networks, NIPS16 3 PDF PDF
Arshdeep Learning the Number of Neurons in Deep Networks, NIPS16 4 PDF PDF

Presenter Papers Paper URL Our Slides
scalable Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] PDF  
data scalable Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] PDF 2014 + PDF  
Binary Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1    
Model Binary embeddings with structured hashed projections 1 PDF PDF
Model Deep Compression: Compressing Deep Neural Networks (ICLR 2016) 2 PDF PDF
---

[157]: structured

Table of readings


Team INDEX Title & Link Tags Our Slide  
T5 Deep Structured Prediction with Nonlinear Output Transformations   structured OurSlide
T12 Large Margin Deep Networks for Classification OurSlide large-margin  
T15 Wide Activation for Efficient and Accurate Image Super-Resolution CNN OurSlide  
T17 Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks RNN OurSlide  
T28 Processing of missing data by neural networks imputation OurSlide  
T27 Implicit Acceleration by Overparameterization analysis OurSlide  

Presenter Papers Paper URL Our Slides
Robust Adversarial Attacks on Graph Structured Data Pdf Faizan [PDF + GaoJi Pdf
Robust KDD’18 Adversarial Attacks on Neural Networks for Graph Data Pdf Faizan PDF + GaoJi Pdf
Robust Attacking Binarized Neural Networks Pdf Faizan PDF

Presenter Papers Paper URL Our Slides
Chao Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification PDF PDF
Jack FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning PDF PDF
BasicMLC Multi-Label Classification: An Overview PDF  
SPEN Structured Prediction Energy Networks PDF  
InfNet Learning Approximate Inference Networks for Structured Prediction PDF  
SPENMLC Deep Value Networks PDF  
Adversarial Semantic Segmentation using Adversarial Networks PDF  
EmbedMLC StarSpace: Embed All The Things! PDF  
deepMLC CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 PDF  
deepMLC Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 PDF  

Presenter Papers Paper URL Our Slides
Ceyer Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 1 PDF PDF
Beilun Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy 2 PDF PDF
Ji Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 3 PDF PDF
Xueying End-to-End Differentiable Adversarial Imitation Learning, ICML17 4 PDF PDF
  Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 PDF  
  FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 5 PDF  

Presenter Papers Paper URL Our Slides
ChaoJiang Courville - Generative Models II DLSS17Slide + video PDF
GaoJi Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 1 PDF + talk PDF
Arshdeep Composing graphical models with neural networks for structured representations and fast inference, NIPS16 2 PDF PDF
  Johnson - Graphical Models and Deep Learning DLSSSlide + video  
  Parallel Multiscale Autoregressive Density Estimation, ICML17 3 PDF  
Beilun Conditional Image Generation with Pixel CNN Decoders, NIPS16 4 PDF PDF
Shijia Marrying Graphical Models & Deep Learning DLSS17 + Video PDF

Presenter Papers Paper URL Our Slides
Anant AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 1 PDF PDF
Shijia SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 2 PDF PDF
Jack Proximal Deep Structured Models, NIPS16 3 PDF PDF
  Optimal Architectures in a Solvable Model of Deep Networks, NIPS16 4 PDF  
Tianlu Large-Scale Evolution of Image Classifiers, ICML17 5 PDF PDF

Presenter Papers Paper URL Our Slides
Rita Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 1 PDF PDF
Tianlu Dynamic Coattention Networks For Question Answering, ICLR17 2 PDF + code PDF
ChaoJiang Structured Attention Networks, ICLR17 3 PDF + code PDF

Presenter Papers Paper URL Our Slides
Shijia Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 1 PDF PDF
Ceyer Sequence Modeling via Segmentations, ICML17 2 PDF PDF
Arshdeep Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 3 PDF PDF

Presenter Papers Paper URL Our Slides
Jack Learning End-to-End Goal-Oriented Dialog, ICLR17 1 PDF PDF
Bargav Nonparametric Neural Networks, ICLR17 2 PDF PDF
Bargav Learning Structured Sparsity in Deep Neural Networks, NIPS16 3 PDF PDF
Arshdeep Learning the Number of Neurons in Deep Networks, NIPS16 4 PDF PDF
---

[158]: stylometric

Table of readings


Presenter Papers Paper URL Our Slides
QA A Comparison of Current Graph Database Models Pdf + PDF2 Bill PDF
QA Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text Pdf Bill [PDF + GaoJi Pdf
QA Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan Pdf Bill PDF + GaoJi Pdf
QA Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix PDF + Pdf Bill PDF + GaoJi Pdf
Text Adversarial Text Generation via Feature-Mover’s Distance URL Faizan PDF
Text Content preserving text generation with attribute controls URL Faizan PDF
Text Multiple-Attribute Text Rewriting, ICLR, 2019, URL Faizan PDF
Text Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace URL Faizan PDF
---

[159]: submodular

Table of readings

---

[160]: subspace

Table of readings

---

[161]: temporal-difference

Table of readings


Presenter Papers Paper URL Our Slides
Anant The Predictron: End-to-End Learning and Planning, ICLR17 1 PDF PDF
ChaoJiang Szepesvari - Theory of RL 2 RLSS.pdf + Video PDF
GaoJi Mastering the game of Go without human knowledge / Nature 2017 3 PDF PDF
  Thomas - Safe Reinforcement Learning RLSS17.pdf + video  
  Sutton - Temporal-Difference Learning RLSS17.pdf + Video  
---

[162]: text

Table of readings


Presenter Papers Paper URL Our Slides
NLP A Neural Probabilistic Language Model PDF  
Text Bag of Tricks for Efficient Text Classification PDF  
Text Character-level Convolutional Networks for Text Classification PDF  
NLP BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding PDF  
seq2seq Neural Machine Translation by Jointly Learning to Align and Translate PDF  
NLP Natural Language Processing (almost) from Scratch PDF  
Train Curriculum learning PDF  
Muthu NeuroIPS Embedding Papers survey 2012 to 2015 NIPS PDF
Basics Efficient BackProp PDF  
---

[163]: training

Table of readings

---

[164]: transfer

Table of readings


Team INDEX Title & Link Tags Our Slide
T11 Parameter-Efficient Transfer Learning for NLP meta, BERT, text, Transfer OurSlide
T22 Deep Asymmetric Multi-task Feature Learning meta, regularization, Multi-task OurSlide
---

[165]: transfer-learning

Table of readings


Index Papers Our Slides
1 Invariant Risk Minimization Zhe Survey
2 Causal Machine Learning Zhe Survey
3 A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms Zhe Survey
3 Review on Optimization-Based Meta Learning Zhe Survey
4 Domain adaptation and counterfactual prediction Zhe Survey
5 Gaussian Processes Zhe Survey
6 A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data Zhe Survey
7 Few-shot domain adaptation by causal mechanism transfer Zhe Survey

Presenter Papers Paper URL Our Slides
Jack Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 1 PDF PDF
Arshdeep Bidirectional Attention Flow for Machine Comprehension, ICLR17 2 PDF + code PDF
Ceyer Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 PDF + code PDF
ChaoJiang Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 3 PDF PDF
  An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 PDF  

Presenter Papers Paper URL Our Slides
Rita Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 1 PDF PDF
Tianlu Dynamic Coattention Networks For Question Answering, ICLR17 2 PDF + code PDF
ChaoJiang Structured Attention Networks, ICLR17 3 PDF + code PDF
---

[166]: trees

Table of readings


Presenter Papers Paper URL Our Slides
Derrick GloVe: Global Vectors for Word Representation PDF PDF
Derrick PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks. URL PDF
Derrick scalable nearest neighbor algorithms for high dimensional data (PAMI14) 1 PDF PDF
Derrick StarSpace: Embed All The Things! PDF PDF
Derrick Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes PDF PDF
---

[167]: tutorial

Table of readings


Type Papers Paper URL Our Slides
Dr Qi Survey of 10 DeepLearning (DL) trends different from classic machine learning   OurSlide
Youtube Generative DL Basics Youtube1 + Youtube2 NA
Youtube Computation Graph for DL (pytorch vs. tensorflow Youtube URL + Youtube2 NA
Youtube Auto Differentiation for DL Youtube1+ Youtube2 NA
Youtube RL basics and DL-RL basics Youtube1 + Youtube2 NA
Youtube Probabilistic programming and in DL Pyro Youtube1 + Youtube2 NA
Youtube Basics of Software Testing for DL Youtube URL NA
Course Bill_CNN_Ng_Lecture_Notes   Bill’s Notes
Course Bill_caltechMLnotes_ALL   Bill’s Notes
classic Paper The Lottery Ticket Hypothesis   Morris’ Notes
classic Paper NLP From Scratch   Morris’ Notes
classic Paper Statistical Modeling The Two Cultures   Morris’ Notes
classic Paper Attention_is_All_You_Need   Eli’ Notes
classic Paper YOLO   Eli’ Notes
classic Paper Neural Turing Machine   Jake Survey
classic Paper BERT (Bidirectional Encoder Representation for Transformers): Pretraining of Deep Bidirectional Transformers for Language Understanding   Rishab Survey

Presenter Papers Paper URL Our Slides
Dr Qi Survey of Recent DeepLearning to 12 Groups / PDF    

Presenter Papers Paper URL Our Slides
Dr. Qi Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation   PDF

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin, NIPS2017 / Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi

The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what are the relevant factors and how they work together? Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach; we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency map. Code and data are shared atwww.deepchrome.org

---

[168]: understanding

Table of readings


Presenter Papers Paper URL Our Slides
Jack A Unified Approach to Interpreting Model Predictions PDF PDF
Jack “Why Should I Trust You?”: Explaining the Predictions of Any Classifier PDF PDF
Jack Visual Feature Attribution using Wasserstein GANs PDF PDF
Jack GAN Dissection: Visualizing and Understanding Generative Adversarial Networks PDF PDF
GaoJi Recent Interpretable machine learning papers PDF PDF
Jennifer The Building Blocks of Interpretability PDF PDF

Presenter Papers Paper URL Our Slides
SE Equivariance Through Parameter-Sharing, ICML17 1 PDF  
SE Why Deep Neural Networks for Function Approximation?, ICLR17 2 PDF  
SE Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 3ICML17 PDF  
  Sharp Minima Can Generalize For Deep Nets, ICML17 4 PDF  

Presenter Papers Paper URL Our Slides
Ceyer A Closer Look at Memorization in Deep Networks, ICML17 1 PDF PDF
  On the Expressive Efficiency of Overlapping Architectures of Deep Learning 2 DLSSpdf + video  
Mutual Information Opening the Black Box of Deep Neural Networks via Information 3 URL + video  
ChaoJiang Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 PDF PDF

Presenter Papers Paper URL Our Slides
Beilun Learning Deep Parsimonious Representations, NIPS16 1 PDF PDF
Jack Dense Associative Memory for Pattern Recognition, NIPS16 2 PDF + video PDF

Presenter Papers Paper URL Our Slides
Rita On the Expressive Power of Deep Neural Networks 1 PDF PDF
Arshdeep Understanding deep learning requires rethinking generalization, ICLR17 2 PDF PDF
Tianlu On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17 3 PDF PDF
---

[169]: vae

Table of readings


Index Papers Our Slides
1 Beta VAE, Ladder VAE, Causal VAE Arsh Survey
2 Learnt Prior VAE Arsh Survey
3 Multitask Graph Autoencoder Arsh Survey
4 Introduction to component analysi Zhe Survey
5 Normalizing flow Zhe Survey
6 Nonlinear ICA Zhe Survey
7 Deep Convolutional Inverse Graphics Network Zhe Survey

Team INDEX Title & Link Tags Our Slide
T14 CAN: Creative Adversarial Networks Generating “Art” GAN OurSlide
T26 Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation encoder-decoder, dialog, VAE, Interpretable OurSlide
T32 Which Training Methods for GANs do actually Converge convergence, optimization, GAN OurSlide
---

[170]: value-networks

Table of readings


Presenter Papers Paper URL Our Slides
Ceyer Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 1 PDF PDF
Beilun Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy 2 PDF PDF
Ji Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 3 PDF PDF
Xueying End-to-End Differentiable Adversarial Imitation Learning, ICML17 4 PDF PDF
  Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 PDF  
  FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 5 PDF  
---

[171]: variational

Table of readings


Presenter Papers Paper URL Our Slides
Generate Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF Tkach PDF + GaoJi Pdf
Generate Graphical Generative Adversarial Networks PDF Arshdeep PDF
Generate GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 PDF Arshdeep PDF
Generate Inference in probabilistic graphical models by Graph Neural Networks PDF Arshdeep PDF
Generate Encoding robust representation for graph generation Pdf Arshdeep PDF
Generate Junction Tree Variational Autoencoder for Molecular Graph Generation Pdf Tkach PDF + Arshdeep Pdf
Generate Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18   Tkach PDF
Generate Towards Variational Generation of Small Graphs Pdf Tkach PDF + Arshdeep Pdf
Generate Convolutional Imputation of Matrix Networks Pdf Tkach PDF
Generate Graph Convolutional Matrix Completion Pdf Tkach PDF
Generate NetGAN: Generating Graphs via Random Walks ICML18 [ULR] Tkach PDF
Beam Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement URL Tkach PDF

Presenter Papers Paper URL Our Slides
Tkach Boundary-Seeking Generative Adversarial Networks PDF PDF
Tkach Maximum-Likelihood Augmented Discrete Generative Adversarial Networks PDF PDF
Tkach Generating Sentences from a Continuous Space PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Constrained Graph Variational Autoencoders for Molecule Design PDF PDF
Arshdeep Learning Deep Generative Models of Graphs PDF PDF
Arshdeep Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation PDF PDF
Jack Generating and designing DNA with deep generative models PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh 1 PDF PDF
GaoJi Summary Of Several Autoencoder models PDF PDF
GaoJi Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts 2 PDF PDF
GaoJi Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN PDF PDF
Arshdeep Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush 3 PDF PDF
Arshdeep Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals 4 PDF PDF

Presenter Papers Paper URL Our Slides
Arshdeep Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 1 PDF PDF
Arshdeep Latent Alignment and Variational Attention 2 PDF PDF
Arshdeep Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio 3 PDF PDF
---

[172]: verification

Table of readings


Team INDEX Title & Link Tags Our Slide
T1 Safe Reinforcement Learning via Shielding RL, safety, verification OurSlide

Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  
---

[173]: visualizing

Table of readings


Presenter Papers Paper URL Our Slides
Bio KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 1 Pdf Eli Pdf
Bio Molecular geometry prediction using a deep generative graph neural network Pdf Eli Pdf
Bio Visualizing convolutional neural network protein-ligand scoring PDF() Eli PDF
Bio Deep generative models of genetic variation capture mutation effects PDF() Eli PDF
Bio Attentive cross-modal paratope prediction Pdf Eli PDF

Presenter Papers Paper URL Our Slides
Jennifer Adversarial Attacks Against Medical Deep Learning Systems PDF PDF
Jennifer Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning PDF PDF
Jennifer Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers PDF PDF
Jennifer CleverHans PDF PDF
Ji Ji-f18-New papers about adversarial attack   PDF

Presenter Papers Paper URL Our Slides
Jack A Unified Approach to Interpreting Model Predictions PDF PDF
Jack “Why Should I Trust You?”: Explaining the Predictions of Any Classifier PDF PDF
Jack Visual Feature Attribution using Wasserstein GANs PDF PDF
Jack GAN Dissection: Visualizing and Understanding Generative Adversarial Networks PDF PDF
GaoJi Recent Interpretable machine learning papers PDF PDF
Jennifer The Building Blocks of Interpretability PDF PDF

Presenter Papers Paper URL Our Slides
Rita Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 1 PDF PDF
Arshdeep Axiomatic Attribution for Deep Networks, ICML17 2 PDF PDF
  The Robustness of Estimator Composition, NIPS16 PDF  

Ganguli - Theoretical Neuroscience and Deep Learning

Presenter Papers Paper URL Our Slides
DLSS16 video    
DLSS17 video + slide    
DLSS17 Deep learning in the brain DLSS17 + Video  

Presenter Papers Paper URL Our Slides
AE Intriguing properties of neural networks / PDF  
AE Explaining and Harnessing Adversarial Examples PDF  
AE Towards Deep Learning Models Resistant to Adversarial Attacks PDF  
AE DeepFool: a simple and accurate method to fool deep neural networks PDF  
AE Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner PDF PDF
Data Basic Survey of ImageNet - LSVRC competition URL PDF
Understand Understanding Black-box Predictions via Influence Functions PDF  
Understand Deep inside convolutional networks: Visualising image classification models and saliency maps PDF  
Understand BeenKim, Interpretable Machine Learning, ICML17 Tutorial [^1] PDF  
provable Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter, URL  
---

[174]: white-box

Table of readings


Presenter Papers Paper URL Our Slides
GaoJi Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh PDF PDF
GaoJi Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer PDF PDF
GaoJi DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray PDF PDF
GaoJi A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors 1 PDF PDF
GaoJi A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) PDF PDF
Testing DeepXplore: Automated Whitebox Testing of Deep Learning Systems PDF  

Presenter Papers Paper URL Our Slides
GaoJi A few useful things to know about machine learning PDF PDF
GaoJi A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions PDF PDF
GaoJi Automated White-box Testing of Deep Learning Systems 1 PDF PDF
GaoJi Testing and Validating Machine Learning Classifiers by Metamorphic Testing 2 PDF PDF
GaoJi Software testing: a research travelogue (2000–2014) PDF PDF
--- ---