Readings ByReadDate

Click on a term tag to see the list of readings we finished in that semester.

2024-seminarRead

No.	Date	Title and Information	PaperYear
1	2024, May, 3	Safety Benchmark WMDP	2024-S27
2	2024, Apr, 30	KV Cache and Tooling	2024-S27
3	2024, Apr, 25	Advanced Transformer Architectures	2024-S26
4	2024, Apr, 23	LLM fine tuning	2024-S25
5	2024, Apr, 18	Recent LLM basics	2024-S24
6	2024, Apr, 16	MultiAgent LLMs	2024-S23
7	2024, Apr, 11	LLM Agents	2024-S22
8	2024, Apr, 9	Self-exam LLM and reasoning	2024-S21
9	2024, Apr, 4	Prompt Engineering	2024-S20
10	2024, Apr, 2	LLM Scaling law and Efficiency	2024-S19
11	2024, Mar, 28	LLM interpretability, trust and knowledge conflicts	2024-S18
12	2024, Mar, 26	Model editing and Disgorgement	2024-S17
13	2024, Mar, 21	Domain Centered FMs	2024-S16
14	2024, Mar, 19	LLM Hallucination	2024-S15
15	2024, Mar, 14	Knowledge Augmented FMs	2024-S14
16	2024, Mar, 12	More FM risk	2024-S13
17	2024, Feb, 29	LLM multimodal harm responses	2024-S12
18	2024, Feb, 27	FM toxicity / harmful outputs	2024-S11
19	2024, Feb, 22	FM fairness / bias issues	2024-S10
20	2024, Feb, 20	FM privacy leakage issues	2024-S9
21	2024, Feb, 15	FM copyright infringement	2024-S8
22	2024, Feb, 13	Survey AI Risk framework	2024-S7
23	2024, Feb, 8	Open Source LLM - Mistral Data preparation	2024-S6
24	2024, Feb, 6	Survey human alignment	2024-S5
25	2024, Feb, 1	GenAI Guardrails	2024-S4
26	2024, Jan, 30	LLM evaluating framework	2024-S3
27	2024, Jan, 25	Survey LLMs and Multimodal FMs	2024-S2
28	2024, Jan, 23	LLM basics	2024-S1
29	2024, Jan, 18	NLP Basics Introduction	2024-S0

[1]: Safety Benchmark WMDP

read on: - 03 May 2024
FMRisk Safety Mitigate LLMEvaluate Adversarial

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at this https URL

[2]: KV Cache and Tooling

read on: - 30 Apr 2024
FMEfficient Efficiency

KV Caching in LLM:

grouped query attention: https://arxiv.org/pdf/2305.13245.pdf
Paged attention https://arxiv.org/pdf/2309.06180.pdf https://openreview.net/pdf?id=uNrFpDPMyo

Must-know tools for training, fine-tuning, and serving LLMs:

Torchtune - Built on top of PyTorch, for training and fine-tuning LLMs. Uses YAML-based configs for easily running experiments. GitHub
axolotl - Built on top of the Hugging Face peft and transformers libraries; supports fine-tuning a large number of models such as Mistral and Llama. Provides support for techniques like RLHF, DPO, LoRA, QLoRA etc. GitHub
LitGPT - Built on nanoGPT and Megatron; supports pre-training and fine-tuning, with examples like StarCoder, TinyLlama etc. GitHub
MaxText - JAX-based library for training LLMs on Google TPUs, with configs for models like Gemma, Mistral and Llama 2 etc. GitHub
Langchain- https://python.langchain.com/docs/get_started/introduction
haystack.deepset.ai
- https://github.com/deepset-ai/haystack
- LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots.
LlamaIndex
- https://docs.llamaindex.ai/en/stable/ LlamaIndex supports Retrieval-Augmented Generation (RAG). Instead of asking LLM to generate an answer immediately, LlamaIndex: retrieves information from your data sources first, / adds it to your question as context, and / asks the LLM to answer based on the enriched prompt.
Making Retrieval Augmented Generation Fast
- https://www.pinecone.io/learn/fast-retrieval-augmented-generation/
OpenMoE
- https://github.com/XueFuzhao/OpenMoE
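The retrieve-then-augment flow described for LlamaIndex above (retrieve from your data, add it to the question as context, then ask the LLM) can be sketched without any library. This is only an illustration and not the LlamaIndex API: `score`, `retrieve`, and `build_rag_prompt` are hypothetical names, the word-overlap scorer stands in for a real embedding search, and the final LLM call is left out.

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: number of distinct lowercase words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest toy relevance score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def build_rag_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved documents to the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )
```

The string returned by `build_rag_prompt` is what would be sent to the model in place of the bare question.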

[3]: Advanced Transformer Architectures

read on: - 25 Apr 2024
FMEfficient Efficiency

In this session, our readings cover:

Required Readings:

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

https://arxiv.org/abs/2311.12351
Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, marking a stride towards achieving Artificial General Intelligence (AGI). However, current LLMs are predominantly pretrained on short text snippets, which compromises their effectiveness in processing the long-context prompts that are frequently encountered in practical scenarios. This article offers a comprehensive survey of the recent advancement in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs throughout the entire model lifecycle, from pre-training through to inference. We first delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. We then provide a taxonomy and the landscape of upgrades on Transformer architecture to solve these problems. Afterwards, we provide an investigation on widely used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, frameworks, and compilers to boost the efficacy of LLMs across different stages in runtime. Finally, we discuss the challenges and potential avenues for future research. A curated repository of relevant literature, continuously updated, is available at this https URL.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
Paper: https://arxiv.org/abs/2205.14135
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware – accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3× speedup on GPT-2 (seq. length 1K), and 2.4× speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).
Related: blogpost FlashAttention — Techniques for Efficient Inference of LLMs (III/IV)
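The tiling mentioned in the FlashAttention abstract depends on being able to compute softmax attention one block of keys at a time, without ever holding the full row of scores. A minimal pure-Python sketch of that online-softmax recurrence for a single query vector follows. It does not model GPU memory at all, and the function names `naive_attention` and `tiled_attention` are hypothetical; it only illustrates that the blockwise result is exact rather than approximate.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, K, V):
    """Reference: softmax(q.K^T / sqrt(d)) V with all scores materialized at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [_dot(q, k) * scale for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(w[j] * V[j][c] for j in range(len(V))) / z for c in range(len(V[0]))]


def tiled_attention(q, K, V, block=2):
    """Same result, but keys/values are visited block by block with running statistics."""
    scale = 1.0 / math.sqrt(len(q))
    m = float("-inf")          # running maximum of the scores seen so far
    z = 0.0                    # running softmax normalizer, relative to m
    acc = [0.0] * len(V[0])    # running unnormalized output, relative to m
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = [_dot(q, k) * scale for k in Kb]
        m_new = max(m, max(scores))
        rescale = math.exp(m - m_new)  # 0.0 on the first block, since m is -inf
        w = [math.exp(s - m_new) for s in scores]
        z = z * rescale + sum(w)
        acc = [
            a * rescale + sum(w[j] * Vb[j][c] for j in range(len(Vb)))
            for c, a in enumerate(acc)
        ]
        m = m_new
    return [a / z for a in acc]
```

Only `m`, `z`, and `acc` persist between blocks, which is why the working set can stay small regardless of sequence length.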

JAMBA

Introducing Jamba: AI21’s Groundbreaking SSM-Transformer Model. Debuting the first production-grade Mamba-based model delivering best-in-class quality and performance.
March 28, 2024
https://www.ai21.com/blog/announcing-jamba
We are thrilled to announce Jamba, the world’s first production-grade Mamba based model. By enhancing Mamba Structured State Space model (SSM) technology with elements of the traditional Transformer architecture, Jamba compensates for the inherent limitations of a pure SSM model. Offering a 256K context window, it is already demonstrating remarkable gains in throughput and efficiency—just the beginning of what can be possible with this innovative hybrid architecture. Notably, Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.

[4]: LLM fine tuning

read on: - 23 Apr 2024
FMAdapt Alignment

In this session, our readings cover:

Required Readings:

Recent Large Language Models Reshaping the Open-Source Arena

https://deci.ai/blog/list-of-large-language-models-in-open-source/
The release of Meta’s Llama model and the subsequent release of Llama 2 in 2023 kickstarted an explosion of open-source language models, with better and more innovative models being released on what seems like a daily basis. Here we dove into the ocean of open-source possibilities to curate a select list of the most intriguing and influential models making waves in recent months, including Qwen1.5, Yi, Smaug, Mixtral-8x7B-v0.1, DBRX, SOLAR-10.7B-v1.0, Tulu 2, WizardLM, Starling 7B, OLMo-7B, Gemma, and DeciLM-7B.
Plus the newly available DBRX model: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

Instruction Tuning for Large Language Models: A Survey

https://arxiv.org/abs/2308.10792
Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users’ objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research. Project page: this http URL

Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models

https://arxiv.org/abs/2203.06904
Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this paper. In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divide existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspective of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling and transferable properties of delta tuning.
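To make the "reparameterization-based" group concrete, here is a small sketch of a LoRA-style low-rank update, chosen as one familiar example of that group; the abstract above does not name a specific method, so the choice is an assumption. The frozen weight `W` is left untouched and only two thin matrices are trained. The names `lora_forward`, `down`, `up`, and `count_params` are hypothetical, and plain lists stand in for tensors.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lora_forward(x, W, down, up, alpha=1.0):
    """Compute x @ W + alpha * (x @ down) @ up.

    W    : frozen pretrained weight, shape (d_in, d_out)
    down : trainable, shape (d_in, r)
    up   : trainable, shape (r, d_out); all zeros at the start so the output
           initially equals that of the frozen model
    """
    base = matmul(x, W)
    delta = matmul(matmul(x, down), up)
    return [[b + alpha * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]


def count_params(d_in, d_out, r):
    """Return (parameters in full W, parameters in the low-rank delta)."""
    return d_in * d_out, r * (d_in + d_out)
```

For a 1024 by 1024 layer with rank 8, `count_params(1024, 1024, 8)` gives 1,048,576 frozen parameters against 16,384 trainable ones, which illustrates the storage saving the abstract refers to.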

[5]: Recent LLM basics

read on: - 18 Apr 2024
FMEfficient Efficiency BasicLLM

In this session, our readings cover:

Required Readings:

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

https://arxiv.org/abs/2312.15234
In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

https://arxiv.org/abs/2304.01373
How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at this https URL.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

https://arxiv.org/abs/2403.09611
Multimodal LLM Pre-training - provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training; studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key for state-of-the-art performance; it also proposes a family of multimodal models up to 30B parameters that achieve SOTA in pre-training metrics and include properties such as enhanced in-context learning, multi-image reasoning, enabling few-shot chain-of-thought prompting.

[6]: MultiAgent LLMs

read on: - 16 Apr 2024
FMAdapt Agent

In this session, our readings cover:

Required Readings:

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents’ capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.

Required Readings:

A Survey on Large Language Model based Autonomous Agents

https://arxiv.org/abs/2308.11432
Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at this https URL.

[8]: Self-exam LLM and reasoning

read on: - 09 Apr 2024
FMAdapt Reasoning

In this session, our readings cover:

Required Readings:

Augmented Language Models: a Survey

Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing tokens prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advance in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability,

Self-Consistency Improves Chain of Thought Reasoning in Language Models

https://arxiv.org/abs/2203.11171
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
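The decoding strategy in the abstract above (sample several reasoning paths, then keep the most consistent final answer) reduces to a majority vote over sampled answers. A minimal sketch follows, assuming a caller-supplied `sample_path(question)` function that returns a `(reasoning, answer)` pair from one sampled chain of thought. That function and the name `self_consistent_answer` are hypothetical, and a plain unweighted vote is used here as the simplest form of aggregation.

```python
from collections import Counter


def self_consistent_answer(sample_path, question, n_samples=5):
    """Sample n reasoning paths and return the most frequent final answer."""
    answers = [sample_path(question)[1] for _ in range(n_samples)]
    # The reasoning text is discarded; only the final answers are compared.
    return Counter(answers).most_common(1)[0][0]
```

In practice `sample_path` would call an LLM with a non-zero sampling temperature so that the paths actually differ.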

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

https://arxiv.org/abs/2401.00812
Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, Sha Li, Zixuan Huang, Xu Cao, Xingyao Wang, Yiquan Wang, Heng Ji, Chengxiang Zhai
The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs’ training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their applications to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to their success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.

[9]: Prompt Engineering

read on: - 04 Apr 2024
FMAdapt Prompting

In this session, our readings cover:

Required Readings:

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

https://arxiv.org/abs/2310.14735
Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu
This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.
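The basic techniques named in this survey (role-prompting, one-shot and few-shot prompting, and a chain-of-thought cue) are all ways of structuring the input text. A minimal sketch of a prompt builder that combines them is below. The function name `build_prompt`, the `Q:`/`A:` layout, and the exact wording of the cue are illustrative assumptions, not a format prescribed by the paper.

```python
def build_prompt(question, examples=(), role=None, chain_of_thought=False):
    """Assemble a prompt from an optional role, worked examples, and the question.

    examples         : iterable of (question, answer) pairs; one pair gives a
                       one-shot prompt, several give a few-shot prompt
    role             : optional persona, e.g. "a careful math tutor"
    chain_of_thought : if True, append a cue asking for step-by-step reasoning
    """
    parts = []
    if role:
        parts.append(f"You are {role}.")
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    cue = " Let's think step by step." if chain_of_thought else ""
    parts.append(f"Q: {question}\nA:{cue}")
    return "\n\n".join(parts)
```

Calling `build_prompt("2+2?", examples=[("1+1?", "2")])` yields a one-shot prompt; adding more pairs turns it into a few-shot prompt.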

[10]: LLM Scaling law and Efficiency

read on: - 02 Apr 2024
FMEfficient Efficiency

In this session, our readings cover:

Required Readings:

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
https://github.com/RUCAIBox/LLMSurvey
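The claim that loss "scales as a power-law with model size" means that loss against parameter count is a straight line on log-log axes. The sketch below fits L = a * N^(-alpha) by least squares in log space. The helper name `fit_power_law` is hypothetical, and the numbers used to exercise it are synthetic, made up purely for illustration rather than taken from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit losses ~ a * sizes**(-alpha) via linear regression on (log N, log L).

    Returns (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Synthetic example: losses generated from a known power law (values are made up).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * n ** -0.08 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
```

Once `a` and `alpha` are fitted on small models, `a * N ** -alpha` gives an extrapolated loss for a larger `N`, which is the kind of prediction such laws are used for.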

Efficient Large Language Models: A Survey

https://arxiv.org/abs/2312.03863
https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at this https URL, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Recent research, such as BitNet [23], is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
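A ternary weight takes one of three values, so it carries log2(3) ≈ 1.58 bits of information, which is where the "1.58" in the name comes from. The sketch below shows one way to map full-precision weights to {-1, 0, 1}: scale by the mean absolute value, round, and clip. The abstract above does not spell out the quantizer, so this absmean scheme reflects my understanding of the paper and should be treated as an assumption; `absmean_ternary` is a hypothetical name.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """Quantize a weight matrix (list of rows) to values in {-1, 0, 1}.

    Each weight is divided by gamma, the mean absolute weight, then rounded to
    the nearest integer and clipped to [-1, 1]. Returns (ternary_matrix, gamma).
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row] for row in weights
    ]
    return ternary, gamma


bits_per_weight = math.log2(3)  # information content of one ternary value
```

Multiplying the ternary matrix by `gamma` gives a coarse approximation of the original weights; with ternary entries, a matrix-vector product needs only additions and subtractions.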

[11]: LLM interpretability, trust and knowledge conflicts

read on: - 28 Mar 2024
FMRisk Interpretibility

Required Readings:

Rethinking interpretability in the era of large language models

Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
2024/1/30
Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.

The Claude 3 Model Family: Opus, Sonnet, Haiku

https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
We introduce Claude 3, a new family of large multimodal models – Claude 3 Opus, our most capable offering, Claude 3 Sonnet, which provides a combination of skills and speed, and Claude 3 Haiku, our fastest and least expensive model. All new models have vision capabilities that enable them to process and analyze image data. The Claude 3 family demonstrates strong performance across benchmark evaluations and sets a new standard on measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results on evaluations like GPQA [1], MMLU [2], MMMU [3] and many more. Claude 3 Haiku performs as well or better than Claude 2 [4] on most pure-text tasks, while Sonnet and Opus significantly outperform it. Additionally, these models exhibit improved fluency in non-English languages, making them more versatile for a global audience. In this report, we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety, societal impacts, and the catastrophic risk assessments we committed to in our Responsible Scaling Policy [5].

[12]: Model editing and Disgorgement

read on: - 26 Mar 2024
FMAdapt ModelEdit

In this session, our readings cover:

Required Readings:

Editing Large Language Models: Problems, Methods, and Opportunities

https://arxiv.org/abs/2305.13172
Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang
Despite the ability to train capable LLMs, the methodology for maintaining their relevancy and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to efficiently alter the behavior of LLMs within a specific domain without negatively impacting performance across other inputs. This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. We also build a new benchmark dataset to facilitate a more robust evaluation and pinpoint enduring issues intrinsic to existing techniques. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context. Code and datasets are available at this https URL.
Comments: EMNLP 2023. Updated with new experiments

[13]: Domain Centered FMs

read on: - 21 Mar 2024
FMAdapt DomainAdapt

In this session, our readings cover:

Required Readings:

Large Language Models for Software Engineering: A Systematic Literature Review

Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We collect and analyze 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study.

[14]: LLM Hallucination

read on: - 19 Mar 2024
FMRisk Hallucination

In this session, our readings cover:

Required Readings:

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

https://arxiv.org/abs/2311.05232
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of LLMs in real-world scenarios, which attracts increasing attention to detect and mitigate these hallucinations. In this survey, we aim to provide a thorough and in-depth overview of recent advances in the field of LLM hallucinations. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks. Additionally, representative approaches designed to mitigate hallucinations are introduced accordingly. Finally, we analyze the challenges that highlight the current limitations and formulate open questions, aiming to delineate pathways for future research on hallucinations in LLMs.

[15]: Knowledge Augmented FMs

read on: - 14 Mar 2024
FMAdapt RAG

In this session, our readings cover:

Required Readings:

Retrieval-Augmented Generation for AI-Generated Content: A Survey

https://arxiv.org/abs/2402.19473v1
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG techniques into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator. We distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. Then, from another view, we survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Project: this https URL

Retrieval-Augmented Generation for Large Language Models: A Survey

https://arxiv.org/abs/2312.10997
Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy and reduce model hallucination, particularly for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG.
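
The retrieve-then-answer flow described in this abstract can be sketched in a few lines of Python. This is only a toy illustration of the "Naive RAG" idea, not code from the survey: the bag-of-words cosine retriever, the prompt layout, and the names `retrieve`, `build_prompt`, and `KNOWLEDGE_BASE` are all assumptions made for this example, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, documents, k=2):
    """Prepend numbered retrieved passages to the question (hypothetical format)."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(passages))
    return (
        f"Context:\n{context}\n\nQuestion: {query}\n"
        "Answer using the context above and cite passage numbers."
    )


# Hypothetical external knowledge base, built from facts on this page.
KNOWLEDGE_BASE = [
    "Mistral 7B uses grouped-query attention and sliding window attention.",
    "Llama Guard is an input-output safeguard model for human-AI conversations.",
    "ToxicChat is a toxicity benchmark built from real user queries.",
]

if __name__ == "__main__":
    print(build_prompt("Which attention does Mistral 7B use?", KNOWLEDGE_BASE, k=1))
```

The prompt returned by `build_prompt` would then be passed to a generator model. Numbering the passages is one simple way to let the answer cite its sources, which is the verification benefit the abstract mentions.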

Even More

A Survey of Table Reasoning with Large Language Models

Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che
https://arxiv.org/abs/2402.08259
Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improves the efficiency of obtaining information. Recently, using Large Language Models (LLMs) has become the mainstream method for table reasoning, because it not only significantly reduces the annotation cost but also exceeds the performance of previous methods. However, existing research still lacks a summary of LLM-based table reasoning works. Due to the existing lack of research, questions about which techniques can improve table reasoning performance in the era of LLMs, why LLMs excel at table reasoning, and how to enhance table reasoning abilities in the future, remain largely unexplored. This gap significantly limits progress in research. To answer the above questions and advance table reasoning research with LLMs, we present this survey to analyze existing research, inspiring future work. In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning. We provide research directions from both the improvement of existing methods and the expansion of practical applications to inspire future research.

[16]: More FM risk

read on: - 12 Mar 2024
FMRisk Safety Adversarial

In this session, our readings cover:

Required Readings:

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

https://dl.acm.org/doi/10.1145/3442188.3445922
The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.

Even More

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation / EMNLP2023

Despite the remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media contents, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark constructed based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference when compared to social media contents. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

OpenAI on LLM generated bio-x-risk

Building an early warning system for LLM-aided biological threat creation
https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation

A misleading open letter about sci-fi AI dangers ignores the real risks

https://www.aisnakeoil.com/p/a-misleading-open-letter-about-sci

https://deepmind.google/discover/blog/evaluating-social-and-ethical-risks-from-generative-ai/

Managing Existential Risk from AI without Undercutting Innovation

https://www.csis.org/analysis/managing-existential-risk-ai-without-undercutting-innovation

[17]: LLM multimodal harm responses

read on: - 29 Feb 2024
FMRisk Safety Adversarial

In this session, our readings cover:

Required Readings:

Dingcheng Yang, Yang Bai, Xiaojun Jia, Yang Liu, Xiaochun Cao, Wenjian Yu
Diffusion models have been widely deployed in various image generation tasks, demonstrating an extraordinary connection between image and text modalities. However, they face challenges of being maliciously exploited to generate harmful or sensitive images by appending a specific suffix to the original prompt. Existing works mainly focus on using single-modal information to conduct attacks, which fails to utilize multi-modal features and results in less than satisfactory performance. Integrating multi-modal priors (MMP), i.e. both text and image features, we propose a targeted attack method named MMP-Attack in this work. Specifically, the goal of MMP-Attack is to add a target object into the image content while simultaneously removing the original object. The MMP-Attack shows a notable advantage over existing works with superior universality and transferability, which can effectively attack commercial text-to-image (T2I) models such as DALL-E 3. To the best of our knowledge, this marks the first successful attempt of transfer-based attack to commercial T2I models. Our code is publicly available at ….

A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion

https://ieeexplore.ieee.org/document/10208563
Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem ‘query-free attack generation’. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that only a five-character perturbation to the text prompt is able to cause the significant content shift of synthesized images using Stable Diffusion. Moreover, we show that the proposed target attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content.

[18]: FM toxicity / harmful outputs

read on: - 27 Feb 2024
FMRisk Safety Adversarial

In this session, our readings cover:

Required Readings:

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

https://arxiv.org/abs/2402.04249
Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at this https URL.

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training
Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

[19]: FM fairness / bias issues

read on: - 22 Feb 2024
FMRisk Bias Adversarial

In this session, our readings cover:

Required Readings:

Evaluating and Mitigating Discrimination in Language Model Decisions

https://arxiv.org/abs/2312.03689
As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array of potential prompts that decision-makers may input into an LM, spanning 70 diverse decision scenarios across society, and systematically vary the demographic information in each prompt. Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied. While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate. Our work enables developers and policymakers to anticipate, measure, and address discrimination as language model capabilities and applications continue to expand. We release our dataset and prompts at this https URL
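
The core of the evaluation method, filling one decision prompt with systematically varied demographic information and then comparing outcomes across groups, can be sketched as follows. This is a rough illustration rather than the paper's pipeline: the template, the attribute values, and the helper names `make_variants` and `approval_gap` are hypothetical, and the gap here is a plain difference in approval rates, a simplification chosen for this example.

```python
from itertools import product

# Hypothetical decision template and attribute values, for illustration only.
TEMPLATE = (
    "The applicant is a {age}-year-old {gender} {race} person applying for a "
    "small business loan. Should the application be approved? Answer yes or no."
)
ATTRIBUTES = {
    "age": [30, 60],
    "gender": ["female", "male", "non-binary"],
    "race": ["Asian", "Black", "white"],
}


def make_variants(template, attributes):
    """Fill the template with every combination of attribute values."""
    keys = list(attributes)
    variants = []
    for values in product(*(attributes[k] for k in keys)):
        profile = dict(zip(keys, values))
        variants.append((profile, template.format(**profile)))
    return variants


def approval_gap(yes_rates, attribute, group, baseline):
    """Mean approval rate of `group` minus that of `baseline`.

    `yes_rates` is a list of (profile, rate) pairs, where rate is the
    fraction of "yes" answers a model gave for that profile's prompt.
    """
    def mean_rate(value):
        rates = [r for profile, r in yes_rates if profile[attribute] == value]
        return sum(rates) / len(rates)

    return mean_rate(group) - mean_rate(baseline)


if __name__ == "__main__":
    variants = make_variants(TEMPLATE, ATTRIBUTES)
    print(len(variants), "prompt variants")
    print(variants[0][1])
```

Each prompt in `variants` would be sent to the model under test. A gap near zero for every attribute suggests the decision does not depend on that attribute in this scenario, while a positive or negative gap corresponds to the positive or negative discrimination the abstract describes.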

[20]: FM privacy leakage issues

read on: - 20 Feb 2024
FMRisk Mitigate LLMEvaluate Adversarial

In this session, our readings cover:

Required Readings:

Are Large Pre-Trained Language Models Leaking Your Personal Information?

https://arxiv.org/abs/2205.12628
Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang
In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner’s name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.

Privacy Risks of General-Purpose Language Models

https://ieeexplore.ieee.org/abstract/document/9152761
We find the text embeddings from general-purpose language models would capture much sensitive information from the plain text. Once being accessed by the adversary, the embeddings can be reverse-engineered to disclose sensitive information of the victims for further harassment. Although such a privacy risk can impose a real threat to the future leverage of these promising NLP tools, there are neither published attacks nor systematic evaluations by far for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show the adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, different…

[21]: FM copyright infringement

read on: - 15 Feb 2024
FMRisk Mitigate LLMEvaluate Adversarial

In this session, our readings cover:

Required Readings:

Foundation Models and Fair Use

Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, Percy Liang
Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.

Extracting Training Data from Diffusion Models

Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

https://arxiv.org/abs/2303.04226
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind its impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by humans, and generating the content according to its knowledge and the intent information. In recent years, large-scale models have become increasingly important in AIGC as they provide better intent extraction and thus, improved generation results. With the growth of data and the size of the models, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic and high-quality content generation. This survey provides a comprehensive review of the history of generative models, their basic components, and recent advances in AIGC from unimodal interaction and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and related models of text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.

[22]: Survey AI Risk framework

read on: - 13 Feb 2024
FMRisk Mitigate LLMEvaluate Adversarial

In this session, our readings cover:

Required Readings:

TrustLLM: Trustworthiness in Large Language Models

https://arxiv.org/abs/2401.05561
Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs’ potential to both bolster and jeopardize cybersecurity.
https://arxiv.org/abs/2312.02003

[23]: Open Source LLM - Mistral Data preparation

read on: - 08 Feb 2024
FMBasic BasicLLM

In this session, our readings cover:

Required Readings:

Mistral 7B

https://mistral.ai/news/announcing-mistral-7b/
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
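
The two attention mechanisms named in this abstract can be illustrated with a short pure-Python sketch. It is not Mistral's implementation: the function names and the toy sizes (6 tokens, window of 3, 8 query heads sharing 2 key/value heads) are assumptions chosen to keep the output readable. The first function builds a causal sliding-window mask, and the second shows how grouped-query attention maps several query heads onto one shared key/value head.

```python
def sliding_window_mask(seq_len, window):
    """Build a boolean attention mask for sliding window attention.

    mask[i][j] is True when query position i may attend to key position j:
    the key must not be in the future (j <= i) and must lie within the
    last `window` positions (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Each row is a query position; "x" marks the keys it can attend to.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # Which key/value head each of 8 query heads reads from, given 2 KV heads.
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

Because each row of the mask has at most `window` allowed positions, the per-token attention cost and the key/value cache stay bounded as the sequence grows, which is the reduced inference cost the abstract refers to. Sharing key/value heads across query heads shrinks the cache further.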

[24]: Survey human alignment

read on: - 06 Feb 2024
FMBasic Alignment

In this session, our readings cover:

Required Readings:

Aligning Large Language Models with Human: A Survey

https://arxiv.org/abs/2307.12966
https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo
https://huggingface.co/blog/stackllama

[25]: GenAI Guardrails

read on: - 01 Feb 2024
FMRisk Mitigate Adversarial

In this session, our readings cover:

Required Readings:

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

https://arxiv.org/abs/2312.06674
We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to as response classification. For the purpose of both prompt and response classification, we have meticulously gathered a dataset of high quality. Llama Guard, a Llama2-7b model that is instruction-tuned on our collected dataset, albeit low in volume, demonstrates strong performance on existing benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat, where its performance matches or exceeds that of currently available content moderation tools. Llama Guard functions as a language model, carrying out multi-class classification and generating binary decision scores. Furthermore, the instruction fine-tuning of Llama Guard allows for the customization of tasks and the adaptation of output formats. This feature enhances the model’s capabilities, such as enabling the adjustment of taxonomy categories to align with specific use cases, and facilitating zero-shot or few-shot prompting with diverse taxonomies at the input. We are making Llama Guard model weights available and we encourage researchers to further develop and adapt them to meet the evolving needs of the community for AI safety.

[26]: LLM evaluating framework

read on: - 30 Jan 2024
FMBasic LLMEvaluate

In this session, our readings cover:

Required Readings:

Holistic Evaluation of Text-To-Image Models

https://arxiv.org/abs/2311.04287
The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency, as well as the code, which is integrated with the HELM codebase.

Holistic Evaluation of Language Models

https://arxiv.org/abs/2211.09110

[27]: Survey LLMs and Multimodal FMs

read on: - 25 Jan 2024
FMMulti BasicLLM

In this session, our readings cover:

Readings:

ChatGPT is not all you need. A State of the Art Review of large Generative AI models

Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan
https://arxiv.org/abs/2301.04655
During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as being a general question and answering system or automatically creating artistic images that are revolutionizing several sectors. Consequently, the implications that these generative models have in the industry and society are enormous, as several job positions may be transformed. For example, Generative AI is capable of transforming effectively and creatively texts to images, like the DALLE-2 model; text to 3D images, like the Dreamfusion model; images to text, like the Flamingo model; texts to video, like the Phenaki model; texts to audio, like the AudioLM model; texts to other texts, like ChatGPT; texts to code, like the Codex model; texts to scientific texts, like the Galactica model or even create algorithms like AlphaTensor. This work attempts to describe concisely the main models and sectors affected by generative AI and to provide a taxonomy of the main generative models published recently.

A Survey of Large Language Models

https://arxiv.org/abs/2303.18223
Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

On the Opportunities and Risks of Foundation Models

https://arxiv.org/abs/2108.07258
“a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”

Required Readings:

Emergent Abilities of Large Language Models

“an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”

Language Models are Few-Shot Learners

“GPT-3, 175B autoregressive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”

Extra Readings:

A survey of Generative AI Applications

https://arxiv.org/abs/2306.02781
Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.

Generative AI: Perspectives from Stanford HAI

https://hai.stanford.edu/generative-ai-perspectives-stanford-hai

[29]: NLP Basics Introduction

read on: - 18 Jan 2024
FMBasic BasicLLM

Readings:

Basics of ML and DL:

Basics of NLP

URL
Typical NLP tasks / Challenges / Pipeline
f() on natural language
- Before Deep NLP (Pre 2012) • (BOW / LSI / Topic Modeling LDA)
- Word2Vec (2013-2016) • (GloVe / FastText)
- Recurrent NN (2014-2016) • LSTM
- Seq2Seq
- Attention
- Self-Attention (2016 – now)
- Transformer (attention-only Seq2Seq)
- BERT / RoBERTa / XLNet / GPT / …
A good code walkthrough of the Transformer at URL


2022-selfRead

---

No.	Date	Title and Information	PaperYear
1	2022, Dec, 3	RLHF + InstructGPT	2022-W6
2	2022, Dec, 1	Stable diffusion + DreamBooth + LoRA	2022-W5
3	2022, Oct, 1	Emergent Abilities of LLM	2022-W4
4	2022, Sep, 1	DiffDock + ESMfold	2022-W2
5	2022, Jun, 3	Decision Transformers	2022-W3
6	2022, May, 3	A Generalist Agent + offline RL + UniMask	2022-W1

[1]: RLHF + InstructGPT

read on: - 03 Dec 2022
6Reinforcement FMBasic RL AGI language model Human Alignment

Papers	Paper URL	Abstract
Training language models to follow instructions with human feedback	URL	“further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences	URL	“explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”

[2]: Stable diffusion + DreamBooth + LoRA

read on: - 01 Dec 2022
FMBasic FMMulti Diffusion Image synthesis Efficiency

Stable diffusion

URL
“High-Resolution Image Synthesis with Latent Diffusion Models”

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

URL
“‘personalization’ of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject.”

LoRA: Low-Rank Adaptation of Large Language Models

“propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.”
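
To make the low-rank idea in the quote above concrete, here is a minimal pure-Python sketch, assuming a row-vector convention in which the frozen weight `w_frozen` has shape (d_in, d_out) and the two trainable factors `lora_a` and `lora_b` have shapes (d_in, r) and (r, d_out). The function names, the `scale` parameter, and the shape convention are illustrative assumptions, not the paper's reference code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w_frozen, lora_a, lora_b, scale=1.0):
    """Compute x @ W + scale * (x @ A) @ B (hypothetical helper).

    Shapes: x (n, d_in), w_frozen (d_in, d_out),
            lora_a (d_in, r), lora_b (r, d_out).
    Only lora_a and lora_b would be trained; w_frozen stays fixed.
    """
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    return [
        [b_val + scale * u_val for b_val, u_val in zip(b_row, u_row)]
        for b_row, u_row in zip(base, update)
    ]


def trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors (full W has d_in * d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0], [1.0]]
    # With the second factor set to zero, the layer reproduces the frozen output.
    print(lora_forward(x, w, a, [[0.0, 0.0]]))
    print(lora_forward(x, w, a, [[1.0, 2.0]]))
    print(trainable_params(4096, 4096, 8), "vs", 4096 * 4096)
```

For a single hypothetical 4096 x 4096 layer at rank 8, the factors hold 65,536 parameters against 16,777,216 in the full matrix; the overall reduction quoted above depends on which layers are adapted and at what rank.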

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

https://arxiv.org/abs/2208.01618
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks.

[3]: Emergent Abilities of LLM

read on: - 01 Oct 2022
FMBasic language model

Emergent Abilities of Large Language Models

URL
“an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”

Language Models are Few-Shot Learners

URL
“GPT-3, 175B autoregressive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”

On the Opportunities and Risks of Foundation Models

“a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”

The Power of Scale for Parameter-Efficient Prompt Tuning

https://arxiv.org/abs/2104.08691
Brian Lester, Rami Al-Rfou, Noah Constant
In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

[4]: DiffDock + ESMfold

read on: - 01 Sep 2022
9DiscreteApp FMMulti Protein language model

Papers	Paper URL	Abstract
Evolutionary-scale prediction of atomic level protein structure with a language model	URL	“show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters,…”
DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking	URL	“Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space.”

[5]: Decision Transformers

read on: - 03 Jun 2022
6Reinforcement RL AGI

Decision Transformer: Reinforcement Learning via Sequence Modeling

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
https://arxiv.org/abs/2106.01345
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
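
The conditioning described in the abstract above (desired return, past states, and actions) can be sketched as a data-preparation step. The helper names `returns_to_go` and `build_sequence` and the tuple-tagged token format are hypothetical choices for this illustration; the sketch assumes that, for logged trajectories, the conditioning return at each step is the sum of rewards from that step to the end of the episode.

```python
def returns_to_go(rewards):
    """Return, for each timestep, the sum of rewards from that step onward."""
    rtg = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(rewards, states, actions):
    """Interleave (return-to-go, state, action) triples into one flat sequence.

    A causally masked sequence model would then be trained to predict each
    action from the tokens that precede it.
    """
    if not (len(rewards) == len(states) == len(actions)):
        raise ValueError("rewards, states, and actions must have equal length")
    sequence = []
    for rtg, state, action in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("return", rtg), ("state", state), ("action", action)])
    return sequence


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))
    print(build_sequence([1.0, 2.0], ["s0", "s1"], ["a0", "a1"]))
```

At inference time the first return token would instead be set to the return one wants the policy to achieve, which is what conditioning on the desired return amounts to in this sketch.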

Prompting Decision Transformer for Few-Shot Policy Generalization

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
https://arxiv.org/abs/2206.13499
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.

[6]: A Generalist Agent + offline RL + UniMask

read on: - 03 May 2022
6Reinforcement RL AGI

Papers	Paper URL	Abstract
A Generalist Agent	URL	Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
Why should we prefer offline reinforcement learning over behavioral cloning? ICLR 2022	URL	Natural to ask: when can an offline RL method outperform BC with an equal amount of expert data, even when BC is a natural choice?
Uni[MASK]: Unified Inference in Sequential Decision Problems	URL	Show how sequential decision making tasks can be thought of in terms of corresponding input maskings, enabling the training of a single model to perform all tasks at once. Applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns.

2020team

---

No.	Date	Title and Information	PaperYear
1	2021, Jan, 3	Introductory reads on DeepLearning	2021-W0
2	2020, Aug, 5	Interpretable Deep Learning	2020-W8
3	2020, Jul, 5	Trustworthy Deep Learning	2020-W7
4	2020, Jun, 5	A few applications of Deep Learning	2020-W7
5	2020, May, 5	Optimization and New Loss in Deep Learning	2020-W7
6	2020, Apr, 5	Meta Deep Learning	2020-W4
7	2020, Mar, 5	Deep Reinforcement Learning	2020-W3
8	2020, Feb, 5	Latent and Generative Deep Learning	2020-W2
9	2020, Jan, 5	Learning Relation from Data with Deep Learning	2020-W0
10	2020, Jan, 5	GNN and Transformer	2020-W1

[1]: Introductory reads on DeepLearning

read on: - 03 Jan 2021
0Basics tutorial

Type	Papers	Paper URL	Our Slides
Dr Qi	Survey of 10 DeepLearning (DL) trends different from classic machine learning		OurSlide
Youtube	Generative DL Basics	Youtube1 + Youtube2	NA
Youtube	Computation Graph for DL (PyTorch vs. TensorFlow)	Youtube URL + Youtube2	NA
Youtube	Auto Differentiation for DL	Youtube1 + Youtube2	NA
Youtube	RL basics and DL-RL basics	Youtube1 + Youtube2	NA
Youtube	Probabilistic programming in DL and Pyro	Youtube1 + Youtube2	NA
Youtube	Basics of Software Testing for DL	Youtube URL	NA
Course	Bill_CNN_Ng_Lecture_Notes		Bill’s Notes
Course	Bill_caltechMLnotes_ALL		Bill’s Notes
classic Paper	The Lottery Ticket Hypothesis		Morris’ Notes
classic Paper	NLP From Scratch		Morris’ Notes
classic Paper	Statistical Modeling The Two Cultures		Morris’ Notes
classic Paper	Attention_is_All_You_Need		Eli’s Notes
classic Paper	YOLO		Eli’s Notes
classic Paper	Neural Turing Machine		Jake Survey
classic Paper	BERT (Bidirectional Encoder Representations from Transformers): Pre-training of Deep Bidirectional Transformers for Language Understanding		Rishab Survey

[2]: Interpretable Deep Learning

read on: - 05 Aug 2020
3Reliable Interpretable black-box casual attention shapley concept

Index	Papers	Our Slides
0	A survey on Interpreting Deep Learning Models	Eli Survey
	Interpretable Machine Learning: Definitions, Methods, and Applications	Arsh Survey
1	Explaining Explanations: Axiomatic Feature Interactions for Deep Networks	Arsh Survey
2	Shapley Value review	Arsh Survey
	L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data	Bill Survey
	Consistent Individualized Feature Attribution for Tree Ensembles	Bill Survey
	Summary for A value for n-person games	Pan Survey
	L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data	Rishab Survey
3	Hierarchical Interpretations of Neural Network Predictions	Arsh Survey
	Hierarchical Interpretations of Neural Network Predictions	Rishab Survey
4	Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs	Arsh Survey
	Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs	Rishab Survey
5	Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models	Rishab Survey
		Sanchit Survey
	Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection	Sanchit Survey
6	This Looks Like That: Deep Learning for Interpretable Image Recognition	Pan Survey
7	AllenNLP Interpret	Rishab Survey
8	Discovery of Natural Language Concepts in Individual Units of CNNs	Rishab Survey
9	How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations	Rishab Survey
10	Attention is not Explanation	Sanchit Survey
		Pan Survey
11	Axiomatic Attribution for Deep Networks	Sanchit Survey
12	Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization	Sanchit Survey
13	Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier	Sanchit Survey
14	“Why Should I Trust You?”: Explaining the Predictions of Any Classifier	Yu Survey
15	Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge	Pan Survey

[3]: Trustworthy Deep Learning

read on: - 05 Jul 2020
3Reliable bias data valuation robustness adversarial-examples regularization

Index	Papers	Our Slides
1	Bias Also Matters: Bias Attribution for Deep Neural Network Explanation	Arsh Survey
2	Data Shapley: Equitable Valuation of Data for Machine Learning	Arsh Survey
	What is your data worth? Equitable Valuation of Data	Sanchit Survey
3	Neural Network Attributions: A Causal Perspective	Zhe Survey
4	Defending Against Neural Fake News	Eli Survey
5	Interpretation of Neural Networks is Fragile	Eli Survey
	Interpretation of Neural Networks is Fragile	Pan Survey
6	Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization	Eli Survey
7	Retrofitting Word Vectors to Semantic Lexicons	Morris Survey
8	On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models	Morris Survey
9	Towards Deep Learning Models Resistant to Adversarial Attacks	Pan Survey
10	Robust Attribution Regularization	Pan Survey
11	Sanity Checks for Saliency Maps	Sanchit Survey
12	Survey of data generation and evaluation in Interpreting DNN pipelines	Sanchit Survey
13	Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions	Sanchit Survey
14	Universal Adversarial Triggers for Attacking and Analyzing NLP	Sanchit Survey
15	Apricot: Submodular selection for data summarization in Python	Arsh Survey

[4]: A few applications of Deep Learning

read on: - 05 Jun 2020
9DiscreteApp Protein Gene-network Chromatin language processing

Index	Papers	Our Slides
1	Protein 3D Structure Computed from Evolutionary Sequence Variation	Arsh Survey
3	Regulatory network inference on developmental and evolutionary lineages	Arsh Survey
4	Deep learning in ultrasound image analysis	Zhe Survey
5	Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind)	Jack Survey
6	Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors	Jack Survey
7	BindSpace decodes transcription factor binding signals by large-scale sequence embedding	Jack Survey
8	FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning	Jack Survey
9	Query-Reduction Networks for Question Answering	Bill Survey

[5]: Optimization and New Loss in Deep Learning

read on: - 05 May 2020
4Optimization optimization mutual-information

Index	Papers	Our Slides
1	Review on Semi-Supervised Learning	Zhe Survey
2	Review on Generative Adversarial Networks	Zhe Survey
3	Information theory in deep learning	Zhe Survey
4	Lagrange Optimization	Zhe Survey
5	Deep Learning and Information Theory, and Graph Neural Network	Derrick Survey
6	Loss Functions for Deep Structured Models	Jack Survey
7	Group Sparsity and Optimization	Zhe Survey

[6]: Meta Deep Learning

read on: - 05 Apr 2020
7MetaDomain Multi-Task transfer-learning Generalization

Index	Papers	Our Slides
1	Invariant Risk Minimization	Zhe Survey
2	Causal Machine Learning	Zhe Survey
3	A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms	Zhe Survey
3	Review on Optimization-Based Meta Learning	Zhe Survey
4	Domain adaptation and counterfactual prediction	Zhe Survey
5	Gaussian Processes	Zhe Survey
6	A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data	Zhe Survey
7	Few-shot domain adaptation by causal mechanism transfer	Zhe Survey

[7]: Deep Reinforcement Learning

read on: - 05 Mar 2020
6Reinforcement RL Generalization

Index	Papers	Our Slides
1	Actor-Critic Methods for Control	Jake Survey
2	Generalization in Deep Reinforcement Learning	Jake Survey
3	Sample Efficient RL (Part 1)	Jake Survey
4	Sample Efficient RL (Part 2)	Jake Survey
5	Model-Free Value Methods in Deep RL	Jake Survey
6	Investigating Human Priors for Playing Video Games	Arsh Survey

[8]: Latent and Generative Deep Learning

read on: - 05 Feb 2020
5Generative Generative VAE low-rank

Index	Papers	Our Slides
1	Beta VAE, Ladder VAE, Causal VAE	Arsh Survey
2	Learnt Prior VAE	Arsh Survey
3	Multitask Graph Autoencoder	Arsh Survey
4	Introduction to component analysis	Zhe Survey
5	Normalizing flow	Zhe Survey
6	Nonlinear ICA	Zhe Survey
7	Deep Convolutional Inverse Graphics Network	Zhe Survey

[9]: Learning Relation from Data with Deep Learning

read on: - 05 Jan 2020
2GraphsNN Graph Relational Casual Markov

Index	Papers	Our Slides
1	A Flexible Generative Framework for Graph-based Semi-supervised Learning	Arsh Survey
2	Learning Discrete Structures for Graph Neural Networks	Arsh Survey
4	Graph Markov Neural Nets	Arsh Survey
	Graph Markov Neural Networks	Jack Survey
5	GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations	Arsh Survey
6	Subgraph Neural Networks	Arsh Survey
7	Pointer Graph Networks	Arsh Survey
8	Modeling Relational Data with Graph Convolutional Networks	Arsh Survey
9	Graph Learning	Zhe Survey
8	Neural Relational Inference	Zhe Survey

[10]: GNN and Transformer

read on: - 05 Jan 2020
8Scalable 2GraphsNN GCN graph attention

Index	Papers	Our Slides
1	Graph Convolutions: More than You Wanted to Know	Derrick Survey
2	Spectral Graph Sparsification	Derrick Survey
3	Complexity Analysis of Graph Convolutional Networks and in Attention based GNN	Derrick Survey
4	PyTorch-BigGraph: A Large-Scale Graph Embedding System	Derrick Survey
5	Scalable GNN Updates: More About PyTorch Geometric (PyG)	Derrick Survey
6	Time and Space Complexity of Graph Convolutional Networks	Derrick Survey
7	Large Scale GNN and Transformer Models and for Genomics	Jack Survey
8	Long Range Attention and Visualizing BERT	Jak Survey
9	Benchmarking Graph Neural Networks	Sanchit Survey

2019sCourse

---

No.	Date	Title and Information	PaperYear
1	2019, Apr, 5	GNN to Understand	2019-W12
2	2019, Mar, 29	GNN for NLP QA	2019-W11
3	2019, Mar, 25	Edge and Dynamic computing	2019-W10
4	2019, Mar, 22	GNN and scalable	2019-W9
5	2019, Mar, 15	GNN for Graph Generation	2019-W8
6	2019, Mar, 6	GNN Robustness	2019-W7
7	2019, Feb, 22	Geometric Deep Learning	2019-W5
8	2019, Feb, 17	GNN for Program Analysis	2019-W4
9	2019, Feb, 15	GNN for BioMed Applications	2019-W3
10	2019, Feb, 1	GNN Basics II - Deep Learning Advances on Graphs	2019-W2
11	2019, Jan, 25	GNN Basics I - Deep Learning Advances on Graphs	2019-W1

[1]: GNN to Understand

read on: - 05 Apr 2019
2GraphsNN 3Reliable Interpretable black-box casual seq2seq noise knowledge-graph attention

Presenter	Papers	Paper URL	Our Slides
Understand	Faithful and Customizable Explanations of Black Box Models	Pdf	Derrick PDF
Understand	A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17	Pdf	GaoJi PDF + Bill Pdf
Understand	How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning	Pdf + Pdf	GaoJi PDF
Understand	Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks	Pdf + PDF	GaoJi PDF
Understand	Attention is not Explanation, 2019	PDF
Understand	Understanding attention in graph neural networks, 2019	PDF

[2]: GNN for NLP QA

read on: - 29 Mar 2019
2GraphsNN 9DiscreteApp 5Generative generative QA NLP knowledge-graph GAN graph stylometric

Presenter	Papers	Paper URL	Our Slides
QA	A Comparison of Current Graph Database Models	Pdf + PDF2	Bill PDF
QA	Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text	Pdf	Bill PDF + GaoJi Pdf
QA	Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan	Pdf	Bill PDF + GaoJi Pdf
QA	Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix	PDF + Pdf	Bill PDF + GaoJi Pdf
Text	Adversarial Text Generation via Feature-Mover’s Distance	URL	Faizan PDF
Text	Content preserving text generation with attribute controls	URL	Faizan PDF
Text	Multiple-Attribute Text Rewriting, ICLR 2019	URL	Faizan PDF
Text	Writeprints: a stylometric approach to identity-level identification and similarity detection in cyberspace	URL	Faizan PDF

[3]: Edge and Dynamic computing

read on: - 25 Mar 2019
2GraphsNN 8Scalable mobile binary dynamic

Presenter	Papers	Paper URL	Our Slides
Edge	MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications	PDF
Edge	XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks	URL	Ryan PDF
Edge	DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices	Pdf	Eamon PDF
Edge	Loss-aware Binarization of Deep Networks, ICLR17	PDF	Ryan PDF
Edge	Espresso: Efficient Forward Propagation for Binary Deep Neural Networks	Pdf	Eamon PDF
Dynamic	Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution	PDF	Weilin PDF
Dynamic	Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems	PDF
Dynamic	Cavs: An Efficient Runtime System for Dynamic Neural Networks	Pdf

[4]: GNN and scalable

read on: - 22 Mar 2019
2GraphsNN 8Scalable graph discrete NLP embedding Hierarchical Parallel Distributed dynamic

Presenter	Papers	Paper URL	Our Slides
Scalable	FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling	Pdf	Ryan PDF + Arshdeep Pdf
Scalable	MILE: A Multi-Level Framework for Scalable Graph Embedding	Pdf	Ryan PDF
Scalable	LanczosNet: Multi-Scale Deep Graph Convolutional Networks	Pdf	Ryan PDF
Scalable	Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis	Pdf	Derrick PDF
Scalable	Towards Federated learning at Scale: System Design	URL	Derrick PDF
Scalable	DNN Dataflow Choice Is Overrated	PDF	Derrick PDF
Scalable	Towards Efficient Large-Scale Graph Neural Network Computing	Pdf	Derrick PDF
Scalable	PyTorch Geometric	URL
Scalable	PyTorch BigGraph	URL
Scalable	Simplifying Graph Convolutional Networks	Pdf
Scalable	Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks	Pdf

[5]: GNN for Graph Generation

read on: - 15 Mar 2019
2GraphsNN 5Generative generative GAN graph NLP graphical-model discrete RNN robustness molecule Variational Autoencoder RL Beam imputation Matrix-Completion random

Presenter	Papers	Paper URL	Our Slides
Generate	Maximum-Likelihood Augmented Discrete Generative Adversarial Networks	PDF	Tkach PDF + GaoJi Pdf
Generate	Graphical Generative Adversarial Networks	PDF	Arshdeep PDF
Generate	GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018	PDF	Arshdeep PDF
Generate	Inference in probabilistic graphical models by Graph Neural Networks	PDF	Arshdeep PDF
Generate	Encoding robust representation for graph generation	Pdf	Arshdeep PDF
Generate	Junction Tree Variational Autoencoder for Molecular Graph Generation	Pdf	Tkach PDF + Arshdeep Pdf
Generate	Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18		Tkach PDF
Generate	Towards Variational Generation of Small Graphs	Pdf	Tkach PDF + Arshdeep Pdf
Generate	Convolutional Imputation of Matrix Networks	Pdf	Tkach PDF
Generate	Graph Convolutional Matrix Completion	Pdf	Tkach PDF
Generate	NetGAN: Generating Graphs via Random Walks ICML18	URL	Tkach PDF
Beam	Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement	URL	Tkach PDF

[6]: GNN Robustness

read on: - 06 Mar 2019
2GraphsNN 3Reliable graph structured Adversarial-Examples binary

Presenter	Papers	Paper URL	Our Slides
Robust	Adversarial Attacks on Graph Structured Data	Pdf	Faizan PDF + GaoJi Pdf
Robust	KDD’18 Adversarial Attacks on Neural Networks for Graph Data	Pdf	Faizan PDF + GaoJi Pdf
Robust	Attacking Binarized Neural Networks	Pdf	Faizan PDF

[7]: Geometric Deep Learning

read on: - 22 Feb 2019
2GraphsNN 2Architecture geometric graph matching dynamic manifold invariant

Presenter	Papers	Paper URL	Our Slides
spherical	Spherical CNNs	Pdf	Fuwen PDF + Arshdeep Pdf
dynamic	Dynamic graph cnn for learning on point clouds, 2018	Pdf	Fuwen PDF
basics	Geometric Deep Learning (simple introduction video)	URL
matching	All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks	Pdf	Fuwen PDF
completion	Geometric matrix completion with recurrent multi-graph neural networks	Pdf	Fuwen PDF
Tutorial	Geometric Deep Learning on Graphs and Manifolds	URL	Arsh PDF
matching	Similarity Learning with Higher-Order Proximity for Brain Network Analysis		Arsh PDF
pairwise	Pixel to Graph with Associative Embedding	PDF	Fuwen PDF
3D	3D steerable cnns: Learning rotationally equivariant features in volumetric data	URL	Fuwen PDF

[8]: GNN for Program Analysis

read on: - 17 Feb 2019
2GraphsNN 9DiscreteApp embedding program heterogeneous

Presenter	Papers	Paper URL	Our Slides
Program	Neural network-based graph embedding for cross-platform binary code similarity detection	Pdf + Pdf	Faizan PDF + GaoJi Pdf
Program	Deep Program Reidentification: A Graph Neural Network Solution	Pdf	Weilin PDF
Program	Heterogeneous Graph Neural Networks for Malicious Account Detection	Pdf	Weilin Pdf
Program	Learning to represent programs with graphs	Pdf ¹

[9]: GNN for BioMed Applications

read on: - 15 Feb 2019
2GraphsNN 9DiscreteApp attention relational visualizing geometric DNA protein molecule

Presenter	Papers	Paper URL	Our Slides
Bio	KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 ¹	Pdf	Eli Pdf
Bio	Molecular geometry prediction using a deep generative graph neural network	Pdf	Eli Pdf
Bio	Visualizing convolutional neural network protein-ligand scoring	PDF	Eli PDF
Bio	Deep generative models of genetic variation capture mutation effects	PDF	Eli PDF
Bio	Attentive cross-modal paratope prediction	Pdf	Eli PDF

[10]: GNN Basics II - Deep Learning Advances on Graphs

read on: - 01 Feb 2019
2GraphsNN Semi-supervised relational matching graph

Presenter	Papers	Paper URL	Our Slides
Matching	Deep Learning of Graph Matching	PDF + PDF	Jack Pdf
Matching	Graph Edit Distance Computation via Graph Neural Networks	PDF	Jack Pdf
Basics	Link Prediction Based on Graph Neural Networks	Pdf	Jack Pdf
Basics	Supervised Community Detection with Line Graph Neural Networks	Pdf	Jack Pdf
Basics	Graph mining: Laws, generators, and algorithms	Pdf	Arshdeep PDF
pooling	Hierarchical graph representation learning with differentiable pooling	PDF	Eamon PDF

[11]: GNN Basics I - Deep Learning Advances on Graphs

read on: - 25 Jan 2019
2GraphsNN 0Basics 8Scalable invariant scalable embedding

Presenter	Papers	Paper URL	Our Notes
Basics	GraphSAGE: Large-scale Graph Representation Learning by Jure Leskovec Stanford University	URL + PDF
Basics	Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering by Xavier Bresson	URL + PDF	Ryan Pdf
Basics	Gated Graph Sequence Neural Networks by Microsoft Research	URL + PDF	Faizan Pdf
Basics	DeepWalk - Turning Graphs into Features via Network Embeddings	URL + PDF
Basics	Spectral Networks and Locally Connected Networks on Graphs ¹	Pdf	GaoJi slides + Bill Pdf
Basics	A Comprehensive Survey on Graph Neural Networks/ Graph Neural Networks: A Review of Methods and Applications	Pdf	Jack Pdf
GCN	Semi-Supervised Classification with Graph Convolutional Networks	Pdf	Jack Pdf

2019fCourse

---

No.	Date	Title and Information	PaperYear
1	2019, Dec, 12	deep2reproduce 2019 Fall - 1Analysis papers	2019-fall Students deep2reproduce
2	2019, Dec, 11	deep2reproduce 2019 Fall - 2Architecture papers	2019-fall Students deep2reproduce
3	2019, Dec, 10	deep2reproduce 2019 Fall - 3Reliable papers	2019-fall Students deep2reproduce
4	2019, Dec, 9	deep2reproduce 2019 Fall - 5Generative papers	2019-fall Students deep2reproduce
5	2019, Dec, 8	deep2reproduce 2019 Fall - 6Reinforcement papers	2019-fall Students deep2reproduce
6	2019, Dec, 7	deep2reproduce 2019 Fall - 7MetaDomain papers	2019-fall Students deep2reproduce
7	2019, Dec, 6	deep2reproduce 2019 Fall - 8Scalable papers	2019-fall Students deep2reproduce
8	2019, Nov, 3	A general survey	2019-fall Course

[1]: deep2reproduce 2019 Fall - 1Analysis papers

read on: - 12 Dec 2019
1Theoretical analysis generalization forgetting training optimization subspace informax normalization Sample-selection

Team INDEX	Title & Link	Tags	Our Slide
T2	Empirical Study of Example Forgetting During Deep Neural Network Learning	Sample Selection, forgetting	OurSlide
T29	Select Via Proxy: Efficient Data Selection For Training Deep Networks	Sample Selection	OurSlide
T9	How SGD Selects the Global Minima in over-parameterized Learning	optimization	OurSlide
T10	Escaping Saddles with Stochastic Gradients	optimization	OurSlide
T13	To What Extent Do Different Neural Networks Learn the Same Representation	subspace	OurSlide
T19	On the Information Bottleneck Theory of Deep Learning	informax	OurSlide
T20	Visualizing the Loss Landscape of Neural Nets	normalization	OurSlide
T21	Using Pre-Training Can Improve Model Robustness and Uncertainty	training, analysis	OurSlide
T24	Norm matters: efficient and accurate normalization schemes in deep networks	normalization	OurSlide

[2]: deep2reproduce 2019 Fall - 2Architecture papers

read on: - 11 Dec 2019
2Architecture structured CNN RNN loss

Team INDEX	Title & Link	Tags	Our Slide
T5	Deep Structured Prediction with Nonlinear Output Transformations	structured	OurSlide
T12	Large Margin Deep Networks for Classification	large-margin	OurSlide
T15	Wide Activation for Efficient and Accurate Image Super-Resolution	CNN	OurSlide
T17	Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks	RNN	OurSlide
T28	Processing of missing data by neural networks	imputation	OurSlide
T27	Implicit Acceleration by Overparameterization	analysis	OurSlide

[3]: deep2reproduce 2019 Fall - 3Reliable papers

read on: - 10 Dec 2019
3Reliable submodular safety adversarial-examples robustness model-as-sample privacy Attribution Relational

Team INDEX	Title & Link	Tags	Our Slide
T3	Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints	submodular, coreset, safety	OurSlide
T6	Decision Boundary Analysis of Adversarial Examples	adversarial-examples	OurSlide
T8	Robustness may be at odds with accuracy	robustness	OurSlide
T18	Towards Reverse-Engineering Black-Box Neural Networks	meta, model-as-sample, safety, privacy	OurSlide
T23	The Odds are Odd: A Statistical Test for Detecting Adversarial Examples	adversarial-examples	OurSlide
T25	Learning how to explain neural networks: PatternNet and PatternAttribution	Attribution, Interpretable	OurSlide
T31	Detecting Statistical Interactions from Neural Network Weights	Interpretable, Relational	OurSlide

[4]: deep2reproduce 2019 Fall - 5Generative papers

read on: - 09 Dec 2019
5Generative GAN VAE encoder-decoder

Team INDEX	Title & Link	Tags	Our Slide
T14	CAN: Creative Adversarial Networks Generating “Art”	GAN	OurSlide
T26	Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation	encoder-decoder, dialog, VAE, Interpretable	OurSlide
T32	Which Training Methods for GANs do actually Converge	convergence, optimization, GAN	OurSlide

[5]: deep2reproduce 2019 Fall - 6Reinforcement papers

read on: - 08 Dec 2019
6Reinforcement verification RL

Team INDEX	Title & Link	Tags	Our Slide
T1	Safe Reinforcement Learning via Shielding	RL, safety, verification	OurSlide

[6]: deep2reproduce 2019 Fall - 7MetaDomain papers

read on: - 07 Dec 2019
7MetaDomain BERT Transfer Multi-task regularization

Team INDEX	Title & Link	Tags	Our Slide
T11	Parameter-Efficient Transfer Learning for NLP	meta, BERT, text, Transfer	OurSlide
T22	Deep Asymmetric Multi-task Feature Learning	meta, regularization, Multi-task	OurSlide

[7]: deep2reproduce 2019 Fall - 8Scalable papers

read on: - 06 Dec 2019
8Scalable binarization small-data Quantization

Team INDEX	Title & Link	Tags	Our Slide
T33	The High-Dimensional Geometry of Binary Neural Networks	Quantization, binarization, scalable	OurSlide
T34	Modern Neural Networks Generalize on Small Data Sets	small-data, analysis, ensemble	OurSlide
T4	Cognitive Scheduler for Heterogeneous High Performance Computing System	system-application	OurSlide

[8]: A general survey

read on: - 03 Nov 2019
0Basics tutorial

Presenter	Papers	Paper URL	Our Slides
Dr Qi	Survey of Recent DeepLearning to 12 Groups / PDF

2018Reads

---

No.	Date	Title and Information	PaperYear
1	2018, Dec, 29	Generate18- Deep Generative Models for discrete	2018-team
2	2018, Dec, 21	Generate18- Deep Generative Models for Graphs	2018-team
3	2018, Dec, 20	Application18- DNN for QA and MedQA	2018-team
4	2018, Dec, 2	Reliable18- Adversarial Attacks and DNN	2018-team
5	2018, Nov, 20	Reliable18- Adversarial Attacks and DNN	2018-team
6	2018, Oct, 25	Structure18- DNNs Varying Structures	2018-team
7	2018, Oct, 16	Application18- Graph DNN in a Few Bio Tasks	2018-team
8	2018, Oct, 13	Application18- DNNs in a Few Bio CRISPR Tasks	2018-team
9	2018, Oct, 12	Reliable18- Understand DNNs	2018-team
10	2018, Oct, 11	Structures18- DNN for Relations	2018-team
11	2018, Aug, 29	Survey18- My Tutorial Talk at ACM BCB18 - Interpretable Deep Learning for Genomics	2018-me
12	2018, Aug, 27	Application18- A few DNN for Question Answering	2018-team
13	2018, Aug, 23	Generative18 -A few more deep discrete Generative Models	2018-team
14	2018, Aug, 13	Application18- DNNs in a Few BioMedical Tasks	2018-team
15	2018, Aug, 3	Reliable18- Testing and Verifying DNNs	2018-team
16	2018, May, 20	Reliable18- Adversarial Attacks and DNN and More	2018-team
17	2018, May, 12	Reliable18- Adversarial Attacks and DNN	2018-team
18	2018, May, 11	Structures18- DNN for Multiple Label Classification	2018-team
19	2018, May, 3	Structures18- More Attentions	2018-team
20	2018, Apr, 20	Generative18 -Generative Adversarial Network (classified)	2018-team
21	2018, Feb, 20	Survey18- My Survey Talk at UVA HMI Seminar - A quick and rough overview of DNN	2018-me
22	2018, Jan, 10	Application18- Property of DeepNN Models and Discrete tasks	2018-team

[1]: Generate18- Deep Generative Models for discrete

read on: - 29 Dec 2018
5Generative 2GraphsNN generative GAN discrete Autoencoder Variational

Presenter	Papers	Paper URL	Our Slides
Tkach	Boundary-Seeking Generative Adversarial Networks	PDF	PDF
Tkach	Maximum-Likelihood Augmented Discrete Generative Adversarial Networks	PDF	PDF
Tkach	Generating Sentences from a Continuous Space	PDF	PDF

[2]: Generate18- Deep Generative Models for Graphs

read on: - 21 Dec 2018
5Generative 2GraphsNN generative GAN discrete Autoencoder Variational molecule graph DNA

Presenter	Papers	Paper URL	Our Slides
Arshdeep	Constrained Graph Variational Autoencoders for Molecule Design	PDF	PDF
Arshdeep	Learning Deep Generative Models of Graphs	PDF	PDF
Arshdeep	Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation	PDF	PDF
Jack	Generating and designing DNA with deep generative models	PDF	PDF

[3]: Application18- DNN for QA and MedQA

read on: - 20 Dec 2018
2Architecture 9DiscreteApp seq2seq recommendation QA graph relational EHR

Presenter	Papers	Paper URL	Our Slides
Bill	Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning	PDF	PDF
Chao	Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I)	PDF	PDF
Chao	Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II)	PDF	PDF
Derrick	Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III)	PDF	PDF
Chao	Reading Wikipedia to Answer Open-Domain Questions	PDF	PDF
Jennifer	Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text	PDF	PDF

[4]: Reliable18- Adversarial Attacks and DNN

read on: - 02 Dec 2018
3Reliable Adversarial-Examples visualizing Interpretable EHR NLP

Presenter	Papers	Paper URL	Our Slides
Jennifer	Adversarial Attacks Against Medical Deep Learning Systems	PDF	PDF
Jennifer	Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning	PDF	PDF
Jennifer	Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers	PDF	PDF
Jennifer	CleverHans	PDF	PDF
Ji	Ji-f18-New papers about adversarial attack		PDF

[5]: Reliable18- Adversarial Attacks and DNN

read on: - 20 Nov 2018
3Reliable Adversarial-Examples software-testing Interpretable distillation

Presenter	Papers	Paper URL	Our Slides
Bill	Adversarial Examples that Fool both Computer Vision and Time-Limited Humans	PDF	PDF
Bill	Adversarial Attacks Against Medical Deep Learning Systems	PDF	PDF
Bill	TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing	PDF	PDF
Bill	Distilling the Knowledge in a Neural Network	PDF	PDF
Bill	Defensive Distillation is Not Robust to Adversarial Examples	PDF	PDF
Bill	Adversarial Logit Pairing, Harini Kannan, Alexey Kurakin, Ian Goodfellow	PDF	PDF

[6]: Structure18- DNNs Varying Structures

read on: - 25 Oct 2018
2Architecture 8Scalable 7MetaDomain Architecture-Search Hyperparameter dynamic

Presenter	Papers	Paper URL	Our Slides
Arshdeep	Learning Transferable Architectures for Scalable Image Recognition	PDF	PDF
Arshdeep	FractalNet: Ultra-Deep Neural Networks without Residuals	PDF	PDF

[7]: Application18- Graph DNN in a Few Bio Tasks

read on: - 16 Oct 2018
2GraphsNN 9DiscreteApp graph protein molecule

Presenter	Papers	Paper URL	Our Slides
Eric	Modeling polypharmacy side effects with graph convolutional networks	PDF	PDF
Eric	Protein Interface Prediction using Graph Convolutional Networks	PDF	PDF
Eric	Structure biology meets data science: does anything change	URL	PDF
Eric	DeepSite: protein-binding site predictor using 3D-convolutional neural networks	URL	PDF

[8]: Application18- DNNs in a Few Bio CRISPR Tasks

read on: - 13 Oct 2018
9DiscreteApp brain CRISPR DNA Genomics generative protein

Presenter	Papers	Paper URL	Our Slides
Arshdeep	deepCRISPR: optimized CRISPR guide RNA design by deep learning, Genome Biology 2018	PDF	PDF
Arshdeep	The CRISPR tool kit for genome editing and beyond, Mazhar Adli	PDF	PDF
Eric	Intro of Genetic Engineering	PDF	PDF
Eric	Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs	PDF	PDF
Brandon	Generative Modeling for Protein Structure	URL	PDF

[9]: Reliable18- Understand DNNs

read on: - 12 Oct 2018
3Reliable visualizing interpretable Attribution GAN understanding

Presenter	Papers	Paper URL	Our Slides
Jack	A Unified Approach to Interpreting Model Predictions	PDF	PDF
Jack	“Why Should I Trust You?”: Explaining the Predictions of Any Classifier	PDF	PDF
Jack	Visual Feature Attribution using Wasserstein GANs	PDF	PDF
Jack	GAN Dissection: Visualizing and Understanding Generative Adversarial Networks	PDF	PDF
GaoJi	Recent Interpretable machine learning papers	PDF	PDF
Jennifer	The Building Blocks of Interpretability	PDF	PDF

[10]: Structures18- DNN for Relations

read on: - 11 Oct 2018
2Architecture 2GraphsNN relational

Presenter	Papers	Paper URL	Our Slides
Arshdeep	Relational inductive biases, deep learning, and graph networks	PDF	PDF
Arshdeep	Discriminative Embeddings of Latent Variable Models for Structured Data	PDF	PDF
Jack	Deep Graph Infomax	PDF	PDF

[11]: Survey18- My Tutorial Talk at ACM BCB18 - Interpretable Deep Learning for Genomics

read on: - 29 Aug 2018
9DiscreteApp tutorial Interpretable

Presenter	Papers	Paper URL	Our Slides
Dr. Qi	Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation		PDF

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin, NIPS2017 / Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi

The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what are the relevant factors and how they work together? Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach; we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency map. Code and data are shared at www.deepchrome.org
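
The two levels of attention described in the abstract can be illustrated with a small sketch. This is not the AttentiveChrome implementation: random vectors stand in for the LSTM encodings, scoring is a plain dot product against a context vector, and every name and size here (`attend`, `two_level_attention`, `bin_context`, `mark_context`, 5 marks, 10 positions, 4 dimensions) is a hypothetical choice made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, context):
    """Score each vector against a context vector (dot product),
    normalize the scores with softmax, and return the attention
    weights together with the weighted sum of the vectors."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled


def two_level_attention(marks, bin_context, mark_context):
    """marks: one entry per chromatin mark, each a list of per-position vectors.
    Level 1 attends over positions within each mark.
    Level 2 attends over the resulting per-mark summaries."""
    bin_weights, mark_summaries = [], []
    for positions in marks:
        weights, summary = attend(positions, bin_context)
        bin_weights.append(weights)
        mark_summaries.append(summary)
    mark_weights, gene_vector = attend(mark_summaries, mark_context)
    return bin_weights, mark_weights, gene_vector


# Assumed toy sizes; random vectors stand in for learned encodings.
random.seed(0)
num_marks, num_bins, dim = 5, 10, 4
marks = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_bins)]
         for _ in range(num_marks)]
bin_context = [random.gauss(0, 1) for _ in range(dim)]
mark_context = [random.gauss(0, 1) for _ in range(dim)]

bin_weights, mark_weights, gene_vector = two_level_attention(
    marks, bin_context, mark_context)
```

In this sketch, `bin_weights[i]` plays the role of "important positions per mark" and `mark_weights` the role of "relevant marks"; in a trained model the context vectors and encodings would be learned jointly with the prediction target rather than sampled at random.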

[12]: Application18- A few DNN for Question Answering

read on: - 27 Aug 2018
2Architecture 8Scalable 9DiscreteApp trees metric-learning embedding QA

Presenter	Papers	Paper URL	Our Slides
Derrick	GloVe: Global Vectors for Word Representation	PDF	PDF
Derrick	PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks.	URL	PDF
Derrick	Scalable nearest neighbor algorithms for high dimensional data (PAMI14) ¹	PDF	PDF
Derrick	StarSpace: Embed All The Things!	PDF	PDF
Derrick	Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes	PDF	PDF

[13]: Generative18 -A few more deep discrete Generative Models

read on: - 23 Aug 2018
5Generative 9DiscreteApp generative generalization GAN discrete Amortized Autoencoder Variational program

Presenter	Papers	Paper URL	Our Slides
Arshdeep	The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh ¹	PDF	PDF
GaoJi	Summary Of Several Autoencoder models	PDF	PDF
GaoJi	Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts ²	PDF	PDF
GaoJi	Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN	PDF	PDF
Arshdeep	Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush ³	PDF	PDF
Arshdeep	Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals ⁴	PDF	PDF

[14]: Application18- DNNs in a Few BioMedical Tasks

read on: - 13 Aug 2018
3Reliable 6Reinforcement 9DiscreteApp brain RNA DNA Genomics generative

Presenter	Papers	Paper URL	Our Slides
Arshdeep	DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning.	PDF	PDF
Arshdeep	Solving the RNA design problem with reinforcement learning, PLOSCB ¹	PDF	PDF
Arshdeep	Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk ²	PDF	PDF
Arshdeep	Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio ³	PDF	PDF
Brandon	Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics	PDF	PDF
Arshdeep	Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions ²	PDF	PDF

[15]: Reliable18- Testing and Verifying DNNs

read on: - 03 Aug 2018
3Reliable 6Reinforcement RL Fuzzing Adversarial-Examples verification software-testing black-box white-box

Presenter	Papers	Paper URL	Our Slides
GaoJi	Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh	PDF	PDF
GaoJi	Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer	PDF	PDF
GaoJi	DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray	PDF	PDF
GaoJi	A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors ¹	PDF	PDF
GaoJi	A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel)	PDF	PDF
Testing	DeepXplore: Automated Whitebox Testing of Deep Learning Systems	PDF

[16]: Reliable18- Adversarial Attacks and DNN and More

read on: - 20 May 2018
3Reliable 9DiscreteApp seq2seq Adversarial-Examples Certified-Defense

Presenter	Papers	Paper URL	Our Slides
Bill	Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples	PDF	PDF
Bill	Adversarial Examples for Evaluating Reading Comprehension Systems, Robin Jia, Percy Liang	PDF	PDF
Bill	Certified Defenses against Adversarial Examples, Aditi Raghunathan, Jacob Steinhardt, Percy Liang	PDF	PDF
Bill	Provably Minimally-Distorted Adversarial Examples, Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill	PDF	PDF

[17]: Reliable18- Adversarial Attacks and DNN

read on: - 12 May 2018
3Reliable 9DiscreteApp Adversarial-Examples generative Interpretable

Presenter	Papers	Paper URL	Our Slides
Bill	Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le ¹	PDF	PDF
Bill	Adversarial Spheres ²	PDF	PDF
Bill	Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer ³	PDF	PDF
Bill	Thermometer encoding: one hot way to resist adversarial examples ⁴	PDF	PDF
	Adversarial Logit Pairing, Harini Kannan, Alexey Kurakin, Ian Goodfellow ⁵	PDF

[18]: Structures18- DNN for Multiple Label Classification

read on: - 11 May 2018
2Architecture 2GraphsNN multi-label structured Adversarial-loss attention RNN

Presenter	Papers	Paper URL	Our Slides
Chao	Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification	PDF	PDF
Jack	FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning	PDF	PDF
BasicMLC	Multi-Label Classification: An Overview	PDF
SPEN	Structured Prediction Energy Networks	PDF
InfNet	Learning Approximate Inference Networks for Structured Prediction	PDF
SPENMLC	Deep Value Networks	PDF
Adversarial	Semantic Segmentation using Adversarial Networks	PDF
EmbedMLC	StarSpace: Embed All The Things!	PDF
deepMLC	CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016	PDF
deepMLC	Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018	PDF

[19]: Structures18- More Attentions

read on: - 03 May 2018
2Architecture attention relational Variational

Presenter	Papers	Paper URL	Our Slides
Arshdeep	Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ¹	PDF	PDF
Arshdeep	Latent Alignment and Variational Attention ²	PDF	PDF
Arshdeep	Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio ³	PDF	PDF

[20]: Generative18 -Generative Adversarial Network (classified)

read on: - 20 Apr 2018
5Generative DNA generative GAN generalization

Presenter	Papers	Paper URL	Our Slides
BrandonLiu	Summary of Recent Generative Adversarial Networks (Classified)		PDF
Jack	Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey	PDF	PDF
GaoJi	More about basics of GAN		PDF
	McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535	PDF
	Wasserstein GAN, ICML17	PDF
	Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab	PDF

[21]: Survey18- My Survey Talk at UVA HMI Seminar - A quick and rough overview of DNN

read on: - 20 Feb 2018
0Basics

Presenter	Papers	Paper URL	Our Slides
Dr. Qi	A quick and rough survey of Deep-Neural-Networks		PDF

[22]: Application18- Property of DeepNN Models and Discrete tasks

read on: - 10 Jan 2018
3Reliable embedding generative NLP generalization

Presenter	Papers	Paper URL	Our Slides
Bill	Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation ¹	PDF	PDF
Bill	Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio	PDF	PDF
Bill	Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang ²	PDF	PDF
Bill	On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick	PDF	PDF

2017Reads

---

No.	Date	Title and Information	PaperYear
1	2017, Jul, 22	Reliable17-Testing and Machine Learning Basics	2017-team
2	2017, Jun, 22	Structures17 - Adaptive Deep Networks II	2017-team
3	2017, Jun, 2	Structures17 -Adaptive Deep Networks I	2017-team
4	2017, May, 22	Generative17- Generative Deep Networks	2017-team
5	2017, Apr, 22	Optimization17- Optimization in DNN	2017-team
6	2017, Feb, 22	Reliable17-Secure Machine Learning	2017-team
7	2017, Jan, 21	Basic BioMed Application Reads we finished before 2017	2017-team
8	2017, Jan, 20	Basic16- DNN to be Scalable	2017-team
9	2017, Jan, 19	Basic16- Basic Deep NN and Robustness	2017-team
10	2017, Jan, 18	Basic16- Basic Deep NN with Memory	2017-team
11	2017, Jan, 12	Basic16- Basic DNN Embedding we read for Ranking/QA	2017-team
12	2017, Jan, 12	Basic16- Basic DNN Reads we finished for NLP/Text	2017-team

[1]: Reliable17-Testing and Machine Learning Basics

read on: - 22 Jul 2017
3Reliable software-testing white-box black-box robustness Metamorphic Influence Functions

Presenter	Papers	Paper URL	Our Slides
GaoJi	A few useful things to know about machine learning	PDF	PDF
GaoJi	A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions	PDF	PDF
GaoJi	Automated White-box Testing of Deep Learning Systems ¹	PDF	PDF
GaoJi	Testing and Validating Machine Learning Classifiers by Metamorphic Testing ²	PDF	PDF
GaoJi	Software testing: a research travelogue (2000–2014)	PDF	PDF

[2]: Structures17 - Adaptive Deep Networks II

read on: - 22 Jun 2017
2Architecture 8Scalable low-rank binary dynamic learn2learn optimization

Presenter	Papers	Paper URL	Our Slides
Arshdeep	Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction ¹	PDF	PDF
Arshdeep	Decoupled Neural Interfaces Using Synthetic Gradients ²	PDF	PDF
Arshdeep	Diet Networks: Thin Parameters for Fat Genomics ³	PDF	PDF
Arshdeep	Metric Learning with Adaptive Density Discrimination ⁴	PDF	PDF

[3]: Structures17 -Adaptive Deep Networks I

read on: - 02 Jun 2017
2Architecture 8Scalable low-rank binary dynamic learn2learn optimization

Presenter	Papers	Paper URL	Our Slides
Arshdeep	HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 ¹	PDF	PDF
Arshdeep	Learning feed-forward one-shot learners ²	PDF	PDF
Arshdeep	Learning to Learn by gradient descent by gradient descent ³	PDF	PDF
Arshdeep	Dynamic Filter Networks ⁴ https://arxiv.org/abs/1605.09673	PDF	PDF

[4]: Generative17- Generative Deep Networks

read on: - 22 May 2017
5Generative generative GAN

Presenter	Papers	Paper URL	Our Slides
Tobin	Energy-Based Generative Adversarial Network ¹	PDF	PDF
Jack	Three Deep Generative Models	PDF	PDF

[5]: Optimization17- Optimization in DNN

read on: - 22 Apr 2017
4Optimization optimization scalable EM propagation mimic

Presenter	Papers	Paper URL	Our Slides
Muthu	Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal ¹	PDF	PDF
Muthu	Fast Training of Recurrent Networks Based on EM Algorithm (1998) ²	PDF	PDF
Muthu	FitNets: Hints for Thin Deep Nets, ICLR15 ³	PDF	PDF
Muthu	Two NIPS 2015 Deep Learning Optimization Papers	PDF	PDF
Muthu	Difference Target Propagation (2015) ⁴	PDF	PDF

[6]: Reliable17-Secure Machine Learning

read on: - 22 Feb 2017
3Reliable secure Privacy Cryptography

Presenter	Papers	Paper URL	Our Slides
Tobin	Summary of a few papers on Machine Learning and Cryptography (e.g., Learning to Protect Communications with Adversarial Neural Cryptography) ¹	PDF	PDF
Tobin	Privacy Aware Learning (NIPS12) ²	PDF	PDF
Tobin	Can Machine Learning be Secure? (2006)	PDF	PDF

[7]: Basic BioMed Application Reads we finished before 2017

read on: - 21 Jan 2017
9DiscreteApp DNA RNA protein invariant binary random relational

Presenter	Papers	Paper URL
DeepBind	Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning	PDF
DeepSEA	Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk	PDF
DeepSEA	Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014
BioBasics	A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13
BioBasics	Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011
BioBasics	Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004
BioBasics	NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels
MedSignal	Segmenting Time Series: A Survey and Novel Approach	PDF

[8]: Basic16- DNN to be Scalable

read on: - 20 Jan 2017
0Basics 2Architecture 8Scalable scalable random sparsity binary hash compression low-rank distributed dimension reduction pruning sketch Parallel

Presenter	Papers	Paper URL	Our Slides
scalable	Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1]	PDF
data scalable	Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2]	PDF 2014 + PDF
Binary	Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
Model	Binary embeddings with structured hashed projections ¹	PDF	PDF
Model	Deep Compression: Compressing Deep Neural Networks (ICLR 2016) ²	PDF	PDF

[9]: Basic16- Basic Deep NN and Robustness

read on: - 19 Jan 2017
0Basics 3Reliable Adversarial-Examples robustness visualizing Interpretable Certified-Defense

Presenter	Papers	Paper URL	Our Slides
AE	Intriguing properties of neural networks	PDF
AE	Explaining and Harnessing Adversarial Examples	PDF
AE	Towards Deep Learning Models Resistant to Adversarial Attacks	PDF
AE	DeepFool: a simple and accurate method to fool deep neural networks	PDF
AE	Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner	PDF	PDF
Data	Basic Survey of ImageNet - LSVRC competition	URL	PDF
Understand	Understanding Black-box Predictions via Influence Functions	PDF
Understand	Deep inside convolutional networks: Visualising image classification models and saliency maps	PDF
Understand	Been Kim, Interpretable Machine Learning, ICML17 Tutorial [^1]	PDF
provable	Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter	URL

[10]: Basic16- Basic Deep NN with Memory

read on: - 18 Jan 2017
0Basics 2Architecture 7MetaDomain memory NTM seq2seq pointer set attention meta-learning Few-Shot matching net metric-learning

Presenter	Papers	Paper URL	Our Slides
seq2seq	Sequence to Sequence Learning with Neural Networks	PDF
Set	Pointer Networks	PDF
Set	Order Matters: Sequence to Sequence for Sets	PDF
Point Attention	Multiple Object Recognition with Visual Attention	PDF
Memory	End-To-End Memory Networks	PDF	Jack Survey
Memory	Neural Turing Machines	PDF
Memory	Hybrid computing using a neural network with dynamic external memory	PDF
Muthu	Matching Networks for One Shot Learning (NIPS16) ¹	PDF	PDF
Jack	Meta-Learning with Memory-Augmented Neural Networks (ICML16) ²	PDF	PDF
Metric	ICML07 Best Paper - Information-Theoretic Metric Learning	PDF

[11]: Basic16- Basic DNN Embedding we read for Ranking/QA

read on: - 12 Jan 2017
0Basics 2Architecture 9DiscreteApp embedding recommendation QA graph relational

Presenter	Papers	Paper URL
QA	Learning to rank with (a lot of) word features	PDF
Relation	A semantic matching energy function for learning with multi-relational data	PDF
Relation	Translating embeddings for modeling multi-relational data	PDF
QA	Reading wikipedia to answer open-domain questions	PDF
QA	Question answering with subgraph embeddings	PDF

[12]: Basic16- Basic DNN Reads we finished for NLP/Text

read on: - 12 Jan 2017
0Basics 2Architecture 9DiscreteApp embedding text BERT seq2seq attention NLP curriculum BackProp relational

Presenter	Papers	Paper URL	Our Slides
NLP	A Neural Probabilistic Language Model	PDF
Text	Bag of Tricks for Efficient Text Classification	PDF
Text	Character-level Convolutional Networks for Text Classification	PDF
NLP	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	PDF
seq2seq	Neural Machine Translation by Jointly Learning to Align and Translate	PDF
NLP	Natural Language Processing (almost) from Scratch	PDF
Train	Curriculum learning	PDF
Muthu	NeurIPS Embedding Papers survey 2012 to 2015	NIPS	PDF
Basics	Efficient BackProp	PDF

2017Course

---

No.	Date	Title and Information	PaperYear
1	2017, Nov, 30	RL IV - RL with varying structures	2017-W15
2	2017, Nov, 28	RL III - Basic tutorial RLSS17 (2)	2017-W14
3	2017, Nov, 21	RL II - Basic tutorial RLSS17	2017-W14
4	2017, Nov, 16	Generative III - GAN training	2017-W13
5	2017, Nov, 14	Generative II - Deep Generative Models	2017-W13
6	2017, Nov, 9	Optimization IV - change DNN architecture for Optimization	2017-W12
7	2017, Nov, 7	Optimization III - Optimization for DNN	2017-W12
8	2017, Nov, 2	Optimization II - DNN for Optimization	2017-W11
9	2017, Oct, 31	Optimization I - Understanding DNN Optimization	2017-W11
10	2017, Oct, 26	Reliable Applications VI - Robustness2	2017-W10
11	2017, Oct, 23	Reliable Applications IV - Robustness	2017-W9
12	2017, Oct, 17	Reliable Applications III - Interesting Tasks	2017-W9
13	2017, Oct, 12	Reliable Applications II - Data privacy	2017-W8
14	2017, Oct, 11	Reliable Applications V - Understanding2	2017-W10
15	2017, Oct, 10	Reliable Applications I - Understanding	2017-W8
16	2017, Oct, 5	Structure VI - DNN with Adaptive Structures	2017-W7
17	2017, Oct, 3	Structure V - DNN with Attention 3	2017-W7
18	2017, Sep, 28	Structure IV - DNN with Attention 2	2017-W6
19	2017, Sep, 26	Structure III - DNN with Attention	2017-W6
20	2017, Sep, 21	Structure II - DNN with Varying Structures	2017-W5
21	2017, Sep, 19	Structure I - Varying DNN structures	2017-W5
22	2017, Sep, 14	Theoretical17 VI - More about Behaviors of DNN	2017-W4
23	2017, Sep, 12	Theoretical17 V - More about Behaviors of DNN	2017-W4
24	2017, Sep, 7	Theoretical17 IV - Investigating Behaviors of DNN	2017-W3
25	2017, Sep, 5	Theoretical17 III - Investigating Behaviors of DNN	2017-W3
26	2017, Aug, 31	Generative I - GAN tutorial by Ian Goodfellow	2017-W2
27	2017, Aug, 29	Reinforcement I - Pineau - RL Basic Concepts	2017-W2
28	2017, Aug, 24	Theoretical17 II - Ganguli - Theoretical Neuroscience and Deep Learning DLSS16	2017-W1
29	2017, Aug, 22	Basic17 -Andrew Ng - Nuts and Bolts of Applying Deep Learning	2017-W1

[1]: RL IV - RL with varying structures

read on: - 30 Nov 2017
6Reinforcement Auxiliary Sampling Value-Networks structured Imitation-Learning Hierarchical

Presenter	Papers	Paper URL	Our Slides
Ceyer	Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 ¹	PDF	PDF
Beilun	Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy ²	PDF	PDF
Ji	Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 ³	PDF	PDF
Xueying	End-to-End Differentiable Adversarial Imitation Learning, ICML17 ⁴	PDF	PDF
	Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17	PDF
	FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 ⁵	PDF

[2]: RL III - Basic tutorial RLSS17 (2)

read on: - 28 Nov 2017
6Reinforcement alphaGO Planning Temporal-Difference

Presenter	Papers	Paper URL	Our Slides
Anant	The Predictron: End-to-End Learning and Planning, ICLR17 ¹	PDF	PDF
ChaoJiang	Szepesvari - Theory of RL ²	RLSS.pdf + Video	PDF
GaoJi	Mastering the game of Go without human knowledge / Nature 2017 ³	PDF	PDF
	Thomas - Safe Reinforcement Learning	RLSS17.pdf + video
	Sutton - Temporal-Difference Learning	RLSS17.pdf + Video

[3]: RL II - Basic tutorial RLSS17

read on: - 21 Nov 2017
6Reinforcement RL Multi-Task

Presenter	Papers	Paper URL	Our Slides
Jack	Hasselt - Deep Reinforcement Learning	RLSS17.pdf + video	PDF
Tianlu	Roux - RL in the Industry	RLSS17.pdf + video	PDF / PDF-Bandit
Xueying	Singh - Steps Towards Continual Learning	pdf + video	PDF
GaoJi	Distral: Robust Multitask Reinforcement Learning ¹	PDF	PDF

[4]: Generative III - GAN training

read on: - 16 Nov 2017
5Generative generative generalization Denoising Model-Criticism

Presenter	Papers	Paper URL	Our Slides
Arshdeep	Generalization and Equilibrium in Generative Adversarial Nets (ICML17) ¹	PDF + video	PDF
Arshdeep	Mode Regularized Generative Adversarial Networks (ICLR17) ²	PDF	PDF
Bargav	Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17 ³	PDF	PDF
Anant	Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17 ⁴	PDF + code	PDF

[5]: Generative II - Deep Generative Models

read on: - 14 Nov 2017
5Generative generative attention Composition graphical-model Autoregressive structured

Presenter	Papers	Paper URL	Our Slides
ChaoJiang	Courville - Generative Models II	DLSS17Slide + video	PDF
GaoJi	Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 ¹	PDF + talk	PDF
Arshdeep	Composing graphical models with neural networks for structured representations and fast inference, NIPS16 ²	PDF	PDF
	Johnson - Graphical Models and Deep Learning	DLSSSlide + video
	Parallel Multiscale Autoregressive Density Estimation, ICML17 ³	PDF
Beilun	Conditional Image Generation with Pixel CNN Decoders, NIPS16 ⁴	PDF	PDF
Shijia	Marrying Graphical Models & Deep Learning	DLSS17 + Video	PDF

[6]: Optimization IV - change DNN architecture for Optimization

read on: - 09 Nov 2017
4Optimization Forcing Optimization

Presenter	Papers	Paper URL	Our Slides
Shijia	Professor Forcing: A New Algorithm for Training Recurrent Networks, NIPS16 ¹	PDF + Video	PDF
Beilun+Arshdeep	Mollifying Networks, Bengio, ICLR17 ²	PDF	PDF / PDF2

[7]: Optimization III - Optimization for DNN

read on: - 07 Nov 2017
4Optimization Architecture-Search Hyperparameter dynamic Optimization

Presenter	Papers	Paper URL	Our Slides
GaoJi	Forward and Reverse Gradient-Based Hyperparameter Optimization, ICML17 ¹	PDF	PDF
ChaoJiang	Adaptive Neural Networks for Efficient Inference, ICML17 ²	PDF	PDF
Bargav	Practical Gauss-Newton Optimisation for Deep Learning, ICML17 ³	PDF	PDF
Rita	How to Escape Saddle Points Efficiently, ICML17 ⁴	PDF	PDF
	Batched High-dimensional Bayesian Optimization via Structural Kernel Learning	PDF

[8]: Optimization II - DNN for Optimization

read on: - 02 Nov 2017
4Optimization Architecture Search RL Few-Shot Optimization

Presenter	Papers	Paper URL	Our Slides
GaoJi	Neural Architecture Search with Reinforcement Learning, ICLR17 ¹	PDF	PDF
Ceyer	Learning to learn ²	DLSS17video	PDF
Beilun	Optimization as a Model for Few-Shot Learning, ICLR17 ³	PDF + More	PDF
Anant	Neural Optimizer Search with Reinforcement Learning, ICML17 ⁴	PDF	PDF

[9]: Optimization I - Understanding DNN Optimization

read on: - 31 Oct 2017
4Optimization optimization Curriculum Differentiation

Presenter	Papers	Paper URL	Our Slides
Ceyer	An overview of gradient descent optimization algorithms ¹	PDF	PDF
Shijia	Osborne - Probabilistic numerics for deep learning ²	DLSS 2017 + Video	PDF / PDF2
Jack	Automated Curriculum Learning for Neural Networks, ICML17 ³	PDF	PDF
DLSS17	Johnson - Automatic Differentiation ⁴	slide + video

[10]: Reliable Applications VI - Robustness2

read on: - 26 Oct 2017
3Reliable Adversarial-Examples noise Composition robustness

Presenter	Papers	Paper URL	Our Slides
Tianlu	Robustness of classifiers: from adversarial to random noise, NIPS16 ¹	PDF	PDF
Anant	Blind Attacks on Machine Learners, NIPS16 ²	PDF	PDF
	Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17 ³	pdf
	The Robustness of Estimator Composition, NIPS16 ⁴	PDF

[11]: Reliable Applications IV - Robustness

read on: - 23 Oct 2017
3Reliable Adversarial-Examples high-dimensional robustness

Presenter	Papers	Paper URL	Our Slides
GaoJi	Delving into Transferable Adversarial Examples and Black-box Attacks, ICLR17 ¹	pdf	PDF
Shijia	On Detecting Adversarial Perturbations, ICLR17 ²	pdf	PDF
Anant	Parseval Networks: Improving Robustness to Adversarial Examples, ICML17 ³	pdf	PDF
Bargav	Being Robust (in High Dimensions) Can Be Practical, ICML17 ⁴	pdf	PDF

[12]: Reliable Applications III - Interesting Tasks

read on: - 17 Oct 2017
3Reliable QA noise Neural-Programming Hierarchical

Presenter	Papers	Paper URL	Our Slides
Jack	Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17 ¹	PDF	PDF
Arshdeep	Making Neural Programming Architectures Generalize via Recursion, ICLR17 ²	PDF	PDF
Xueying	Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17 ³	PDF	PDF

[13]: Reliable Applications II - Data privacy

read on: - 12 Oct 2017
3Reliable Semi-supervised Privacy Domain-adaptation

Presenter	Papers	Paper URL	Our Slides
Xueying	Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, ICLR17 ¹	PDF	PDF
Bargav	Deep Learning with Differential Privacy, CCS16 ²	PDF + video	PDF
Bargav	Privacy-Preserving Deep Learning, CCS15 ³	PDF	PDF
Xueying	Domain Separation Networks, NIPS16 ⁴	PDF	PDF

[14]: Reliable Applications V - Understanding2

read on: - 11 Oct 2017
3Reliable visualizing Difference-Analysis Attribution Composition

Presenter	Papers	Paper URL	Our Slides
Rita	Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 ¹	PDF	PDF
Arshdeep	Axiomatic Attribution for Deep Networks, ICML17 ²	PDF	PDF
	The Robustness of Estimator Composition, NIPS16	PDF

[15]: Reliable Applications I - Understanding

read on: - 10 Oct 2017
3Reliable Interpretable Model-Criticism random Difference-Analysis Attribution

Presenter	Papers	Paper URL	Our Slides
Rita	Learning Important Features Through Propagating Activation Differences, ICML17 ¹	PDF	PDF
GaoJi	Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 ²	PDF	PDF
Rita	Learning Kernels with Random Features, Aman Sinha, John Duchi ³	PDF	PDF

[16]: Structure VI - DNN with Adaptive Structures

read on: - 05 Oct 2017
2Architecture 8Scalable dynamic Architecture Search structured

Presenter	Papers	Paper URL	Our Slides
Anant	AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 ¹	PDF	PDF
Shijia	SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 ²	PDF	PDF
Jack	Proximal Deep Structured Models, NIPS16 ³	PDF	PDF
	Optimal Architectures in a Solvable Model of Deep Networks, NIPS16 ⁴	PDF
Tianlu	Large-Scale Evolution of Image Classifiers, ICML17 ⁵	PDF	PDF

[17]: Structure V - DNN with Attention 3

read on: - 03 Oct 2017
2Architecture dynamic QA memory

Presenter	Papers	Paper URL	Our Slides
Tianlu	Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML16 ¹	PDF + code	PDF
Jack	Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17 ²	PDF	PDF
Xueying	State-Frequency Memory Recurrent Neural Networks, ICML17 ³	PDF	PDF

[18]: Structure IV - DNN with Attention 2

read on: - 28 Sep 2017
2Architecture attention transfer-learning relational generative memory Infomax

Presenter	Papers	Paper URL	Our Slides
Jack	Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 ¹	PDF	PDF
Arshdeep	Bidirectional Attention Flow for Machine Comprehension, ICLR17 ²	PDF + code	PDF
Ceyer	Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17	PDF + code	PDF
ChaoJiang	Can Active Memory Replace Attention? Samy Bengio, NIPS16 ³	PDF	PDF
	An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17	PDF

[19]: Structure III - DNN with Attention

read on: - 26 Sep 2017
2Architecture attention transfer-learning dynamic structured QA relational

Presenter	Papers	Paper URL	Our Slides
Rita	Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 ¹	PDF	PDF
Tianlu	Dynamic Coattention Networks For Question Answering, ICLR17 ²	PDF + code	PDF
ChaoJiang	Structured Attention Networks, ICLR17 ³	PDF + code	PDF

[20]: Structure II - DNN with Varying Structures

read on: - 21 Sep 2017
2Architecture 8Scalable sparsity blocking nonparametric structured QA Interpretable

Presenter	Papers	Paper URL	Our Slides
Shijia	Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 ¹	PDF	PDF
Ceyer	Sequence Modeling via Segmentations, ICML17 ²	PDF	PDF
Arshdeep	Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 ³	PDF	PDF

[21]: Structure I - Varying DNN structures

read on: - 19 Sep 2017
2Architecture 8Scalable dialog QA nonparametric structured sparsity

Presenter	Papers	Paper URL	Our Slides
Jack	Learning End-to-End Goal-Oriented Dialog, ICLR17 ¹	PDF	PDF
Bargav	Nonparametric Neural Networks, ICLR17 ²	PDF	PDF
Bargav	Learning Structured Sparsity in Deep Neural Networks, NIPS16 ³	PDF	PDF
Arshdeep	Learning the Number of Neurons in Deep Networks, NIPS16 ⁴	PDF	PDF

[22]: Theoretical17 VI - More about Behaviors of DNN

read on: - 14 Sep 2017
1Theoretical understanding black-box Expressive generalization

Presenter	Papers	Paper URL
SE	Equivariance Through Parameter-Sharing, ICML17 ¹	PDF
SE	Why Deep Neural Networks for Function Approximation?, ICLR17 ²	PDF
SE	Geometry of Neural Network Loss Surfaces via Random Matrix Theory, ICML17 ³	PDF
	Sharp Minima Can Generalize For Deep Nets, ICML17 ⁴	PDF

[23]: Theoretical17 V - More about Behaviors of DNN

read on: - 12 Sep 2017
1Theoretical understanding black-box Memorization InfoMax Expressive

Presenter	Papers	Paper URL	Our Slides
Ceyer	A Closer Look at Memorization in Deep Networks, ICML17 ¹	PDF	PDF
	On the Expressive Efficiency of Overlapping Architectures of Deep Learning ²	DLSSpdf + video
Mutual Information	Opening the Black Box of Deep Neural Networks via Information ³	URL + video
ChaoJiang	Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16	PDF	PDF

[24]: Theoretical17 IV - Investigating Behaviors of DNN

read on: - 07 Sep 2017
1Theoretical understanding black-box Parsimonious Associative memory

Presenter	Papers	Paper URL	Our Slides
Beilun	Learning Deep Parsimonious Representations, NIPS16 ¹	PDF	PDF
Jack	Dense Associative Memory for Pattern Recognition, NIPS16 ²	PDF + video	PDF

[25]: Theoretical17 III - Investigating Behaviors of DNN

read on: - 05 Sep 2017
1Theoretical understanding black-box generalization Expressive

Presenter	Papers	Paper URL	Our Slides
Rita	On the Expressive Power of Deep Neural Networks ¹	PDF	PDF
Arshdeep	Understanding deep learning requires rethinking generalization, ICLR17 ²	PDF	PDF
Tianlu	On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17 ³	PDF	PDF

[26]: Generative I - GAN tutorial by Ian Goodfellow

read on: - 31 Aug 2017
5Generative 0Basics generative GAN

Presenter	Papers	Paper URL	Our Slides
NIPS 2016	Generative adversarial network tutorial (NIPS 2016)	paper + video + code
DLSS 2017	Generative Models I - DLSS 2017	slideraw + video + slide

[27]: Reinforcement I - Pineau - RL Basic Concepts

read on: - 29 Aug 2017
6Reinforcement 0Basics RL

Pineau - RL Basic Concepts

Presenter	Papers	Paper URL	Our Slides
DLSS16	video
RLSS17	slideRaw + video + slide

[28]: Theoretical17 II - Ganguli - Theoretical Neuroscience and Deep Learning DLSS16

read on: - 24 Aug 2017
1Theoretical neuroscience visualizing brain

Ganguli - Theoretical Neuroscience and Deep Learning

Presenter	Papers	Paper URL
DLSS16	video
DLSS17	video + slide
DLSS17	Deep learning in the brain	DLSS17 + Video

[29]: Basic17 -Andrew Ng - Nuts and Bolts of Applying Deep Learning

read on: - 22 Aug 2017
0Basics bias-variance

Presenter	Papers	Paper URL	Our Slides
NIPS16	Andrew Ng - Nuts and Bolts of Applying Deep Learning ¹	video
DLSS17	Doina Precup - Machine Learning - Bayesian Views (56:50m to 1:04:45 slides)	video + slide

> In total, we have finished 127 reading sessions.


Dr. Yanjun Qi

2024-seminarRead

[1]: Safety Benchmark WMDP

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

[2]: KV Cache and Tooling

KV Caching in LLM:

Must-know tools for training/finetuning/serving LLMs:

More readings:

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Retentive Network: A Successor to Transformer for Large Language Models

RWKV: Reinventing RNNs for the Transformer Era

[3]: Advanced Transformer Architectures

Required Readings:

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

JAMBA

More readings:

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Efficient Memory Management for Large Language Model Serving with PagedAttention

Attention Mechanisms in Computer Vision: A Survey

[4]: LLM fine tuning

Required Readings:

Recent Large Language Models Reshaping the Open-Source Arena

Instruction Tuning for Large Language Models: A Survey

Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models

More readings:

Gemini: A Family of Highly Capable Multimodal Models

QLoRA: Efficient Finetuning of Quantized LLMs

related: LoRA: Low-Rank Adaptation of Large Language Models

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models

[5]: Recent LLM basics

Required Readings:

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

More Readings:

Sparks of Large Audio Models: A Survey and Outlook

[6]: MultiAgent LLMs

Required Readings:

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

More Readings:

Understanding the planning of LLM agents: A survey

LLM Agents can Autonomously Hack Websites

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

Humanoid Locomotion as Next Token Prediction

[7]: LLM Agents

Required Readings:

A Survey on Large Language Model based Autonomous Agents

More Readings:

Position Paper: Agent AI Towards a Holistic Intelligence

Tool Use in LLMs

Practices for Governing Agentic AI Systems

Emergent autonomous scientific research capabilities of large language models

What Makes a Dialog Agent Useful?

[8]: Self-exam LLM and reasoning

Required Readings:

Augmented Language Models: a Survey

Self-Consistency Improves Chain of Thought Reasoning in Language Models

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

More Readings:

ReAct: Synergizing Reasoning and Acting in Language Models

Towards Reasoning in Large Language Models: A Survey

Large Language Models Can Self-Improve

Orca 2: Teaching Small Language Models How to Reason

[9]: Prompt Engineering

Required Readings:

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

More Readings:

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts

[10]: LLM Scaling law and Efficiency

Required Readings:

Scaling Laws for Neural Language Models

Efficient Large Language Models: A Survey

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

More Readings:

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

LIMA: Less Is More for Alignment

[11]: LLM interpretability, trust and knowledge conflicts

Required Readings: