Click on a term-tag to see relevant list of readings we finished in a certain semester.
read on: - 03 May 2024
FMRisk
Safety
Mitigate
LLMEvaluate
Adversarial
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
- Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
- The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at this https URL
read on: - 30 Apr 2024
FMEfficient
Efficiency
KV Caching in LLM:
- grouped query attention: https://arxiv.org/pdf/2305.13245.pdf
- Paged attention https://arxiv.org/pdf/2309.06180.pdf
https://openreview.net/pdf?id=uNrFpDPMyo
-
Torchtune - Build on top of Pytorch, for training and finetuning LLM’s. Uses yaml based configs for easily running experiments. Github -
-
axolotl - Built on top on Huggigface peft and transformer library, supports fine-tuning a large number for models like Mistral, LLama etc. Provides support for techniques like RLHF, DPO, LORA, qLORA etc. Github
-
LitGPT - Build on nanoGPT and Megatron, support pre-training and fine-tuning, has examples like Starcoder, TinyLlama etc. Github -
-
Maxtext - Jax based library for training LLM’s on Google TPU’s with configs for models like Gemma, Mistral and LLama2 etc. Github
-
Langchain- https://python.langchain.com/docs/get_started/introduction
- haystack.deepset.ai
- https://github.com/deepset-ai/haystack
- LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- LlamaIndex
- https://docs.llamaindex.ai/en/stable/
LlamaIndex supports Retrieval-Augmented Generation (RAG). Instead of asking LLM to generate an answer immediately, LlamaIndex:
retrieves information from your data sources first, / adds it to your question as context, and / asks the LLM to answer based on the enriched prompt.
- Making Retrieval Augmented Generation Fast
- https://www.pinecone.io/learn/fast-retrieval-augmented-generation/
- OpenMoE
- https://github.com/XueFuzhao/OpenMoE
More readings
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
- Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, Xia Hu
-
This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks.We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources of LLMs, regularly updated, .
- https://github.com/Mooler0410/LLMsPracticalGuide
- In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation… Show more
Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transfor… Show more
read on: - 25 Apr 2024
FMEfficient
Efficiency
In this session, our readings cover:
Required Readings:
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
- https://arxiv.org/abs/2311.12351
- Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, and marking a stride towards achieving Artificial General Intelligence (AGI). However, current LLMs are predominantly pretrained on short text snippets, which compromises their effectiveness in processing the long-context prompts that are frequently encountered in practical scenarios. This article offers a comprehensive survey of the recent advancement in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs throughout the entire model lifecycle, from pre-training through to inference. We first delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. We then provide a taxonomy and the landscape of upgrades on Transformer architecture to solve these problems. Afterwards, we provide an investigation on wildly used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, frameworks, and compilers to boost the efficacy of LLMs across different stages in runtime. Finally, we discuss the challenges and potential avenues for future research. A curated repository of relevant literature, continuously updated, is available at this https URL.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- Paper: https://arxiv.org/abs/2205.14135
-
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware – accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3$\times$ speedup on GPT-2 (seq. length 1K), and 2.4$\times$ speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).
- Related: blogpost FlashAttention
— Techniques for Efficient Inference
of LLMs (III/IV)
JAMBA
- Introducing Jamba: AI21’s Groundbreaking SSM-Transformer Model
Debuting the first production-grade Mamba-based model delivering best-in-class quality and performance.
- March 28, 2024
- https://www.ai21.com/blog/announcing-jamba
- We are thrilled to announce Jamba, the world’s first production-grade Mamba based model. By enhancing Mamba Structured State Space model (SSM) technology with elements of the traditional Transformer architecture, Jamba compensates for the inherent limitations of a pure SSM model. Offering a 256K context window, it is already demonstrating remarkable gains in throughput and efficiency—just the beginning of what can be possible with this innovative hybrid architecture. Notably, Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.
More readings:
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Albert Gu, Tri Dao
- https://arxiv.org/abs/2312.00752
- Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers’ computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
Efficient Memory Management for Large Language Model Serving with PagedAttention
- Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
- High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address this problem, we propose PagedAttention, an attention algorithm inspired by the classical virtual memory and paging techniques in operating systems. On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage. Our evaluations show that vLLM improves the throughput of popular LLMs by 2-4× with the same level of latency compared to the state-of-the-art systems, such as FasterTransformer and Orca. The improvement is more pronounced with longer sequences, larger models, and more complex decoding algorithms. vLLM’s source code is publicly available at this https URL
Attention Mechanisms in Computer Vision: A Survey
- Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu
- https://arxiv.org/abs/2111.07624
- Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention; a related repository this https URL is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
read on: - 23 Apr 2024
FMAdapt
Alignment
In this session, our readings cover:
Required Readings:
Recent Large Language Models Reshaping the Open-Source Arena
- https://deci.ai/blog/list-of-large-language-models-in-open-source/
- The release of Meta’s Llama model and the subsequent release of Llama 2 in 2023 kickstarted an explosion of open-source language models, with better and more innovative models being released on what seems like a daily basis. With new open-source models being released on a daily basis, here we dove into the ocean of open-source possibilities to curate a select list of the most intriguing and influential models making waves in recent months, inlcuding Qwen1.5/ Yi/ Smaug/ Mixtral-8x7B-v0.1/ DBRX/ SOLAR-10.7B-v1.0 / Tulu 2 / WizardLM/ Starling 7B/ OLMo-7B/ Gemma and DeciLM-7B.
- Plus the newly avaiable DBRX model https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Instruction Tuning for Large Language Models: A Survey
- https://arxiv.org/abs/2308.10792
- Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang
- This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users’ objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research. Project page: this http URL
Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models
- https://arxiv.org/abs/2203.06904
- Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this paper. In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divide existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspective of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling and transferable properties of delta tuning.
More readings
Gemini: A Family of Highly Capable Multimodal Models
- https://arxiv.org/abs/2312.11805
- This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.
QLoRA: Efficient Finetuning of Quantized LLMs
- Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimziers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.
- https://arxiv.org/abs/2106.09685
- An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL.
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
- https://arxiv.org/abs/2401.00788
- Terry Yue Zhuo, Armel Zebaze, Nitchakarn Suppattarachai, Leandro von Werra, Harm de Vries, Qian Liu, Niklas Muennighoff
- The high cost of full-parameter fine-tuning (FFT) of Large Language Models (LLMs) has led to a series of parameter-efficient fine-tuning (PEFT) methods. However, it remains unclear which methods provide the best cost-performance trade-off at different model scales. We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters. Through investigations across 5 tasks and 8 different datasets encompassing both code comprehension and code generation tasks, we find that FFT generally leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale. LoRA usually offers the most favorable trade-off between cost and performance. Further investigation into the effects of these methods on both model robustness and code security reveals that larger models tend to demonstrate reduced robustness and less security. At last, we explore the relationships among updated parameters, cross-entropy loss, and task performance. We find that the tuning effectiveness observed in small models generalizes well to larger models, and the validation loss in instruction tuning can be a reliable indicator of overall downstream performance.
This site was built using GitHub Pages.
read on: - 18 Apr 2024
FMEfficient
Efficiency
BasicLLM
In this session, our readings cover:
Require Readings:
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
- https://arxiv.org/abs/2312.15234
- In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- https://arxiv.org/abs/2304.01373
- How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \textit{Pythia}, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend \textit{Pythia} to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at \url{this https URL}.
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
- https://arxiv.org/abs/2403.09611
- Multimodal LLM Pre-training - provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training; studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key for state-of-the-art performance; it also proposes a family of multimodal models up to 30B parameters that achieve SOTA in pre-training metrics and include properties such as enhanced in-context learning, multi-image reasoning, enabling few-shot chain-of-thought prompting.
More Readings:
Sparks of Large Audio Models: A Survey and Outlook
- Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller
- This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and a wide range of sources–from human voices to musical instruments and environmental sounds–poses challenges distinct from those found in traditional Natural Language Processing scenarios. Nevertheless, \textit{Large Audio Models}, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amount of data, these models have demonstrated prowess in a variety of audio tasks, spanning from Automatic Speech Recognition and Text-To-Speech to Music Generation, among others. Notably, recently these Foundational Audio Models, like SeamlessM4T, have started showing abilities to act as universal translators, supporting multiple speech tasks for up to 100 languages without any reliance on separate task-specific systems. This paper presents an in-depth analysis of state-of-the-art methodologies regarding \textit{Foundational Large Audio Models}, their performance benchmarks, and their applicability to real-world scenarios. We also highlight current limitations and provide insights into potential future research directions in the realm of \textit{Large Audio Models} with the intent to spark further discussion, thereby fostering innovation in the next generation of audio-processing systems. Furthermore, to cope with the rapid development in this area, we will consistently update the relevant repository with relevant recent articles and their open-source implementations at this https URL.
read on: - 16 Apr 2024
FMAdapt
Agent
In this session, our readings cover:
Required Readings:
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
- Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
- Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents’ capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.
More Readings:
Understanding the planning of LLM agents: A survey
- https://arxiv.org/abs/2402.02716
- As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention. This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. We provide a taxonomy of existing works on LLM-Agent planning, which can be categorized into Task Decomposition, Plan Selection, External Module, Reflection and Memory. Comprehensive analyses are conducted for each direction, and further challenges for the field of research are discussed.
LLM Agents can Autonomously Hack Websites
- Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
- In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves. As a result, these LLMs can now function autonomously as agents. With the rise in capabilities of these agents, recent work has speculated on how LLM agents would affect cybersecurity. However, not much is known about the offensive capabilities of LLM agents. In this work, we show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. Importantly, the agent does not need to know the vulnerability beforehand. This capability is uniquely enabled by frontier models that are highly capable of tool use and leveraging extended context. Namely, we show that GPT-4 is capable of such hacks, but existing open-source models are not. Finally, we show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild. Our findings raise questions about the widespread deployment of LLMs.
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
- Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao
- Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents. How to integrate agent ability into general LLMs becomes a crucial and urgent problem. This paper first delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations. Based on the above findings, we propose Agent-FLAN to effectively Fine-tune LANguage models for Agents. Through careful decomposition and redesign of the training corpus, Agent-FLAN enables Llama2-7B to outperform prior best works by 3.5\% across various agent evaluation datasets. With comprehensively constructed negative samples, Agent-FLAN greatly alleviates the hallucination issues based on our established evaluation benchmark. Besides, it consistently improves the agent capability of LLMs when scaling model sizes while slightly enhancing the general capability of LLMs. The code will be available at this https URL.
Humanoid Locomotion as Next Token Prediction
- Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
- We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This general formulation enables us to leverage data with missing modalities, like video trajectories without actions. We train our model on a collection of simulated trajectories coming from prior neural network policies, model-based controllers, motion capture data, and YouTube videos of humans. We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot. Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training like walking backward. These findings suggest a promising path toward learning challenging real-world control tasks by generative modeling of sensorimotor trajectories.
read on: - 11 Apr 2024
FMAdapt
Agent
Required Readings:
A Survey on Large Language Model based Autonomous Agents
- https://arxiv.org/abs/2308.11432
- Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at this https URL.
More Readings:
Position Paper: Agent AI Towards a Holistic Intelligence
- https://arxiv.org/abs/2403.00833
- Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao
- Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize developing Agent AI – an embodied system that integrates large foundation models into agent actions. The emerging field of Agent AI spans a wide range of existing embodied and agent-based multimodal interactions, including robotics, gaming, and healthcare systems, etc. In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model. On top of this idea, we discuss how agent AI exhibits remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Furthermore, we discuss the potential of Agent AI from an interdisciplinary perspective, underscoring AI cognition and consciousness within scientific discourse. We believe that those discussions serve as a basis for future research directions and encourage broader societal engagement.
- https://zorazrw.github.io/files/WhatAreToolsAnyway.pdf
- an overview of tool use in LLMs, including a formal definition of the tool-use paradigm, scenarios where LLMs leverage tool usage, and for which tasks this approach works well; it also provides an analysis of complex tool usage and summarize testbeds and evaluation metrics across LM tooling works
Practices for Governing Agentic AI Systems
- https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf
- Agentic AI systems—AI systems that can pursue complex goals with limited direct supervision— are likely to be broadly useful if we can integrate them responsibly into our society. While such systems have substantial potential to help people more efficiently and effectively achieve their own goals, they also create risks of harm. In this white paper, we suggest a definition of agentic AI systems and the parties in the agentic AI system life-cycle, and highlight the importance of agreeing on a set of baseline responsibilities and safety best practices for each of these parties. As our primary contribution, we offer an initial set of practices for keeping agents’ operations safe and accountable, which we hope can serve as building blocks in the development of agreed baseline best practices. We enumerate the questions and uncertainties around operationalizing each of these practices that must be addressed before such practices can be codified. We then highlight categories of indirect impacts from the wide-scale adoption of agentic AI systems, which are likely to necessitate additional governance frameworks.
Emergent autonomous scientific research capabilities of large language models
- https://arxiv.org/abs/2304.05332
- Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent’s scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.
What Makes a Dialog Agent Useful?
- https://huggingface.co/blog/dialog-agents
read on: - 09 Apr 2024
FMAdapt
Reasoning
In this session, our readings cover:
Required Readings:
Augmented Language Models: a Survey
- Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom
- This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing tokens prediction objective, such augmented LMs can use various, possibly non-parametric external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advance in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability,
Self-Consistency Improves Chain of Thought Reasoning in Language Models
- https://arxiv.org/abs/2203.11171
- Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
- https://arxiv.org/abs/2401.00812
- Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, Sha Li, Zixuan Huang, Xu Cao, Xingyao Wang, Yiquan Wang, Heng Ji, Chengxiang Zhai
- The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs’ training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their applications to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to their success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.
More Readings:
ReAct: Synergizing Reasoning and Acting in Language Models
- Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
- While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: this https URL
- Comments: v3 is the ICLR camera ready version with some typos fixed. Project site with code: this https URL
Towards Reasoning in Large Language Models: A Survey
- Jie Huang, Kevin Chen-Chuan Chang
- Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and there is observation that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions on future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.
Comments: ACL 2023 Findings, 15 pages
Large Language Models Can Self-Improve
- Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han / Large Language Models (LLMs) have achieved excellent performances in various tasks. However, fine-tuning an LLM requires extensive supervision. Human, on the other hand, may improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate “high-confidence” rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%->82.1% on GSM8K, 78.2%->83.0% on DROP, 90.0%->94.4% on OpenBookQA, and 63.4%->67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label. We conduct ablation studies and show that fine-tuning on reasoning is critical for self-improvement.
- https://arxiv.org/abs/2210.11610
Orca 2: Teaching Small Language Models How to Reason /
- https://arxiv.org/abs/2311.11045
- Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs’ reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar or better to those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. make Orca 2 weights publicly available at this http URL to support research on the development, evaluation, and alignment of smaller LMs
read on: - 04 Apr 2024
FMAdapt
Prompting
In this session, our readings cover:
Required Readings:
Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
- https://arxiv.org/abs/2310.14735
- Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu / This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.
More Readings:
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
- This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One of the major causes of the high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose Skeleton-of-Thought (SoT), which first guides LLMs to generate the skeleton of the answer, and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-ups across 12 LLMs, but it can also potentially improve the answer quality on several question categories. SoT is an initial attempt at data-centric optimization for inference efficiency, and further underscores the potential of pushing LLMs to think more like a human for answer quality.
Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts
- The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models’ (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM’s capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and others parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques.
read on: - 02 Apr 2024
FMEfficient
Efficiency
In this session, our readings cover:
Required Readings:
Scaling Laws for Neural Language Models
- Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
-
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
- https://github.com/RUCAIBox/LLMSurvey
Efficient Large Language Models: A Survey
- https://arxiv.org/abs/2312.03863
- https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey
- Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency this http URL this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at this https URL, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- Recent research, such as BitNet [23], is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
More Readings:
An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
- Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang
- We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM like generating new tokens. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction dataset but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user’s perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building generalist LLM system via synergizing multiple expert LLMs.
LIMA: Less Is More for Alignment /
- https://arxiv.org/abs/2305.11206
- Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
read on: - 28 Mar 2024
FMRisk
Interpretibility
Required Readings:
Rethinking interpretability in the era of large language models
- Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
- 2024/1/30
- Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.
The Claude 3 Model Family: Opus, Sonnet, Haiku
- https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
- We introduce Claude 3, a new family of large multimodal models – Claude 3 Opus, our most capable offering, Claude 3 Sonnet, which provides a combination of skills and speed,
and Claude 3 Haiku, our fastest and least expensive model. All new models have vision
capabilities that enable them to process and analyze image data. The Claude 3 family
demonstrates strong performance across benchmark evaluations and sets a new standard on
measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results
on evaluations like GPQA [1], MMLU [2], MMMU [3] and many more. Claude 3 Haiku
performs as well or better than Claude 2 [4] on most pure-text tasks, while Sonnet and
Opus significantly outperform it. Additionally, these models exhibit improved fluency in
non-English languages, making them more versatile for a global audience. In this report,
we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety,
societal impacts, and the catastrophic risk assessments we committed to in our Responsible
Scaling Policy [5].
More Readings:
Knowledge Conflicts for LLMs: A Survey
- https://arxiv.org/abs/2403.08319
- This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness
- https://github.com/openai/transformer-debugger
- Transformer Debugger (TDB) is a tool developed by OpenAI’s Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders. TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, “Why does the model output token A instead of token B for this prompt?” or “Why does attention head H attend to token T for this prompt?” It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- https://transformer-circuits.pub/2023/monosemantic-features/index.html
- In this paper, we use a weak dictionary learning algorithm called a sparse autoencoder to generate learned features from a trained model that offer a more monosemantic unit of analysis than the model’s neurons themselves. Our approach here builds on a significant amount of prior work, especially in using dictionary learning and related methods on neural network activations , and a more general allied literature on disentanglement. We also note interim reports which independently investigated the sparse autoencoder approach in response to Toy Models, culminating in the recent manuscript of Cunningham et al.
- related post: Decomposing Language Models Into Understandable Components https://www.anthropic.com/news/decomposing-language-models-into-understandable-components
Tracing Model Outputs to the Training Data
- https://www.anthropic.com/news/influence-functions
- As large language models become more powerful and their risks become clearer, there is increasing value to figuring out what makes them tick. In our previous work, we have found that large language models change along many personality and behavioral dimensions as a function of both scale and the amount of fine-tuning. Understanding these changes requires seeing how models work, for instance to determine if a model’s outputs rely on memorization or more sophisticated processing. Understanding the inner workings of language models will have substantial implications for forecasting AI capabilities as well as for approaches to aligning AI systems with human preferences.
Mechanistic interpretability takes a bottom-up approach to understanding ML models: understanding in detail the behavior of individual units or small-scale circuits such as induction heads. But we also see value in a top-down approach, starting with a model’s observable behaviors and generalization patterns and digging down to see what neurons and circuits are responsible. An advantage of working top-down is that we can directly study high-level cognitive phenomena of interest which only arise at a large scale, such as reasoning and role-playing. Eventually, the two approaches should meet in the middle.
Language models can explain neurons in language models
- https://openai.com/research/language-models-can-explain-neurons-in-language-models
- Language models have become more capable and more widely deployed, but we do not understand how they work. Recent work has made progress on understanding a small number of circuits and narrow behaviors,[1][2] but to fully understand a language model, we’ll need to analyze millions of neurons. This paper applies automation to the problem of scaling an interpretability technique to all the neurons in a large language model. Our hope is that building on this approach of automating interpretability [3][4][5] will enable us to comprehensively audit the safety of models before deployment.
read on: - 26 Mar 2024
FMAdapt
ModelEdit
In this session, our readings cover:
Required Readings:
Editing Large Language Models: Problems, Methods, and Opportunities
- https://arxiv.org/abs/2305.13172
- Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang
Despite the ability to train capable LLMs, the methodology for maintaining their relevancy and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to efficiently alter the behavior of LLMs within a specific domain without negatively impacting performance across other inputs. This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. We also build a new benchmark dataset to facilitate a more robust evaluation and pinpoint enduring issues intrinsic to existing techniques. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context. Code and datasets are available at this https URL.
Comments: EMNLP 2023. Updated with new experiments
More Readings:
Tuning Language Models by Proxy
- Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
- Submitted on 16 Jan 2024]
- Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors. However, tuning these models has become increasingly resource-intensive, or impossible when model weights are private. We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the result of directly tuning the model, but by accessing only its prediction over the output vocabulary. Our method instead tunes a smaller LM, then applies the difference between the predictions of the small tuned and untuned LMs to shift the original predictions of the base model in the direction of tuning, while retaining the benefits of larger scale pretraining. In experiments, when we apply proxy-tuning to Llama2-70B using proxies of only 7B size, we can close 88% of the gap between Llama2-70B and its truly-tuned chat version, when evaluated across knowledge, reasoning, and safety benchmarks. Interestingly, when tested on TruthfulQA, proxy-tuned models are actually more truthful than directly tuned models, possibly because decoding-time guidance better retains the model’s factual knowledge. We then demonstrate the generality of proxy-tuning by applying it for domain adaptation on code, and task-specific finetuning on question-answering and math problems. Our work demonstrates the promise of using small tuned LMs to efficiently customize large, potentially proprietary LMs through decoding-time guidance.
A Survey of Machine Unlearning
- https://arxiv.org/abs/2209.02299
- Today, computer systems hold large amounts of personal data. Yet while such an abundance of data allows breakthroughs in artificial intelligence, and especially machine learning (ML), its existence can be a threat to user privacy, and it can weaken the bonds of trust between humans and AI. Recent regulations now require that, on request, private information about a user must be removed from both computer systems and from ML models, i.e. ``the right to be forgotten’’). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context as ML models often `remember’ the old data. Contemporary adversarial attacks on trained models have proven that we can learn whether an instance or an attribute belonged to the training data. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. It turns out that recent works on machine unlearning have not been able to completely solve the problem due to the lack of common frameworks and resources. Therefore, this paper aspires to present a comprehensive examination of machine unlearning’s concepts, scenarios, methods, and applications. Specifically, as a category collection of cutting-edge studies, the intention behind this article is to serve as a comprehensive resource for researchers and practitioners seeking an introduction to machine unlearning and its formulations, design criteria, removal requests, algorithms, and applications. In addition, we aim to highlight the key findings, current trends, and new research areas that have not yet featured the use of machine unlearning but could benefit greatly from it. We hope this survey serves as a valuable resource for ML researchers and those seeking to innovate privacy technologies. Our resources are publicly available at this https URL.
AI Model Disgorgement: Methods and Choices
- https://arxiv.org/abs/2304.03545
- Alessandro Achille, Michael Kearns, Carson Klingenberg, Stefano Soatto
Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, and document their provenance. They must also make sure to respect intellectual property rights, preserve individual privacy, and use data in an ethical way. Over the past few years, ML models have significantly increased in size and complexity. These models require a very large amount of data and compute capacity to train, to the extent that any defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and a significant amount of effort dedicated to ensuring that training corpora are properly composed, the sheer volume of data required for the models makes it challenging to manually inspect each datum comprising a training corpus. One potential fix for training corpus data defects is model disgorgement – the elimination of not just the improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible usage of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods that are applicable to modern ML systems. In particular, we investigate the meaning of “removing the effects” of data in the trained model in a way that does not require retraining from scratch.
read on: - 21 Mar 2024
FMAdapt
DomainAdapt
In this session, our readings cover:
Required Readings:
Large Language Models for Software Engineering: A Systematic Literature Review
- Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We collect and analyze 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study.
More Readings:
Large language models generate functional protein sequences across diverse families
- https://pubmed.ncbi.nlm.nih.gov/36702895/
- Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.
Large Language Models in Law: A Survey
- https://arxiv.org/abs/2312.03718
- The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.
ChemLLM: A Chemical Large Language Model
- https://arxiv.org/abs/2402.06852
- Large language models (LLMs) have made impressive progress in chemistry applications, including molecular property prediction, molecular generation, experimental protocol design, etc. However, the community lacks a dialogue-based model specifically designed for chemistry. The challenge arises from the fact that most chemical data and scientific knowledge are primarily stored in structured databases, and the direct use of these structured data compromises the model’s ability to maintain coherent dialogue. To tackle this issue, we develop a novel template-based instruction construction method that transforms structured knowledge into plain dialogue, making it suitable for language model traini…
FunSearch: Making new discoveries in mathematical sciences using Large Language Models
- https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/
- https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/
Segment Anything
- https://arxiv.org/abs/2304.02643
- We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at this https URL to foster research into foundation models for computer vision.
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
- In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonsrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
- Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun
- Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model’s background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora’s development and investigate the underlying technologies used to build this “world simulator”. Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.
BloombergGPT: A Large Language Model for Finance
- https://arxiv.org/abs/2303.17564
- The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg’s extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
- https://arxiv.org/abs/2311.10709
- We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions–adjusted noise schedules for diffusion, and multi-stage training–that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work–81% vs. Google’s Imagen Video, 90% vs. Nvidia’s PYOCO, and 96% vs. Meta’s Make-A-Video. Our model outperforms commercial solutions such as RunwayML’s Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user’s text prompt, where our generations are preferred 96% over prior work.
read on: - 19 Mar 2024
FMRisk
Hallucination
In this session, our readings cover:
Required Readings:
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
- https://arxiv.org/abs/2311.05232
- The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of LLMs in real-world scenarios, which attracts increasing attention to detect and mitigate these hallucinations. In this survey, we aim to provide a thorough and in-depth overview of recent advances in the field of LLM hallucinations. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks. Additionally, representative approaches designed to mitigate hallucinations are introduced accordingly. Finally, we analyze the challenges that highlight the current limitations and formulate open questions, aiming to delineate pathways for future research on hallucinations in LLMs.
More Readings:
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
- https://arxiv.org/abs/2305.14540
- With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency detection compared to traditional non-LLM methods. However, a closer analysis reveals that most LLMs fail on more complex formulations of the task and exposes issues with existing evaluation benchmarks, affecting evaluation precision. To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. This new benchmark is 20 times more cost-effective per sample than previous benchmarks and highly reproducible, as we estimate inter-annotator agreement at about 0.9. Most LLMs struggle on SummEdits, with performance close to random chance. The best-performing model, GPT-4, is still 8\% below estimated human performance, highlighting the gaps in LLMs’ ability to reason about facts and detect inconsistencies when they occur.
Survey of Hallucination in Natural Language Generation
- https://arxiv.org/abs/2202.03629
- Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Ho Shu Chan, Wenliang Dai, Andrea Madotto, Pascale Fung
- Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation; and (3) hallucinations in large language models (LLMs). This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
Do Language Models Know When They’re Hallucinating References?
- https://arxiv.org/abs/2305.18248
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment
- https://arxiv.org/abs/2308.05374
read on: - 14 Mar 2024
FMAdapt
RAG
In this session, our readings cover:
Required Readings:
Retrieval-Augmented Generation for AI-Generated Content: A Survey
- https://arxiv.org/abs/2402.19473v1
- The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator. We distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancements methods for RAG, facilitating effective engineering and implementation of RAG systems. Then from another view, we survey on practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Project: this https URL
Retrieval-Augmented Generation for Large Language Models: A Survey
- https://arxiv.org/abs/2312.10997
- Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy, reduce model hallucination, particularly for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG.
More Readings:
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
- Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun
- Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model’s background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora’s development and investigate the underlying technologies used to build this “world simulator”. Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.
A Comprehensive Study of Knowledge Editing for Large Language Models
- https://arxiv.org/abs/2401.01286
- Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs’ behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.
Even More
A Survey of Table Reasoning with Large Language Models
- Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che
- https://arxiv.org/abs/2402.08259
- Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improving the efficiency of obtaining information. Recently, using Large Language Models (LLMs) has become the mainstream method for table reasoning, because it not only significantly reduces the annotation cost but also exceeds the performance of previous methods. However, existing research still lacks a summary of LLM-based table reasoning works. Due to the existing lack of research, questions about which techniques can improve table reasoning performance in the era of LLMs, why LLMs excel at table reasoning, and how to enhance table reasoning abilities in the future, remain largely unexplored. This gap significantly limits progress in research. To answer the above questions and advance table reasoning research with LLMs, we present this survey to analyze existing research, inspiring future work. In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning. We provide research directions from both the improvement of existing methods and the expansion of practical applications to inspire future research.
read on: - 12 Mar 2024
FMRisk
Safety
Adversarial
In this session, our readings cover:
Required Readings:
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- https://dl.acm.org/doi/10.1145/3442188.3445922
- The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.
More Readings:
Low-Resource Languages Jailbreak GPT-4
- AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4’s safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpassing state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have significantly lower attack success rate, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affects speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLMs users. Publicly available translation APIs enable anyone to exploit LLMs’ safety vulnerabilities. Therefore, our work calls for a more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
- https://arxiv.org/abs/2305.11391
- Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.
Even More
ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation / EMNLP2023
- Despite remarkable advances that large language models have achieved in chatbots nowadays, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media contents, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark constructed based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference when compared to social media contents. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.
OpenAI on LLM generated bio-x-risk
- Building an early warning system for LLM-aided biological threat creation
- https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation
A misleading open letter about sci-fi AI dangers ignores the real risks
https://www.aisnakeoil.com/p/a-misleading-open-letter-about-sci
Evaluating social and ethical risks from generative AI
- https://deepmind.google/discover/blog/evaluating-social-and-ethical-risks-from-generative-ai/
Managing Existential Risk from AI without Undercutting Innovation
- https://www.csis.org/analysis/managing-existential-risk-ai-without-undercutting-innovation
read on: - 29 Feb 2024
FMRisk
Safety
Adversarial
In this session, our readings cover:
Required Readings:
Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
- Dingcheng Yang, Yang Bai, Xiaojun Jia, Yang Liu, Xiaochun Cao, Wenjian Yu
- Diffusion models have been widely deployed in various image generation tasks, demonstrating an extraordinary connection between image and text modalities. However, they face challenges of being maliciously exploited to generate harmful or sensitive images by appending a specific suffix to the original prompt. Existing works mainly focus on using single-modal information to conduct attacks, which fails to utilize multi-modal features and results in less than satisfactory performance. Integrating multi-modal priors (MMP), i.e. both text and image features, we propose a targeted attack method named MMP-Attack in this work. Specifically, the goal of MMP-Attack is to add a target object into the image content while simultaneously removing the original object. The MMP-Attack shows a notable advantage over existing works with superior universality and transferability, which can effectively attack commercial text-to-image (T2I) models such as DALL-E 3. To the best of our knowledge, this marks the first successful attempt of transfer-based attack to commercial T2I models. Our code is publicly available at ….
A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
- https://ieeexplore.ieee.org/document/10208563
- Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem ‘query-free attack generation’. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that only a five-character perturbation to the text prompt is able to cause the significant content shift of synthesized images using Stable Diffusion. Moreover, we show that the proposed target attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content.
More Readings:
Visual Instruction Tuning
- Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
- Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.Our early experiments show that LLaVA demonstrates impressive multimodel chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields a 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
- https://arxiv.org/abs/2401.01523
- https://arxiv.org/abs/2310.03185
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- https://arxiv.org/abs/2209.07858
read on: - 27 Feb 2024
FMRisk
Safety
Adversarial
In this session, our readings cover:
Required Readings:
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
- https://arxiv.org/abs/2402.04249
- Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at this https URL.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
- https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training
- Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
More Readings:
SafeText: A Benchmark for Exploring Physical Safety in Language Models
- https://arxiv.org/abs/2210.10045
- Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One such type of safety that has been scarcely studied is commonsense physical safety, i.e. text that is not explicitly violent and requires additional commonsense knowledge to comprehend that it leads to physical harm. We create the first benchmark dataset, SafeText, comprising real-life scenarios with paired safe and physically unsafe pieces of advice. We utilize SafeText to empirically study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks. We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice. As a result, we argue for further studies of safety and the assessment of commonsense physical safety in models before release.
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
- https://arxiv.org/abs/2310.03693
Lessons learned on language model safety and misuse
- https://openai.com/research/language-model-safety-and-misuse
Planning red teaming for large language models (LLMs) and their applications
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/red-teaming
ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
- https://arxiv.org/abs/2310.09624
read on: - 22 Feb 2024
FMRisk
Bias
Adversarial
In this session, our readings cover:
Required Readings:
Evaluating and Mitigating Discrimination in Language Model Decisions
- https://arxiv.org/abs/2312.03689
- As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array of potential prompts that decision-makers may input into an LM, spanning 70 diverse decision scenarios across society, and systematically vary the demographic information in each prompt. Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied. While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate. Our work enables developers and policymakers to anticipate, measure, and address discrimination as language model capabilities and applications continue to expand. We release our dataset and prompts at this https URL
More Readings:
Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models
- https://arxiv.org/abs/2310.11079
Machine Learning in development: Let’s talk about bias!
- https://huggingface.co/blog/ethics-soc-2
- https://huggingface.co/blog/evaluating-llm-bias
Exploring Social Bias in Chatbots using Stereotype Knowledge WNLP@ACL2019
Bias and Fairness in Large Language Models: A Survey
- https://arxiv.org/abs/2309.00770
- Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.
A Survey on Fairness in Large Language Models
- https://arxiv.org/abs/2308.10149
- Large language models (LLMs) have shown powerful performance and development prospect and are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate the biases to downstream tasks. Unfair LLM systems have undesirable social impacts and potential harms. In this paper, we provide a comprehensive review of related research on fairness in LLMs. First, for medium-scale LLMs, we introduce evaluation metrics and debiasing methods from the perspectives of intrinsic bias and extrinsic bias, respectively. Then, for large-scale LLMs, we introduce recent fairness research, including fairness evaluation, reasons for bias, and debiasing methods. Finally, we discuss and provide insight on the challenges and future directions for the development of fairness in LLMs.
</div>
read on: - 20 Feb 2024
FMRisk
Mitigate
LLMEvaluate
Adversarial
In this session, our readings cover:
Required Readings:
- https://arxiv.org/abs/2205.12628
- Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang
Are Large Pre-Trained Language Models Leaking Your Personal Information? In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner’s name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.
Privacy Risks of General-Purpose Language Models
- https://ieeexplore.ieee.org/abstract/document/9152761
- We find the text embeddings from general-purpose language models would capture much sensitive information from the plain text. Once being accessed by the adversary, the embeddings can be reverse-engineered to disclose sensitive information of the victims for further harassment. Although such a privacy risk can impose a real threat to the future leverage of these promising NLP tools, there are neither published attacks nor systematic evaluations by far for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show the adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, different…
More Readings:
Privacy in Large Language Models: Attacks, Defenses and Future Directions
- https://arxiv.org/abs/2310.10383
- The advancement of large language models (LLMs) has significantly enhanced the ability to effectively tackle various downstream NLP tasks and unify these tasks into generative pipelines. On the one hand, powerful language models, trained on massive textual data, have brought unparalleled accessibility and usability for both models and users. On the other hand, unrestricted access to these models can also introduce potential malicious and unintentional privacy risks. Despite ongoing efforts to address the safety and privacy concerns associated with LLMs, the problem remains unresolved. In this paper, we provide a comprehensive analysis of the current privacy attacks targeting LLMs and categorize them according to the adversary’s assumed capabilities to shed light on the potential vulnerabilities present in LLMs. Then, we present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks. Beyond existing works, we identify upcoming privacy concerns as LLMs evolve. Lastly, we point out several potential avenues for future exploration.
ProPILE: Probing Privacy Leakage in Large Language Models
- https://arxiv.org/abs/2307.01881
- Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh
The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.
read on: - 15 Feb 2024
FMRisk
Mitigate
LLMEvaluate
Adversarial
In this session, our readings cover:
Required Readings:
Foundation Models and Fair Use
- Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, Percy Liang
- URL
- Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.
- Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
- Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
- https://arxiv.org/abs/2303.04226
- Recently, ChatGPT, along with DALL-E-2 and Codex,has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind its impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by human, and generating the content according to its knowledge and the intent information. In recent years, large-scale models have become increasingly important in AIGC as they provide better intent extraction and thus, improved generation results. With the growth of data and the size of the models, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic and high-quality content generation. This survey provides a comprehensive review on the history of generative models, and basic components, recent advances in AIGC from unimodal interaction and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and relative models of text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.
More Readings:
Audio Deepfake Detection: A Survey
- https://arxiv.org/abs/2308.14970
- Audio deepfake detection is an emerging active topic. A growing number of literatures have aimed to study deepfake detection algorithms and achieved effective performance, the problem of which is far from being solved. Although there are some review literatures, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively. The survey shows that future research should address the lack of large scale datasets in the wild, poor generalization of existing detection methods to unknown fake attacks, as well as interpretability of detection results.
Copyright Plug-in Market for The Text-to-Image Copyright Protection
- https://openreview.net/forum?id=pSf8rrn49H
- The images generated by text-to-image models could be accused of the copyright infringement, which has aroused heated debate among AI developers, content creators, legislation department and judicature department. Especially, the state-of-the-art text-to-image models are capable of generating extremely high-quality works while at the same time lack the ability to attribute credits to the original creators, which brings anxiety to the artists’ community. In this paper, we propose a conceptual framework – copyright Plug-in Market – to address the tension between the users, the content creators and the generative models. We introduce three operations in the \copyright Plug-in Market: addition, extraction and combination to facilitate proper credit attribution in the text-to-image procedure and enable the digital copyright protection. For the addition operation, we train a \copyright plug-in for a specific copyrighted concept and add it to the generative model and then we are able to generate new images with the copyrighted concept, which abstract existing solutions of portable LoRAs. We further introduce the extraction operation to enable content creators to claim copyrighted concept from infringing generative models and the combination operation to enable users to combine different \copyright plug-ins to generate images with multiple copyrighted concepts. We believe these basic operations give good incentives to each participant in the market, and enable enough flexibility to thrive the market. Technically, we innovate an inverse LoRA’’ approach to instantiate the extraction operation and propose a data-ignorant layer-wise distillation’’ approach to combine the multiple extractions or additions easily. To showcase the diverse capabilities of copyright plug-ins, we conducted experiments in two domains: style transfer and cartoon IP recreation. The results demonstrate that copyright plug-ins can effectively accomplish copyright extraction and combination, providing a valuable copyright protection solution for the era of generative AIs.
Membership Inference Attacks against Language Models via Neighbourhood Comparison
https://aclanthology.org/2023.findings-acl.719/
Deepfake Taylor Swift event:
- https://www.cbsnews.com/news/taylor-swift-artificial-intellignence-ai-4chan/
read on: - 13 Feb 2024
FMRisk
Mitigate
LLMEvaluate
Adversarial
In this session, our readings cover:
Required Readings:
TrustLLM: Trustworthiness in Large Language Models
- https://arxiv.org/abs/2401.05561
- Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
- Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, Research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs’ potential to both bolster and jeopardize cybersecurity
- https://arxiv.org/abs/2312.02003
More Readings:
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
- https://arxiv.org/abs/2212.14834
- Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs, a subfield of trustworthy ML, combining the perspectives of Natural Language Processing and Security. Prior work has shown that even safety-aligned LLMs (via instruction tuning and reinforcement learning through human feedback) can be susceptible to adversarial attacks, which exploit weaknesses and mislead AI systems, as evidenced by the prevalence of `jailbreak’ attacks on models like ChatGPT a
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
- https://arxiv.org/abs/2311.16119
- Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.
Even More:
ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks
- https://llm-vulnerability.github.io/
Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration
NIST AI RISK MANAGEMENT FRAMEWORK
- https://www.nist.gov/itl/ai-risk-management-framework
- https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
- https://airc.nist.gov/AI_RMF_Knowledge_Base/Roadmap
- EU AI Act / GDPR
read on: - 08 Feb 2024
FMBasic
BasicLLM
In this session, our readings cover:
Required Readings:
Mistral 7B
- https://mistral.ai/news/announcing-mistral-7b/
- We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
More Readings:
OLMo: Accelerating the Science of Language Models
- https://arxiv.org/abs/2402.00838
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, this technical report details the first release of OLMo, a state-of-the-art, truly Open Language Model and its framework to build and study the science of language modeling. Unlike most prior efforts that have only released model weights and inference code, we release OLMo and the whole framework, including training data and training and evaluation code. We hope this release will empower and strengthen the open research community and inspire a new wave of innovation.
Mixtral of Experts
- https://arxiv.org/abs/2401.04088
- We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- https://arxiv.org/abs/2101.00027
- Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present \textit{the Pile}: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets – both existing and newly constructed – many of which derive from academic or professional sources. Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations. Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.
read on: - 06 Feb 2024
FMBasic
Alignment
In this session, our readings cover:
Required Readings:
Aligning Large Language Models with Human: A Survey
- https://arxiv.org/abs/2307.12966
- https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo
- https://huggingface.co/blog/stackllama
More readings
Github Awesome-RLHF
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- https://arxiv.org/abs/2301.13688
- We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at this https URL.
DPO Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- https://arxiv.org/abs/2305.18290
- https://huggingface.co/blog/dpo-trl
- While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss. The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.
Training language models to follow instructions with human feedback
- https://arxiv.org/abs/2203.02155)
- “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences
- https://openreview.net/forum?id=GisHNaleWiA
-
“explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function” |
read on: - 01 Feb 2024
FMRisk
Mitigate
Adversarial
In this session, our readings cover:
Required Readings:
- https://arxiv.org/abs/2312.06674
- We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to as response classification. For the purpose of both prompt and response classification, we have meticulously gathered a dataset of high quality. Llama Guard, a Llama2-7b model that is instruction-tuned on our collected dataset, albeit low in volume, demonstrates strong performance on existing benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat, where its performance matches or exceeds that of currently available content moderation tools. Llama Guard functions as a language model, carrying out multi-class classification and generating binary decision scores. Furthermore, the instruction fine-tuning of Llama Guard allows for the customization of tasks and the adaptation of output formats. This feature enhances the model’s capabilities, such as enabling the adjustment of taxonomy categories to align with specific use cases, and facilitating zero-shot or few-shot prompting with diverse taxonomies at the input. We are making Llama Guard model weights available and we encourage researchers to further develop and adapt them to meet the evolving needs of the community for AI safety.
More Readings:
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- [Submitted on 23 Feb 2023 (v1), last revised 5 May 2023 (this version, v2)]
- https://arxiv.org/abs/2302.12173
- Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
- Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks’ practical viability against both real-world systems, such as Bing’s GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application’s functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
- https://github.com/neelsjain/baseline-defenses
read on: - 30 Jan 2024
FMBasic
LLMEvaluate
In this session, our readings cover:
Required Readings:
Holistic Evaluation of Text-To-Image Models
- https://arxiv.org/abs/2311.04287
- The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at this https URL and the code at this https URL, which is integrated with the HELM codebase.
Holistic Evaluation of Language Models
- https://arxiv.org/abs/2211.09110
More Readings:
Challenges in evaluating AI systems
- https://www.anthropic.com/news/evaluating-ai-systems
Evaluating Large Language Models: A Comprehensive Survey
- https://arxiv.org/abs/2310.19736
- This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation and safety evaluation. In addition to the comprehensive review on the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs’ performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability.
Evaluating Large Language Models Trained on Code
- https://arxiv.org/abs/2107.03374
chatbot-arena-leaderboard
- https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Leveraging Large Language Models for NLG Evaluation: A Survey
- https://arxiv.org/abs/2401.07103
read on: - 25 Jan 2024
FMMulti
BasicLLM
In this session, our readings cover:
Readings:
ChatGPT is not all you need. A State of the Art Review of large Generative AI models
- Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan
- https://arxiv.org/abs/2301.04655
- During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as being a general question and answering system or automatically creating artistic images that are revolutionizing several sectors. Consequently, the implications that these generative models have in the industry and society are enormous, as several job positions may be transformed. For example, Generative AI is capable of transforming effectively and creatively texts to images, like the DALLE-2 model; text to 3D images, like the Dreamfusion model; images to text, like the Flamingo model; texts to video, like the Phenaki model; texts to audio, like the AudioLM model; texts to other texts, like ChatGPT; texts to code, like the Codex model; texts to scientific texts, like the Galactica model or even create algorithms like AlphaTensor. This work consists on an attempt to describe in a concise way the main models are sectors that are affected by generative AI and to provide a taxonomy of the main generative models published recently.
A Survey of Large Language Models
- https://arxiv.org/abs/2303.18223
- Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.
On the Opportunities and Risks of Foundation Models
- https://arxiv.org/abs/2108.07258
- ” a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”
read on: - 23 Jan 2024
FMBasic
BasicLLM
Required Readings:
Emergent Abilities of Large Language Models
- URL
-
“an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.” |
Language Models are Few-Shot Learners
- URL
-
“GPT-3, 175B autoregerssive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.” |
A survey of Generative AI Applications
- https://arxiv.org/abs/2306.02781
- Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.
Generative AI: Perspectives from Stanford HAI
- https://hai.stanford.edu/generative-ai-perspectives-stanford-hai
read on: - 18 Jan 2024
FMBasic
BasicLLM
Readings:
Basics of ML and DL:
Basics of NLP
- URL
- Typical NLP tasks / Challenges / Pipeline
- f() on natural language
- Before Deep NLP (Pre 2012) • (BOW / LSI / Topic Modeling LDA )
- Word2Vec (2013-2016) • (GloVe/ FastText)
- Recurrent NN (2014-2016) • LSTM
- Seq2Seq
- Attention
- Self-Attention (2016 – now )
- Transformer (attention only Seq2Seq)
- BERT / RoBERTa/ XLNet/ GPT / …
- A good code walk through on transformer at URL
</div>
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2022, Dec, 3 |
RLHF + InstructGPT |
2022-W6 |
2 |
2022, Dec, 1 |
Stable diffusion + DreamBooth + LoRA |
2022-W5 |
3 |
2022, Oct, 1 |
Emergent Abilities of LLM |
2022-W4 |
4 |
2022, Sep, 1 |
DiffDock + ESMfold |
2022-W2 |
5 |
2022, Jun, 3 |
Decision Transformers |
2022-W3 |
6 |
2022, May, 3 |
A Generalist Agent + offline RL + UniMask |
2022-W1 |
read on: - 03 Dec 2022
6Reinforcement
FMBasic
RL
AGI
language model
Human Alignment
Papers |
Paper URL |
Abstract |
Training language models to follow instructions with human feedback |
URL |
“further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.” |
Deep reinforcement learning from human preferences |
URL |
“explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function” |
read on: - 01 Dec 2022
FMBasic
FMMulti
Diffusion
Image synthesis
Efficiency
Stable diffusion
- URL
- “High-Resolution Image Synthesis with Latent Diffusion Models”
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- URL
- “personalization” of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. .”
LoRA: Low-Rank Adaptation of Large Language Models
- URL
-
“propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.” |
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- https://arxiv.org/abs/2208.01618
- Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
- Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks.
read on: - 01 Oct 2022
FMBasic
language model
Emergent Abilities of Large Language Models
- URL
- “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”
Language Models are Few-Shot Learners
- URL
- “GPT-3, 175B autoregerssive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”
On the Opportunities and Risks of Foundation Models
- URL
-
” a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).” |
The Power of Scale for Parameter-Efficient Prompt Tuning
- https://arxiv.org/abs/2104.08691
- Brian Lester, Rami Al-Rfou, Noah Constant
- In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
read on: - 01 Sep 2022
9DiscreteApp
FMMulti
Protein
language model
Papers |
Paper URL |
Abstract |
Evolutionary-scale prediction of atomic level protein structure with a language model |
URL |
“show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters,…” |
DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking |
URL |
“Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space.” |
read on: - 03 Jun 2022
6Reinforcement
RL
AGI
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
- https://arxiv.org/abs/2106.01345
- We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
- Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
- https://arxiv.org/abs/2206.13499
- Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.
read on: - 03 May 2022
6Reinforcement
RL
AGI
Papers |
Paper URL |
Abstract |
A Generalist Agent |
URL |
Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. |
Why should we prefer offline reinforcement learning over behavioral cloning? ICLR 2022 |
URL |
natural to ask: when can an offline RL method outperform BC with an equal amount of expert data, even when BC is a natural choice? |
Uni[MASK]: Unified Inference in Sequential Decision Problems |
URL |
show how sequential decision making tasks can be thought of in terms of corresponding input maskings, enabling the training of a single model to perform all tasks at once. applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. |
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2021, Jan, 3 |
Introductory reads on DeepLearning |
2021-W0 |
2 |
2020, Aug, 5 |
Interpretable Deep Learning |
2020-W8 |
3 |
2020, Jul, 5 |
Trustworthy Deep Learning |
2020-W7 |
4 |
2020, Jun, 5 |
A few applications of Deep Learning |
2020-W7 |
5 |
2020, May, 5 |
Optimization and New Loss in Deep Learning |
2020-W7 |
6 |
2020, Apr, 5 |
Meta Deep Learning |
2020-W4 |
7 |
2020, Mar, 5 |
Deep Reinforcement Learning |
2020-W3 |
8 |
2020, Feb, 5 |
Latent and Generative Deep Learning |
2020-W2 |
9 |
2020, Jan, 5 |
Learning Relation from Data with Deep Learning |
2020-W0 |
10 |
2020, Jan, 5 |
GNN and Transformer |
2020-W1 |
read on: - 03 Jan 2021
0Basics
tutorial
Type |
Papers |
Paper URL |
Our Slides |
Dr Qi |
Survey of 10 DeepLearning (DL) trends different from classic machine learning |
|
OurSlide |
Youtube |
Generative DL Basics |
Youtube1 + Youtube2 |
NA |
Youtube |
Computation Graph for DL (pytorch vs. tensorflow |
Youtube URL + Youtube2 |
NA |
Youtube |
Auto Differentiation for DL |
Youtube1+ Youtube2 |
NA |
Youtube |
RL basics and DL-RL basics |
Youtube1 + Youtube2 |
NA |
Youtube |
Probabilistic programming and in DL Pyro |
Youtube1 + Youtube2 |
NA |
Youtube |
Basics of Software Testing for DL |
Youtube URL |
NA |
Course |
Bill_CNN_Ng_Lecture_Notes |
|
Bill’s Notes |
Course |
Bill_caltechMLnotes_ALL |
|
Bill’s Notes |
classic Paper |
The Lottery Ticket Hypothesis |
|
Morris’ Notes |
classic Paper |
NLP From Scratch |
|
Morris’ Notes |
classic Paper |
Statistical Modeling The Two Cultures |
|
Morris’ Notes |
classic Paper |
Attention_is_All_You_Need |
|
Eli’ Notes |
classic Paper |
YOLO |
|
Eli’ Notes |
classic Paper |
Neural Turing Machine |
|
Jake Survey |
classic Paper |
BERT (Bidirectional Encoder Representation for Transformers): Pretraining of Deep Bidirectional Transformers for Language Understanding |
|
Rishab Survey |
read on: - 05 Aug 2020
3Reliable
Interpretable
black-box
casual
attention
shapley
concept
Index |
Papers |
Our Slides |
0 |
A survey on Interpreting Deep Learning Models |
Eli Survey |
|
Interpretable Machine Learning: Definitions,Methods, Applications |
Arsh Survey |
1 |
Explaining Explanations: Axiomatic Feature Interactions for Deep Networks |
Arsh Survey |
2 |
Shapley Value review |
Arsh Survey |
|
L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data |
Bill Survey |
|
Consistent Individualized Feature Attribution for Tree Ensembles |
bill Survey |
|
Summary for A value for n-person games |
Pan Survey |
|
L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data |
Rishab Survey |
3 |
Hierarchical Interpretations of Neural Network Predictions |
Arsh Survey |
|
Hierarchical Interpretations of Neural Network Predictions |
Rishab Survey |
4 |
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs |
Arsh Survey |
|
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs |
Rishab Survey |
5 |
Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models |
Rishab Survey |
|
|
Sanchit Survey |
|
Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection |
Sanchit Survey |
6 |
This Looks Like That: Deep Learning for Interpretable Image Recognition |
Pan Survey |
7 |
AllenNLP Interpret |
Rishab Survey |
8 |
DISCOVERY OF NATURAL LANGUAGE CONCEPTS IN INDIVIDUAL UNITS OF CNNs |
Rishab Survey |
9 |
How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations |
Rishab Survey |
10 |
Attention is not Explanation |
Sanchit Survey |
|
|
Pan Survey |
11 |
Axiomatic Attribution for Deep Networks |
Sanchit Survey |
12 |
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization |
Sanchit Survey |
13 |
Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifier |
Sanchit Survey |
14 |
“Why Should I Trust You?”Explaining the Predictions of Any Classifier |
Yu Survey |
15 |
INTERPRETATIONS ARE USEFUL: PENALIZING EXPLANATIONS TO ALIGN NEURAL NETWORKS WITH PRIOR KNOWLEDGE |
Pan Survey |
read on: - 05 Jul 2020
3Reliable
bias
data valuation
robustness
adversarial-examples
regularization
Index |
Papers |
Our Slides |
1 |
BIAS ALSO MATTERS: BIAS ATTRIBUTION FOR DEEP NEURAL NETWORK EXPLANATION |
Arsh Survey |
2 |
Data Shapley: Equitable Valuation of Data for Machine Learning |
Arsh Survey |
|
What is your data worth? Equitable Valuation of Data |
Sanchit Survey |
3 |
Neural Network Attributions: A Causal Perspective |
Zhe Survey |
4 |
Defending Against Neural Fake News |
Eli Survey |
5 |
Interpretation of Neural Networks is Fragile |
Eli Survey |
|
Interpretation of Neural Networks is Fragile |
Pan Survey |
6 |
Parsimonious Black-Box Adversarial Attacks Via Efficient Combinatorial Optimization |
Eli Survey |
7 |
Retrofitting Word Vectors to Semantic Lexicons |
Morris Survey |
8 |
On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models |
Morris Survey |
9 |
Towards Deep Learning Models Resistant to Adversarial Attacks |
Pan Survey |
10 |
Robust Attribution Regularization |
Pan Survey |
11 |
Sanity Checks for Saliency Maps |
Sanchit Survey |
12 |
Survey of data generation and evaluation in Interpreting DNN pipelines |
Sanchit Survey |
13 |
Think Architecture First: Benchmarking Deep Learning Interpretability in Time Series Predictions |
Sanchit Survey |
14 |
Universal Adversarial Triggers for Attacking and Analyzing NLP |
Sanchit Survey |
15 |
Apricot: Submodular selection for data summarization in Python |
Arsh Survey |
read on: - 05 Jun 2020
9DiscreteApp
Protein
Gene-network
Chromatin
language processing
Index |
Papers |
Our Slides |
1 |
Protein 3D Structure Computed from Evolutionary Sequence Variation |
Arsh Survey |
3 |
Regulatory network inference on developmental and evolutionary lineages |
Arsh Survey |
4 |
Deep learning in ultrasound image analysis |
Zhe Survey |
5 |
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind) |
Jack Survey |
6 |
Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors |
Jack Survey |
7 |
BindSpace decodes transcription factor binding signals by large-scale sequence embedding |
Jack Survey |
8 |
FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning |
Jack Survey |
9 |
Query-Reduction Networks for Question Answering |
Bill Survey |
read on: - 05 May 2020
4Optimization
optimization
mutual-information
Index |
Papers |
Our Slides |
1 |
Review on Semi-Supervised Learning |
Zhe Survey |
2 |
Review on Generative Adversarial Networks |
Zhe Survey |
3 |
Information theory in deep learning |
Zhe Survey |
4 |
Lagrange Optimization |
Zhe Survey |
5 |
Deep Learning and Information Theory, and Graph Neural Network |
Derrick Survey |
6 |
Loss Functions for Deep Structured Models |
Jack Survey |
7 |
Group Sparsity and Optimization |
Zhe Survey |
read on: - 05 Apr 2020
7MetaDomain
Multi-Task
transfer-learning
Generalization
Index |
Papers |
Our Slides |
1 |
Invariant Risk Minimization |
Zhe Survey |
2 |
Causal Machine Learning |
Zhe Survey |
3 |
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms |
Zhe Survey |
3 |
Review on Optimization-Based Meta Learning |
Zhe Survey |
4 |
Domain adaptation and counterfactual prediction |
Zhe Survey |
5 |
Gaussian Processes |
Zhe Survey |
6 |
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data |
Zhe Survey |
7 |
Few-shot domain adaptation by causal mechanism transfer |
Zhe Survey |
read on: - 05 Mar 2020
6Reinforcement
RL
Generalization
Index |
Papers |
Our Slides |
1 |
Actor-Critic Methods for Control |
Jake Survey |
2 |
Generalization in Deep Reinforcement Learning |
Jake Survey |
3 |
Sample Efficient RL (Part 1) |
Jake Survey |
4 |
Sample Efficient RL (Part 2) |
Jake Survey |
5 |
Model-Free Value Methods in Deep RL |
Jake Survey |
6 |
Investigating Human Priors for Playing Video Games |
Arsh Survey |
read on: - 05 Jan 2020
2GraphsNN
Graph
Relational
Casual
Markov
Index |
Papers |
Our Slides |
1 |
A Flexible Generative Framework for Graph-based Semi-supervised Learning |
Arsh Survey |
2 |
Learning Discrete Structures for Graph Neural Networks |
Arsh Survey |
4 |
Graph Markov Neural Nets |
Arsh Survey |
|
Graph Markov Neural Networks |
Jack Survey |
5 |
GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations |
Arsh Survey |
6 |
Subgraph Neural Networks |
Arsh Survey |
7 |
Pointer Graph Networks |
Arsh Survey |
8 |
Modeling Relational Data with Graph Convolutional Networks |
Arsh Survey |
9 |
Graph Learning |
Zhe Survey |
8 |
Neural Relational Inference |
Zhe Survey |
read on: - 05 Jan 2020
8Scalable
2GraphsNN
GCN
graph attention
Index |
Papers |
Our Slides |
1 |
Graph Convolutions: More than You Wanted to Know |
Derrick Survey |
2 |
Spectral Graph Sparsification |
Derrick Survey |
3 |
Complexity Analysis of Graph Convolutional Networks and in Attention based GNN |
Derrick Survey |
4 |
PyTorch-BigGraph: A Large-Scale Graph Embedding System |
Derrick Survey |
5 |
Scalable GNN Updates: More About PyTorch Geometric (PyG) |
Derrick Survey |
6 |
Time and Space Complexity of Graph Convolutional Networks |
Derrick Survey |
7 |
Large Scale GNN and Transformer Models and for Genomics |
Jack Survey |
8 |
Long Range Attention and Visualizing BERT |
Jak Survey |
9 |
Benchmarking Graph Neural Networks |
Sanchit Survey |
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2019, Apr, 5 |
GNN to Understand |
2019-W12 |
2 |
2019, Mar, 29 |
GNN for NLP QA |
2019-W11 |
3 |
2019, Mar, 25 |
Edge and Dynamic computing |
2019-W10 |
4 |
2019, Mar, 22 |
GNN and scalable |
2019-W9 |
5 |
2019, Mar, 15 |
GNN for Graph Generation |
2019-W8 |
6 |
2019, Mar, 6 |
GNN Robustness |
2019-W7 |
7 |
2019, Feb, 22 |
Geometric Deep Learning |
2019-W5 |
8 |
2019, Feb, 17 |
GNN for Program Analysis |
2019-W4 |
9 |
2019, Feb, 15 |
GNN for BioMed Applications |
2019-W3 |
10 |
2019, Feb, 1 |
GNN Basics II - Deep Learning Advances on Graphs |
2019-W2 |
11 |
2019, Jan, 25 |
GNN Basics I - Deep Learning Advances on Graphs |
2019-W1 |
read on: - 05 Apr 2019
2GraphsNN
3Reliable
Interpretable
black-box
casual
seq2seq
noise
knowledge-graph
attention
Presenter |
Papers |
Paper URL |
Our Slides |
Understand |
Faithful and Customizable Explanations of Black Box Models |
Pdf |
Derrick PDF |
Understand |
A causal framework for explaining the predictions of black-box sequence-to-sequence models, EMNLP17 |
Pdf |
GaoJi PDF + Bill Pdf |
Understand |
How Powerful are Graph Neural Networks? / Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning |
Pdf + Pdf |
GaoJi PDF |
Understand |
Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs + GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks |
Pdf + PDF |
GaoJi PDF |
Understand |
Attention is not Explanation, 2019 |
PDF |
|
Understand |
Understanding attention in graph neural networks, 2019 |
PDF |
|
read on: - 29 Mar 2019
2GraphsNN
9DiscreteApp
5Generative
generative
QA
NLP
knowledge-graph
GAN
graph
stylometric
Presenter |
Papers |
Paper URL |
Our Slides |
QA |
A Comparison of Current Graph Database Models |
Pdf + PDF2 |
Bill PDF |
QA |
Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text |
Pdf |
Bill [PDF + GaoJi Pdf |
QA |
Generative Question Answering: Learning to Answer the Whole Question, Mike Lewis, Angela Fan |
Pdf |
Bill PDF + GaoJi Pdf |
QA |
Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings / Knowledge Graph Embedding via Dynamic Mapping Matrix |
PDF + Pdf |
Bill PDF + GaoJi Pdf |
Text |
Adversarial Text Generation via Feature-Mover’s Distance |
URL |
Faizan PDF |
Text |
Content preserving text generation with attribute controls |
URL |
Faizan PDF |
Text |
Multiple-Attribute Text Rewriting, ICLR, 2019, |
URL |
Faizan PDF |
Text |
Writeprints: a stylometric approach to identity level identification and similarity detection in cyberSpace |
URL |
Faizan PDF |
read on: - 25 Mar 2019
2GraphsNN
8Scalable
mobile
binary
dynamic
Presenter |
Papers |
Paper URL |
Our Slides |
Edge |
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications |
PDF |
|
Edge |
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks |
URL |
Ryan PDF |
Edge |
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices |
Pdf |
Eamon PDF |
Edge |
Loss-aware Binarization of Deep Networks, ICLR17 |
PDF |
Ryan PDF |
Edge |
Espresso: Efficient Forward Propagation for Binary Deep Neural Networks |
Pdf |
Eamon PDF |
Dynamic |
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution |
PDF |
Weilin PDF |
Dynamic |
Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems |
PDF |
|
Dynamic |
Cavs: An Efficient Runtime System for Dynamic Neural Networks |
Pdf |
|
read on: - 22 Mar 2019
2GraphsNN
8Scalable
graph
discrete
NLP
embedding
Hierarchical
Parallel
Distributed
dynamic
Presenter |
Papers |
Paper URL |
Our Slides |
Scalable |
FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling |
Pdf |
Ryan PDF + Arshdeep Pdf |
Scalable |
MILE: A Multi-Level Framework for Scalable Graph Embedding |
Pdf |
Ryan PDF |
Scalable |
LanczosNet: Multi-Scale Deep Graph Convolutional Networks |
Pdf |
Ryan PDF |
Scalable |
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis |
Pdf |
Derrick PDF |
Scalable |
Towards Federated learning at Scale: System Design |
URL |
Derrick PDF |
Scalable |
DNN Dataflow Choice Is Overrated |
PDF |
Derrick PDF |
Scalable |
Towards Efficient Large-Scale Graph Neural Network Computing |
Pdf |
Derrick PDF |
Scalable |
PyTorch Geometric |
URL |
|
Scalable |
PyTorch BigGraph |
URL |
|
Scalable |
Simplifying Graph Convolutional Networks |
Pdf |
|
Scalable |
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks |
Pdf |
|
read on: - 15 Mar 2019
2GraphsNN
5Generative
generative
GAN
graph
NLP
graphical-model
discrete
RNN
robustness
molecule
Variational
Autoencoder
RL
Beam
imputation
Matrix-Completion
random
Presenter |
Papers |
Paper URL |
Our Slides |
Generate |
Maximum-Likelihood Augmented Discrete Generative Adversarial Networks |
PDF |
Tkach PDF + GaoJi Pdf |
Generate |
Graphical Generative Adversarial Networks |
PDF |
Arshdeep PDF |
Generate |
GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, ICML2018 |
PDF |
Arshdeep PDF |
Generate |
Inference in probabilistic graphical models by Graph Neural Networks |
PDF |
Arshdeep PDF |
Generate |
Encoding robust representation for graph generation |
Pdf |
Arshdeep PDF |
Generate |
Junction Tree Variational Autoencoder for Molecular Graph Generation |
Pdf |
Tkach PDF + Arshdeep Pdf |
Generate |
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation NeurIPS18 |
|
Tkach PDF |
Generate |
Towards Variational Generation of Small Graphs |
Pdf |
Tkach PDF + Arshdeep Pdf |
Generate |
Convolutional Imputation of Matrix Networks |
Pdf |
Tkach PDF |
Generate |
Graph Convolutional Matrix Completion |
Pdf |
Tkach PDF |
Generate |
NetGAN: Generating Graphs via Random Walks ICML18 |
[ULR] |
Tkach PDF |
Beam |
Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement |
URL |
Tkach PDF |
read on: - 06 Mar 2019
2GraphsNN
3Reliable
graph
structured
Adversarial-Examples
binary
Presenter |
Papers |
Paper URL |
Our Slides |
Robust |
Adversarial Attacks on Graph Structured Data |
Pdf |
Faizan [PDF + GaoJi Pdf |
Robust |
KDD’18 Adversarial Attacks on Neural Networks for Graph Data |
Pdf |
Faizan PDF + GaoJi Pdf |
Robust |
Attacking Binarized Neural Networks |
Pdf |
Faizan PDF |
read on: - 22 Feb 2019
2GraphsNN
2Architecture
geometric
graph
matching
dynamic
manifold
invariant
Presenter |
Papers |
Paper URL |
Our Slides |
spherical |
Spherical CNNs |
Pdf |
Fuwen PDF + Arshdeep Pdf |
dynamic |
Dynamic graph cnn for learning on point clouds, 2018 |
Pdf |
Fuwen PDF |
basics |
Geometric Deep Learning (simple introduction video) |
URL |
|
matching |
All Graphs Lead to Rome: Learning Geometric and Cycle-Consistent Representations with Graph Convolutional Networks |
Pdf |
Fuwen PDF |
completion |
Geometric matrix completion with recurrent multi-graph neural networks |
Pdf |
Fuwen PDF |
Tutorial |
Geometric Deep Learning on Graphs and Manifolds |
URL |
Arsh PDF |
matching |
Similarity Learning with Higher-Order Proximity for Brain Network Analysis |
|
Arsh PDF |
pairwise |
Pixel to Graph with Associative Embedding |
PDF |
Fuwen PDF |
3D |
3D steerable cnns: Learning rotationally equivariant features in volumetric data |
URL |
Fuwen PDF |
read on: - 17 Feb 2019
2GraphsNN
9DiscreteApp
embedding
program
heterogeneous
Presenter |
Papers |
Paper URL |
Our Slides |
Program |
Neural network-based graph embedding for cross-platform binary code similarity detection |
Pdf + Pdf |
Faizan PDF + GaoJi Pdf |
Program |
Deep Program Reidentification: A Graph Neural Network Solution |
Pdf |
Weilin PDF |
Program |
Heterogeneous Graph Neural Networks for Malicious Account Detection |
Pdf |
Weilin Pdf |
Program |
Learning to represent programs with graphs |
Pdf |
|
read on: - 15 Feb 2019
2GraphsNN
9DiscreteApp
attention
relational
visualizing
geometric
DNA
protein
molecule
Presenter |
Papers |
Paper URL |
Our Slides |
Bio |
KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks, 2018 |
Pdf |
Eli Pdf |
Bio |
Molecular geometry prediction using a deep generative graph neural network |
Pdf |
Eli Pdf |
Bio |
Visualizing convolutional neural network protein-ligand scoring |
PDF() |
Eli PDF |
Bio |
Deep generative models of genetic variation capture mutation effects |
PDF() |
Eli PDF |
Bio |
Attentive cross-modal paratope prediction |
Pdf |
Eli PDF |
read on: - 01 Feb 2019
2GraphsNN
Semi-supervised
relational
matching
graph
Presenter |
Papers |
Paper URL |
Our Slides |
Matching |
Deep Learning of Graph Matching, |
PDF+ PDF |
Jack Pdf |
Matching |
Graph Edit Distance Computation via Graph Neural Networks |
PDF |
Jack Pdf |
Basics |
Link Prediction Based on Graph Neural Networks |
Pdf |
Jack Pdf |
Basics |
Supervised Community Detection with Line Graph Neural Networks |
Pdf |
Jack Pdf |
Basics |
Graph mining: Laws, generators, and algorithms |
Pdf |
Arshdeep PDF |
pooling |
Hierarchical graph representation learning with differentiable pooling |
PDF |
Eamon PDF |
read on: - 25 Jan 2019
2GraphsNN
0Basics
8Scalable
invariant
scalable
embedding
Presenter |
Papers |
Paper URL |
Our Notes |
Basics |
GraphSAGE: Large-scale Graph Representation Learning by Jure Leskovec Stanford University |
URL + PDF |
|
Basics |
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering by Xavier Bresson |
URL + PDF |
Ryan Pdf |
Basics |
Gated Graph Sequence Neural Networks by Microsoft Research |
URL + PDF |
Faizan Pdf |
Basics |
DeepWalk - Turning Graphs into Features via Network Embeddings |
URL + PDF |
|
Basics |
Spectral Networks and Locally Connected Networks on Graphs |
Pdf |
GaoJi slides + Bill Pdf |
Basics |
A Comprehensive Survey on Graph Neural Networks/ Graph Neural Networks: A Review of Methods and Applications |
Pdf |
Jack Pdf |
GCN |
Semi-Supervised Classification with Graph Convolutional Networks |
Pdf |
Jack Pdf |
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2019, Dec, 12 |
deep2reproduce 2019 Fall - 1Analysis papers |
2019-fall Students deep2reproduce |
2 |
2019, Dec, 11 |
deep2reproduce 2019 Fall - 2Architecture papers |
2019-fall Students deep2reproduce |
3 |
2019, Dec, 10 |
deep2reproduce 2019 Fall - 3Reliable papers |
2019-fall Students deep2reproduce |
4 |
2019, Dec, 9 |
deep2reproduce 2019 Fall - 5Generative papers |
2019-fall Students deep2reproduce |
5 |
2019, Dec, 8 |
deep2reproduce 2019 Fall - 6Reinforcement papers |
2019-fall Students deep2reproduce |
6 |
2019, Dec, 7 |
deep2reproduce 2019 Fall - 7MetaDomain papers |
2019-fall Students deep2reproduce |
7 |
2019, Dec, 6 |
deep2reproduce 2019 Fall - 8Scalable papers |
2019-fall Students deep2reproduce |
8 |
2019, Nov, 3 |
A general survey |
2019-fall Course |
read on: - 03 Nov 2019
0Basics
tutorial
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2018, Dec, 29 |
Generate18- Deep Generative Models for discrete |
2018-team |
2 |
2018, Dec, 21 |
Generate18- Deep Generative Models for Graphs |
2018-team |
3 |
2018, Dec, 20 |
Application18- DNN for QA and MedQA |
2018-team |
4 |
2018, Dec, 2 |
Reliable18- Adversarial Attacks and DNN |
2018-team |
5 |
2018, Nov, 20 |
Reliable18- Adversarial Attacks and DNN |
2018-team |
6 |
2018, Oct, 25 |
Structure18- DNNs Varying Structures |
2018-team |
7 |
2018, Oct, 16 |
Application18- Graph DNN in a Few Bio Tasks |
2018-team |
8 |
2018, Oct, 13 |
Application18- DNNs in a Few Bio CRISPR Tasks |
2018-team |
9 |
2018, Oct, 12 |
Reliable18- Understand DNNs |
2018-team |
10 |
2018, Oct, 11 |
Structures18- DNN for Relations |
2018-team |
11 |
2018, Aug, 29 |
Survey18- My Tutorial Talk at ACM BCB18 - Interpretable Deep Learning for Genomics |
2018-me |
12 |
2018, Aug, 27 |
Application18- A few DNN for Question Answering |
2018-team |
13 |
2018, Aug, 23 |
Generative18 -A few more deep discrete Generative Models |
2018-team |
14 |
2018, Aug, 13 |
Application18- DNNs in a Few BioMedical Tasks |
2018-team |
15 |
2018, Aug, 3 |
Reliable18- Testing and Verifying DNNs |
2018-team |
16 |
2018, May, 20 |
Reliable18- Adversarial Attacks and DNN and More |
2018-team |
17 |
2018, May, 12 |
Reliable18- Adversarial Attacks and DNN |
2018-team |
18 |
2018, May, 11 |
Structures18- DNN for Multiple Label Classification |
2018-team |
19 |
2018, May, 3 |
Structures18- More Attentions |
2018-team |
20 |
2018, Apr, 20 |
Generative18 -Generative Adversarial Network (classified) |
2018-team |
21 |
2018, Feb, 20 |
Survey18- My Survey Talk at UVA HMI Seminar - A quick and rough overview of DNN |
2018-me |
22 |
2018, Jan, 10 |
Application18- Property of DeepNN Models and Discrete tasks |
2018-team |
read on: - 29 Dec 2018
5Generative
2GraphsNN
generative
GAN
discrete
Autoencoder
Variational
Presenter |
Papers |
Paper URL |
Our Slides |
Tkach |
Boundary-Seeking Generative Adversarial Networks |
PDF |
PDF |
Tkach |
Maximum-Likelihood Augmented Discrete Generative Adversarial Networks |
PDF |
PDF |
Tkach |
Generating Sentences from a Continuous Space |
PDF |
PDF |
read on: - 21 Dec 2018
5Generative
2GraphsNN
generative
GAN
discrete
Autoencoder
Variational
molecule
graph
DNA
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
Constrained Graph Variational Autoencoders for Molecule Design |
PDF |
PDF |
Arshdeep |
Learning Deep Generative Models of Graphs |
PDF |
PDF |
Arshdeep |
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation |
PDF |
PDF |
Jack |
Generating and designing DNA with deep generative models |
PDF |
PDF |
read on: - 20 Dec 2018
2Architecture
9DiscreteApp
seq2seq
recommendation
QA
graph
relational
EHR
Presenter |
Papers |
Paper URL |
Our Slides |
Bill |
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning |
PDF |
PDF |
Chao |
Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (I) |
PDF |
PDF |
Chao |
Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (II) |
PDF |
PDF |
Derrick |
Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis (III) |
PDF |
PDF |
Chao |
Reading Wikipedia to Answer Open-Domain Questions |
PDF |
PDF |
Jennifer |
Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text |
PDF |
PDF |
read on: - 02 Dec 2018
3Reliable
Adversarial-Examples
visualizing
Interpretable
EHR
NLP
Presenter |
Papers |
Paper URL |
Our Slides |
Jennifer |
Adversarial Attacks Against Medical Deep Learning Systems |
PDF |
PDF |
Jennifer |
Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning |
PDF |
PDF |
Jennifer |
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers |
PDF |
PDF |
Jennifer |
CleverHans |
PDF |
PDF |
Ji |
Ji-f18-New papers about adversarial attack |
|
PDF |
read on: - 20 Nov 2018
3Reliable
Adversarial-Examples
software-testing
Interpretable
distillation
Presenter |
Papers |
Paper URL |
Our Slides |
Bill |
Adversarial Examples that Fool both Computer Vision and Time-Limited Humans |
PDF |
PDF |
Bill |
Adversarial Attacks Against Medical Deep Learning Systems |
PDF |
PDF |
Bill |
TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing |
PDF |
PDF |
Bill |
Distilling the Knowledge in a Neural Network |
PDF |
PDF |
Bill |
Defensive Distillation is Not Robust to Adversarial Examples |
PDF |
PDF |
Bill |
Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow |
PDF |
PDF |
read on: - 25 Oct 2018
2Architecture
8Scalable
7MetaDomain
Architecture-Search
Hyperparameter
dynamic
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
Learning Transferable Architectures for Scalable Image Recognition |
PDF |
PDF |
Arshdeep |
FractalNet: Ultra-Deep Neural Networks without Residuals |
PDF |
PDF |
read on: - 16 Oct 2018
2GraphsNN
9DiscreteApp
graph
protein
molecule
Presenter |
Papers |
Paper URL |
Our Slides |
Eric |
Modeling polypharmacy side effects with graph convolutional networks |
PDF |
PDF |
Eric |
Protein Interface Prediction using Graph Convolutional Networks |
PDF |
PDF |
Eric |
Structure biology meets data science: does anything change |
URL |
PDF |
Eric |
DeepSite: protein-binding site predictor using 3D-convolutional neural networks |
URL |
PDF |
read on: - 13 Oct 2018
9DiscreteApp
brain
CRISPR
DNA
Genomics
generative
protein
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
deepCRISPR: optimized CRISPR guide RNA design by deep learning , Genome Biology 2018 |
PDF |
PDF |
Arshdeep |
The CRISPR tool kit for genome editing and beyond, Mazhar Adli |
PDF |
PDF |
Eric |
Intro of Genetic Engineering |
PDF |
PDF |
Eric |
Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs |
PDF |
PDF |
Brandon |
Generative Modeling for Protein Structure |
URL |
PDF |
read on: - 12 Oct 2018
3Reliable
visualizing
interpretable
Attribution
GAN
understanding
Presenter |
Papers |
Paper URL |
Our Slides |
Jack |
A Unified Approach to Interpreting Model Predictions |
PDF |
PDF |
Jack |
“Why Should I Trust You?”: Explaining the Predictions of Any Classifier |
PDF |
PDF |
Jack |
Visual Feature Attribution using Wasserstein GANs |
PDF |
PDF |
Jack |
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks |
PDF |
PDF |
GaoJi |
Recent Interpretable machine learning papers |
PDF |
PDF |
Jennifer |
The Building Blocks of Interpretability |
PDF |
PDF |
read on: - 11 Oct 2018
2Architecture
2GraphsNN
relational
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
Relational inductive biases, deep learning, and graph networks |
PDF |
PDF |
Arshdeep |
Discriminative Embeddings of Latent Variable Models for Structured Data |
PDF |
PDF |
Jack |
Deep Graph Infomax |
PDF |
PDF |
read on: - 29 Aug 2018
9DiscreteApp
tutorial
Interpretable
Presenter |
Papers |
Paper URL |
Our Slides |
Dr. Qi |
Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation |
|
PDF |
Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin, NIPS2017 / Ritambhara Singh, Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi
The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what are the relevant factors and how they work together? Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach; we call AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long short-term memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency map.
Code and data are shared atwww.deepchrome.org
read on: - 27 Aug 2018
2Architecture
8Scalable
9DiscreteApp
trees
metric-learning
embedding
QA
Presenter |
Papers |
Paper URL |
Our Slides |
Derrick |
GloVe: Global Vectors for Word Representation |
PDF |
PDF |
Derrick |
PARL.AI: A unified platform for sharing, training and evaluating dialog models across many tasks. |
URL |
PDF |
Derrick |
scalable nearest neighbor algorithms for high dimensional data (PAMI14) |
PDF |
PDF |
Derrick |
StarSpace: Embed All The Things! |
PDF |
PDF |
Derrick |
Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading, Martin Raison, Pierre-Emmanuel Mazaré, Rajarshi Das, Antoine Bordes |
PDF |
PDF |
read on: - 23 Aug 2018
5Generative
9DiscreteApp
generative
generalization
GAN
discrete
Amortized
Autoencoder
Variational
program
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Chris J. Maddison, Andriy Mnih, Yee Whye Teh |
PDF |
PDF |
GaoJi |
Summary Of Several Autoencoder models |
PDF |
PDF |
GaoJi |
Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models, Jesse Engel, Matthew Hoffman, Adam Roberts |
PDF |
PDF |
GaoJi |
Summary of A Few Recent Papers about Discrete Generative models, SeqGAN, MaskGAN, BEGAN, BoundaryGAN |
PDF |
PDF |
Arshdeep |
Semi-Amortized Variational Autoencoders, Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush |
PDF |
PDF |
Arshdeep |
Synthesizing Programs for Images using Reinforced Adversarial Learning, Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S.M. Ali Eslami, Oriol Vinyals |
PDF |
PDF |
read on: - 13 Aug 2018
3Reliable
6Reinforcement
9DiscreteApp
brain
RNA
DNA
Genomics
generative
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. |
PDF |
PDF |
Arshdeep |
Solving the RNA design problem with reinforcement learning, PLOSCB |
PDF |
PDF |
Arshdeep |
Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk |
PDF |
PDF |
Arshdeep |
Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio |
PDF |
PDF |
Brandon |
Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics |
PDF |
PDF |
Arshdeep |
Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions |
PDF |
PDF |
read on: - 03 Aug 2018
3Reliable
6Reinforcement
RL
Fuzzing
Adversarial-Examples
verification
software-testing
black-box
white-box
Presenter |
Papers |
Paper URL |
Our Slides |
GaoJi |
Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh |
PDF |
PDF |
GaoJi |
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer |
PDF |
PDF |
GaoJi |
DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray |
PDF |
PDF |
GaoJi |
A few Recent (2018) papers on Black-box Adversarial Attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors |
PDF |
PDF |
GaoJi |
A few Recent papers of Adversarial Attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel) |
PDF |
PDF |
Testing |
DeepXplore: Automated Whitebox Testing of Deep Learning Systems |
PDF |
|
read on: - 20 May 2018
3Reliable
9DiscreteApp
seq2seq
Adversarial-Examples
Certified-Defense
Presenter |
Papers |
Paper URL |
Our Slides |
Bill |
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples |
PDF |
PDF |
Bill |
Adversarial Examples for Evaluating Reading Comprehension Systems, Robin Jia, Percy Liang |
PDF |
PDF |
Bill |
Certified Defenses against Adversarial Examples, Aditi Raghunathan, Jacob Steinhardt, Percy Liang |
PDF |
PDF |
Bill |
Provably Minimally-Distorted Adversarial Examples, Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill |
PDF |
PDF |
read on: - 12 May 2018
3Reliable
9DiscreteApp
Adversarial-Examples
generative
Interpretable
Presenter |
Papers |
Paper URL |
Our Slides |
Bill |
Intriguing Properties of Adversarial Examples, Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le |
PDF |
PDF |
Bill |
Adversarial Spheres |
PDF |
PDF |
Bill |
Adversarial Transformation Networks: Learning to Generate Adversarial Examples, Shumeet Baluja, Ian Fischer |
PDF |
PDF |
Bill |
Thermometer encoding: one hot way to resist adversarial examples |
PDF |
PDF |
|
Adversarial Logit Pairing , Harini Kannan, Alexey Kurakin, Ian Goodfellow |
PDF |
|
read on: - 11 May 2018
2Architecture
2GraphsNN
multi-label
structured
Adversarial-loss
attention
RNN
Presenter |
Papers |
Paper URL |
Our Slides |
Chao |
Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification |
PDF |
PDF |
Jack |
FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning |
PDF |
PDF |
BasicMLC |
Multi-Label Classification: An Overview |
PDF |
|
SPEN |
Structured Prediction Energy Networks |
PDF |
|
InfNet |
Learning Approximate Inference Networks for Structured Prediction |
PDF |
|
SPENMLC |
Deep Value Networks |
PDF |
|
Adversarial |
Semantic Segmentation using Adversarial Networks |
PDF |
|
EmbedMLC |
StarSpace: Embed All The Things! |
PDF |
|
deepMLC |
CNN-RNN: A Unified Framework for Multi-label Image Classification/ CVPR 2016 |
PDF |
|
deepMLC |
Order-Free RNN with Visual Attention for Multi-Label Classification / AAAI 2018 |
PDF |
|
read on: - 03 May 2018
2Architecture
attention
relational
Variational
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention |
PDF |
PDF |
Arshdeep |
Latent Alignment and Variational Attention |
PDF |
PDF |
Arshdeep |
Modularity Matters: Learning Invariant Relational Reasoning Tasks, Jason Jo, Vikas Verma, Yoshua Bengio |
PDF |
PDF |
read on: - 20 Apr 2018
5Generative
DNA
generative
GAN
generalization
Presenter |
Papers |
Paper URL |
Our Slides |
BrandonLiu |
Summary of Recent Generative Adversarial Networks (Classified) |
|
PDF |
Jack |
Generating and designing DNA with deep generative models, Nathan Killoran, Leo J. Lee, Andrew Delong, David Duvenaud, Brendan J. Frey |
PDF |
PDF |
GaoJi |
More about basics of GAN |
|
PDF |
|
McGan: Mean and Covariance Feature Matching GAN, PMLR 70:2527-2535 |
PDF |
|
|
Wasserstein GAN, ICML17 |
PDF |
|
|
Geometrical Insights for Implicit Generative Modeling, L Bottou, M Arjovsky, D Lopez-Paz, M Oquab |
PDF |
|
read on: - 20 Feb 2018
0Basics
Presenter |
Papers |
Paper URL |
Our Slides |
Dr. Qi |
A quick and rough survey of Deep-Neural-Networks |
|
PDF |
read on: - 10 Jan 2018
3Reliable
embedding
generative
NLP
generalization
NLP
Presenter |
Papers |
Paper URL |
Our Slides |
Bill |
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation |
PDF |
PDF |
Bill |
Measuring the tendency of CNNs to Learn Surface Statistical Regularities Jason Jo, Yoshua Bengio |
PDF |
PDF |
Bill |
Generating Sentences by Editing Prototypes, Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang |
PDF |
PDF |
Bill |
On the importance of single directions for generalization, Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick |
PDF |
PDF |
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2017, Jul, 22 |
Reliable17-Testing and Machine Learning Basics |
2017-team |
2 |
2017, Jun, 22 |
Structures17 - Adaptive Deep Networks II |
2017-team |
3 |
2017, Jun, 2 |
Structures17 -Adaptive Deep Networks I |
2017-team |
4 |
2017, May, 22 |
Generative17- Generative Deep Networks |
2017-team |
5 |
2017, Apr, 22 |
Optimization17- Optimization in DNN |
2017-team |
6 |
2017, Feb, 22 |
Reliable17-Secure Machine Learning |
2017-team |
7 |
2017, Jan, 21 |
Basic BioMed Application Reads we finished before 2017 |
2017-team |
8 |
2017, Jan, 20 |
Basic16- DNN to be Scalable |
2017-team |
9 |
2017, Jan, 19 |
Basic16- Basic Deep NN and Robustness |
2017-team |
10 |
2017, Jan, 18 |
Basic16- Basic Deep NN with Memory |
2017-team |
11 |
2017, Jan, 12 |
Basic16- Basic DNN Embedding we read for Ranking/QA |
2017-team |
12 |
2017, Jan, 12 |
Basic16- Basic DNN Reads we finished for NLP/Text |
2017-team |
read on: - 22 Jul 2017
3Reliable
software-testing
white-box
black-box
robustness
Metamorphic
Influence Functions
Presenter |
Papers |
Paper URL |
Our Slides |
GaoJi |
A few useful things to know about machine learning |
PDF |
PDF |
GaoJi |
A few papers related to testing learning, e.g., Understanding Black-box Predictions via Influence Functions |
PDF |
PDF |
GaoJi |
Automated White-box Testing of Deep Learning Systems |
PDF |
PDF |
GaoJi |
Testing and Validating Machine Learning Classifiers by Metamorphic Testing |
PDF |
PDF |
GaoJi |
Software testing: a research travelogue (2000–2014) |
PDF |
PDF |
read on: - 22 Jun 2017
2Architecture
8Scalable
low-rank
binary
dynamic
learn2learn
optimization
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction |
PDF |
PDF |
Arshdeep |
Decoupled Neural Interfaces Using Synthetic Gradients |
PDF |
PDF |
Arshdeep |
Diet Networks: Thin Parameters for Fat Genomics |
PDF |
PDF |
Arshdeep |
Metric Learning with Adaptive Density Discrimination |
PDF |
PDF |
read on: - 02 Jun 2017
2Architecture
8Scalable
low-rank
binary
dynamic
learn2learn
optimization
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
HyperNetworks, David Ha, Andrew Dai, Quoc V. Le ICLR 2017 |
PDF |
PDF |
Arshdeep |
Learning feed-forward one-shot learners |
PDF |
PDF |
Arshdeep |
Learning to Learn by gradient descent by gradient descent |
PDF |
PDF |
Arshdeep |
Dynamic Filter Networks https://arxiv.org/abs/1605.09673 |
PDF |
PDF |
read on: - 22 May 2017
5Generative
generative
GAN
Presenter |
Papers |
Paper URL |
Our Slides |
Tobin |
Energy-Based Generative Adversarial Network |
PDF |
PDF |
Jack |
Three Deep Generative Models |
PDF |
PDF |
read on: - 22 Apr 2017
4Optimization
optimization
scalable
EM
propagation
mimic
Presenter |
Papers |
Paper URL |
Our Slides |
Muthu |
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal |
PDF |
PDF |
Muthu |
Fast Training of Recurrent Networks Based on EM Algorithm (1998) |
PDF |
PDF |
Muthu |
FitNets: Hints for Thin Deep Nets, ICLR15 |
PDF |
PDF |
Muthu |
Two NIPS 2015 Deep Learning Optimization Papers |
PDF |
PDF |
Muthu |
Difference Target Propagation (2015) |
PDF |
PDF |
read on: - 22 Feb 2017
3Reliable
secure
Privacy
Cryptography
Presenter |
Papers |
Paper URL |
Our Slides |
Tobin |
Summary of A few Papers on: Machine Learning and Cryptography, (e.g., learning to Protect Communications with Adversarial Neural Cryptography) |
PDF |
PDF |
Tobin |
Privacy Aware Learning (NIPS12) |
PDF |
PDF |
Tobin |
Can Machine Learning be Secure?(2006) |
PDF |
PDF |
read on: - 21 Jan 2017
9DiscreteApp
DNA
RNA
protein
invariant
binary
random
relational
Presenter |
Papers |
Paper URL |
Our Slides |
DeepBind |
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning |
PDF |
|
DeepSEA |
Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk |
PDF |
|
DeepSEA |
Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014 |
|
|
BioBasics |
A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics13 |
|
|
BioBasics |
Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011 |
|
|
BioBasics |
Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004 |
|
|
BioBasics |
NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels |
|
|
MedSignal |
Segmenting Time Series: A Survey and Novel Approach, |
PDF |
|
read on: - 20 Jan 2017
0Basics
2Architecture
8Scalable
scalable
random
sparsity
binary
hash
compression
low-rank
distributed
dimension reduction
pruning
sketch
Parallel
Presenter |
Papers |
Paper URL |
Our Slides |
scalable |
Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning 2010 [^1] |
PDF |
|
data scalable |
Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus 2012 [^2] |
PDF 2014 + PDF |
|
Binary |
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 |
|
|
Model |
Binary embeddings with structured hashed projections |
PDF |
PDF |
Model |
Deep Compression: Compressing Deep Neural Networks (ICLR 2016) |
PDF |
PDF |
read on: - 19 Jan 2017
0Basics
3Reliable
Adversarial-Examples
robustness
visualizing
Interpretable
Certified-Defense
Presenter |
Papers |
Paper URL |
Our Slides |
AE |
Intriguing properties of neural networks / |
PDF |
|
AE |
Explaining and Harnessing Adversarial Examples |
PDF |
|
AE |
Towards Deep Learning Models Resistant to Adversarial Attacks |
PDF |
|
AE |
DeepFool: a simple and accurate method to fool deep neural networks |
PDF |
|
AE |
Towards Evaluating the Robustness of Neural Networks by Carlini and Wagner |
PDF |
PDF |
Data |
Basic Survey of ImageNet - LSVRC competition |
URL |
PDF |
Understand |
Understanding Black-box Predictions via Influence Functions |
PDF |
|
Understand |
Deep inside convolutional networks: Visualising image classification models and saliency maps |
PDF |
|
Understand |
BeenKim, Interpretable Machine Learning, ICML17 Tutorial [^1] |
PDF |
|
provable |
Provable defenses against adversarial examples via the convex outer adversarial polytope, Eric Wong, J. Zico Kolter, |
URL |
|
read on: - 18 Jan 2017
0Basics
2Architecture
7MetaDomain
memory
NTM
seq2seq
pointer
set
attention
meta-learning
Few-Shot
matching net
metric-learning
Presenter |
Papers |
Paper URL |
Our Slides |
seq2seq |
Sequence to Sequence Learning with Neural Networks |
PDF |
|
Set |
Pointer Networks |
PDF |
|
Set |
Order Matters: Sequence to Sequence for Sets |
PDF |
|
Point Attention |
Multiple Object Recognition with Visual Attention |
PDF |
|
Memory |
End-To-End Memory Networks |
PDF |
Jack Survey |
Memory |
Neural Turing Machines |
PDF |
|
Memory |
Hybrid computing using a neural network with dynamic external memory |
PDF |
|
Muthu |
Matching Networks for One Shot Learning (NIPS16) |
PDF |
PDF |
Jack |
Meta-Learning with Memory-Augmented Neural Networks (ICML16) |
PDF |
PDF |
Metric |
ICML07 Best Paper - Information-Theoretic Metric Learning |
PDF |
|
read on: - 12 Jan 2017
0Basics
2Architecture
9DiscreteApp
embedding
recommendation
QA
graph
relational
Presenter |
Papers |
Paper URL |
Our Slides |
QA |
Learning to rank with (a lot of) word features |
PDF |
|
Relation |
A semantic matching energy function for learning with multi-relational data |
PDF |
|
Relation |
Translating embeddings for modeling multi-relational data |
PDF |
|
QA |
Reading wikipedia to answer open-domain questions |
PDF |
|
QA |
Question answering with subgraph embeddings |
PDF |
|
read on: - 12 Jan 2017
0Basics
2Architecture
9DiscreteApp
embedding
text
BERT
seq2seq
attention
NLP
curriculum
BackProp
relational
Presenter |
Papers |
Paper URL |
Our Slides |
NLP |
A Neural Probabilistic Language Model |
PDF |
|
Text |
Bag of Tricks for Efficient Text Classification |
PDF |
|
Text |
Character-level Convolutional Networks for Text Classification |
PDF |
|
NLP |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
PDF |
|
seq2seq |
Neural Machine Translation by Jointly Learning to Align and Translate |
PDF |
|
NLP |
Natural Language Processing (almost) from Scratch |
PDF |
|
Train |
Curriculum learning |
PDF |
|
Muthu |
NeuroIPS Embedding Papers survey 2012 to 2015 |
NIPS |
PDF |
Basics |
Efficient BackProp |
PDF |
|
---
No. |
Date |
Title and Information |
PaperYear |
1 |
2017, Nov, 30 |
RL IV - RL with varying structures |
2017-W15 |
2 |
2017, Nov, 28 |
RL III - Basic tutorial RLSS17 (2) |
2017-W14 |
3 |
2017, Nov, 21 |
RL II - Basic tutorial RLSS17 |
2017-W14 |
4 |
2017, Nov, 16 |
Generative III - GAN training |
2017-W13 |
5 |
2017, Nov, 14 |
Generative II - Deep Generative Models |
2017-W13 |
6 |
2017, Nov, 9 |
Optimization IV - change DNN architecture for Optimization |
2017-W12 |
7 |
2017, Nov, 7 |
Optimization III - Optimization for DNN |
2017-W12 |
8 |
2017, Nov, 2 |
Optimization II - DNN for Optimization |
2017-W11 |
9 |
2017, Oct, 31 |
Optimization I - Understanding DNN Optimization |
2017-W11 |
10 |
2017, Oct, 26 |
Reliable Applications VI - Robustness2 |
2017-W10 |
11 |
2017, Oct, 23 |
Reliable Applications IV - Robustness |
2017-W9 |
12 |
2017, Oct, 17 |
Reliable Applications III - Interesting Tasks |
2017-W9 |
13 |
2017, Oct, 12 |
Reliable Applications II - Data privacy |
2017-W8 |
14 |
2017, Oct, 11 |
Reliable Applications V - Understanding2 |
2017-W10 |
15 |
2017, Oct, 10 |
Reliable Applications I - Understanding |
2017-W8 |
16 |
2017, Oct, 5 |
Structure VI - DNN with Adaptive Structures |
2017-W7 |
17 |
2017, Oct, 3 |
Structure V - DNN with Attention 3 |
2017-W7 |
18 |
2017, Sep, 28 |
Structure IV - DNN with Attention 2 |
2017-W6 |
19 |
2017, Sep, 26 |
Structure III - DNN with Attention |
2017-W6 |
20 |
2017, Sep, 21 |
Structure II - DNN with Varying Structures |
2017-W5 |
21 |
2017, Sep, 19 |
Structure I - Varying DNN structures |
2017-W5 |
22 |
2017, Sep, 14 |
Theoretical17 VI - More about Behaviors of DNN |
2017-W4 |
23 |
2017, Sep, 12 |
Theoretical17 V - More about Behaviors of DNN |
2017-W4 |
24 |
2017, Sep, 7 |
Theoretical17 IV - Investigating Behaviors of DNN |
2017-W3 |
25 |
2017, Sep, 5 |
Theoretical17 III - Investigating Behaviors of DNN |
2017-W3 |
26 |
2017, Aug, 31 |
Generative I - GAN tutorial by Ian Goodfellow |
2017-W2 |
27 |
2017, Aug, 29 |
Reinforcement I - Pineau - RL Basic Concepts |
2017-W2 |
28 |
2017, Aug, 24 |
Theoretical17 II - Ganguli - Theoretical Neuroscience and Deep Learning DLSS16 |
2017-W1 |
29 |
2017, Aug, 22 |
Basic17 -Andrew Ng - Nuts and Bolts of Applying Deep Learning |
2017-W1 |
read on: - 30 Nov 2017
6Reinforcement
Auxiliary
Sampling
Value-Networks
structured
Imitation-Learning
Hierarchical
Presenter |
Papers |
Paper URL |
Our Slides |
Ceyer |
Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 |
PDF |
PDF |
Beilun |
Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy |
PDF |
PDF |
Ji |
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 |
PDF |
PDF |
Xueying |
End-to-End Differentiable Adversarial Imitation Learning, ICML17 |
PDF |
PDF |
|
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17 |
PDF |
|
|
FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 |
PDF |
|
read on: - 28 Nov 2017
6Reinforcement
alphaGO
Planning
Temporal-Difference
Presenter |
Papers |
Paper URL |
Our Slides |
Anant |
The Predictron: End-to-End Learning and Planning, ICLR17 |
PDF |
PDF |
ChaoJiang |
Szepesvari - Theory of RL |
RLSS.pdf + Video |
PDF |
GaoJi |
Mastering the game of Go without human knowledge / Nature 2017 |
PDF |
PDF |
|
Thomas - Safe Reinforcement Learning |
RLSS17.pdf + video |
|
|
Sutton - Temporal-Difference Learning |
RLSS17.pdf + Video |
|
read on: - 16 Nov 2017
5Generative
generative
generalization
Denoising
Model-Criticism
Presenter |
Papers |
Paper URL |
Our Slides |
Arshdeep |
Generalization and Equilibrium in Generative Adversarial Nets (ICML17) |
PDF + video |
PDF |
Arshdeep |
Mode Regularized Generative Adversarial Networks (ICLR17) |
PDF |
PDF |
Bargav |
Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR17 |
PDF |
PDF |
Anant |
Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR17 |
PDF + code |
PDF |
read on: - 14 Nov 2017
5Generative
generative
attention
Composition
graphical-model
Autoregressive
structured
Presenter |
Papers |
Paper URL |
Our Slides |
ChaoJiang |
Courville - Generative Models II |
DLSS17Slide + video |
PDF |
GaoJi |
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, NIPS16 |
PDF + talk |
PDF |
Arshdeep |
Composing graphical models with neural networks for structured representations and fast inference, NIPS16 |
PDF |
PDF |
|
Johnson - Graphical Models and Deep Learning |
DLSSSlide + video |
|
|
Parallel Multiscale Autoregressive Density Estimation, ICML17 |
PDF |
|
Beilun |
Conditional Image Generation with Pixel CNN Decoders, NIPS16 |
PDF |
PDF |
Shijia |
Marrying Graphical Models & Deep Learning |
DLSS17 + Video |
PDF |
read on: - 09 Nov 2017
4Optimization
Forcing
Optimization
Presenter |
Papers |
Paper URL |
Our Slides |
Shijia |
Professor Forcing: A New Algorithm for Training Recurrent Networks, NIPS16 |
PDF + Video |
PDF |
Beilun+Arshdeep |
Mollifying Networks, Bengio, ICLR17 |
PDF |
PDF / PDF2 |
read on: - 07 Nov 2017
4Optimization
Architecture-Search
Hyperparameter
dynamic
Optimization
Presenter |
Papers |
Paper URL |
Our Slides |
GaoJi |
Forward and Reverse Gradient-Based Hyperparameter Optimization, ICML17 |
PDF |
PDF |
Chaojiang |
Adaptive Neural Networks for Efficient Inference, ICML17 |
PDF |
PDF |
Bargav |
Practical Gauss-Newton Optimisation for Deep Learning, ICML17 |
PDF |
PDF |
Rita |
How to Escape Saddle Points Efficiently, ICML17 |
PDF |
PDF |
|
Batched High-dimensional Bayesian Optimization via Structural Kernel Learning |
PDF |
|
read on: - 02 Nov 2017
4Optimization
Architecture Search
RL
Few-Shot
Optimization
Presenter |
Papers |
Paper URL |
Our Slides |
GaoJi |
Neural Architecture Search with Reinforcement Learning, ICLR17 |
PDF |
PDF |
Ceyer |
Learning to learn |
DLSS17video |
PDF |
Beilun |
Optimization as a Model for Few-Shot Learning, ICLR17 |
PDF + More |
PDF |
Anant |
Neural Optimizer Search with Reinforcement Learning, ICML17 |
PDF |
PDF |
read on: - 31 Oct 2017
4Optimization
optimization
Curriculum
Differentiation
Presenter |
Papers |
Paper URL |
Our Slides |
Ceyer |
An overview of gradient optimization algorithms, |
PDF |
PDF |
Shijia |
Osborne - Probabilistic numerics for deep learning |
DLSS 2017 + Video |
PDF / PDF2 |
Jack |
Automated Curriculum Learning for Neural Networks, ICML17 |
PDF |
PDF |
DLSS17 |
Johnson - Automatic Differentiation |
slide + video |
|
read on: - 26 Oct 2017
3Reliable
Adversarial-Examples
noise
Composition
robustness
Presenter |
Papers |
Paper URL |
Our Slides |
Tianlu |
Robustness of classifiers: from adversarial to random noise, NIPS16 |
PDF |
PDF |
Anant |
Blind Attacks on Machine Learners, NIPS16 |
PDF |
PDF |
|
Data Noising as Smoothing in Neural Network Language Models (Ng), ICLR17 |
pdf |
|
|
The Robustness of Estimator Composition, NIPS16 |
PDF |
|
read on: - 23 Oct 2017
3Reliable
Adversarial-Examples
high-dimensional
robustness
Presenter |
Papers |
Paper URL |
Our Slides |
GaoJi |
Delving into Transferable Adversarial Examples and Black-box Attacks,ICLR17 |
pdf |
PDF |
Shijia |
On Detecting Adversarial Perturbations, ICLR17 |
pdf |
PDF |
Anant |
Parseval Networks: Improving Robustness to Adversarial Examples, ICML17 |
pdf |
PDF |
Bargav |
Being Robust (in High Dimensions) Can Be Practical, ICML17 |
pdf |
PDF |
read on: - 17 Oct 2017
3Reliable
QA
noise
Neural-Programming
Hierarchical
Presenter |
Papers |
Paper URL |
Our Slides |
Jack |
Learning to Query, Reason, and Answer Questions On Ambiguous Texts, ICLR17 |
PDF |
PDF |
Arshdeep |
Making Neural Programming Architectures Generalize via Recursion, ICLR17 |
PDF |
PDF |
Xueying |
Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music, ICLR17 |
PDF |
PDF |
read on: - 12 Oct 2017
3Reliable
Semi-supervised
Privacy
Domain-adaptation
Presenter |
Papers |
Paper URL |
Our Slides |
Xueying |
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, ICLR17 |
PDF |
PDF |
Bargav |
Deep Learning with Differential Privacy, CCS16 |
PDF + video |
PDF |
Bargav |
Privacy-Preserving Deep Learning, CCS15 |
PDF |
PDF |
Xueying |
Domain Separation Networks, NIPS16 |
PDF |
PDF |
read on: - 11 Oct 2017
3Reliable
visualizing
Difference-Analysis
Attribution
Composition
Presenter |
Papers |
Paper URL |
Our Slides |
Rita |
Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, ICLR17 |
PDF |
PDF |
Arshdeep |
Axiomatic Attribution for Deep Networks, ICML17 |
PDF |
PDF |
|
The Robustness of Estimator Composition, NIPS16 |
PDF |
|
read on: - 10 Oct 2017
3Reliable
Interpretable
Model-Criticism
random
Difference-Analysis
Attribution
Presenter |
Papers |
Paper URL |
Our Slides |
Rita |
Learning Important Features Through Propagating Activation Differences, ICML17 |
PDF |
PDF |
GaoJi |
Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning, NIPS16 |
PDF |
PDF |
Rita |
Learning Kernels with Random Features, Aman Sinha*; John Duchi, |
PDF |
PDF |
read on: - 05 Oct 2017
2Architecture
8Scalable
dynamic
Architecture Search
structured
Presenter |
Papers |
Paper URL |
Our Slides |
Anant |
AdaNet: Adaptive Structural Learning of Artificial Neural Networks, ICML17 |
PDF |
PDF |
Shijia |
SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization, ICML17 |
PDF |
PDF |
Jack |
Proximal Deep Structured Models, NIPS16 |
PDF |
PDF |
|
Optimal Architectures in a Solvable Model of Deep Networks, NIPS16 |
PDF |
|
Tianlu |
Large-Scale Evolution of Image Classifiers, ICML17 |
PDF |
PDF |
read on: - 03 Oct 2017
2Architecture
dynamic
QA
memory
Presenter |
Papers |
Paper URL |
Our Slides |
Tianlu |
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, ICML17 |
PDF + code |
PDF |
Jack |
Reasoning with Memory Augmented Neural Networks for Language Comprehension, ICLR17 |
PDF |
PDF |
Xueying |
State-Frequency Memory Recurrent Neural Networks, ICML17 |
PDF |
PDF |
read on: - 28 Sep 2017
2Architecture
attention
transfer-learning
relational
generative
memory
Infomax
Presenter |
Papers |
Paper URL |
Our Slides |
Jack |
Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain, ICLR17 |
PDF |
PDF |
Arshdeep |
Bidirectional Attention Flow for Machine Comprehension, ICLR17 |
PDF + code |
PDF |
Ceyer |
Image-to-Markup Generation with Coarse-to-Fine Attention, ICML17 |
PDF + code |
PDF |
ChaoJiang |
Can Active Memory Replace Attention? ; Samy Bengio, NIPS16 |
PDF |
PDF |
|
An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax, ICLR17 |
PDF |
|
read on: - 26 Sep 2017
2Architecture
attention
transfer-learning
dynamic
structured
QA
relational
Presenter |
Papers |
Paper URL |
Our Slides |
Rita |
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR17 |
PDF |
PDF |
Tianlu |
Dynamic Coattention Networks For Question Answering, ICLR17 |
PDF + code |
PDF |
ChaoJiang |
Structured Attention Networks, ICLR17 |
PDF + code |
PDF |
read on: - 21 Sep 2017
2Architecture
8Scalable
sparsity
blocking
nonparametric
structured
QA
Interpretable
Presenter |
Papers |
Paper URL |
Our Slides |
Shijia |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, (Dean), ICLR17 |
PDF |
PDF |
Ceyer |
Sequence Modeling via Segmentations, ICML17 |
PDF |
PDF |
Arshdeep |
Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, ICML17 |
PDF |
PDF |
read on: - 19 Sep 2017
2Architecture
8Scalable
dialog
QA
nonparametric
structured
sparsity
Presenter |
Papers |
Paper URL |
Our Slides |
Jack |
Learning End-to-End Goal-Oriented Dialog, ICLR17 |
PDF |
PDF |
Bargav |
Nonparametric Neural Networks, ICLR17 |
PDF |
PDF |
Bargav |
Learning Structured Sparsity in Deep Neural Networks, NIPS16 |
PDF |
PDF |
Arshdeep |
Learning the Number of Neurons in Deep Networks, NIPS16 |
PDF |
PDF |
read on: - 14 Sep 2017
1Theoretical
understanding
black-box
Expressive
generalization
Presenter |
Papers |
Paper URL |
Our Slides |
SE |
Equivariance Through Parameter-Sharing, ICML17 |
PDF |
|
SE |
Why Deep Neural Networks for Function Approximation?, ICLR17 |
PDF |
|
SE |
Geometry of Neural Network Loss Surfaces via Random Matrix Theory, ICML17 |
PDF |
|
|
Sharp Minima Can Generalize For Deep Nets, ICML17 |
PDF |
|
read on: - 12 Sep 2017
1Theoretical
understanding
black-box
Memorization
InfoMax
Expressive
Presenter |
Papers |
Paper URL |
Our Slides |
Ceyer |
A Closer Look at Memorization in Deep Networks, ICML17 |
PDF |
PDF |
|
On the Expressive Efficiency of Overlapping Architectures of Deep Learning |
DLSSpdf + video |
|
Mutual Information |
Opening the Black Box of Deep Neural Networks via Information |
URL + video |
|
ChaoJiang |
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, NIPS16 |
PDF |
PDF |
read on: - 07 Sep 2017
1Theoretical
understanding
black-box
Parsimonious
Associative
memory
Presenter |
Papers |
Paper URL |
Our Slides |
Beilun |
Learning Deep Parsimonious Representations, NIPS16 |
PDF |
PDF |
Jack |
Dense Associative Memory for Pattern Recognition, NIPS16 |
PDF + video |
PDF |
read on: - 05 Sep 2017
1Theoretical
understanding
black-box
generalization
Expressive
Presenter |
Papers |
Paper URL |
Our Slides |
Rita |
On the Expressive Power of Deep Neural Networks |
PDF |
PDF |
Arshdeep |
Understanding deep learning requires rethinking generalization, ICLR17 |
PDF |
PDF |
Tianlu |
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, ICLR17 |
PDF |
PDF |
read on: - 31 Aug 2017
5Generative
0Basics
generative
GAN
Presenter |
Papers |
Paper URL |
Our Slides |
NIPS 2016 |
ganerative adversarial network tutorial (NIPS 2016) |
paper + video + code |
|
DLSS 2017 |
Generative Models I - DLSS 2017 |
slideraw + video + slide |
|
read on: - 29 Aug 2017
6Reinforcement
0Basics
RL
Pineau - RL Basic Concepts
read on: - 24 Aug 2017
1Theoretical
neuroscience
visualizing
brain
Ganguli - Theoretical Neuroscience and Deep Learning
read on: - 22 Aug 2017
0Basics
bias-variance
Presenter |
Papers |
Paper URL |
Our Slides |
NIPS16 |
Andrew Ng - Nuts and Bolts of Applying Deep Learning: video |
|
|
DLSS17 |
Doina Precup - Machine Learning - Bayesian Views (56:50m to 1:04:45 slides) video + slide |
|
|
> In total, we have finished number of
127 reading sessions.