Posted: 08 Feb 2024
BasicLLM
In this session, our readings cover:
Required Readings:
Mistral 7B
- https://mistral.ai/news/announcing-mistral-7b/
- We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
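To make the sliding window attention (SWA) mentioned above concrete, here is a minimal sketch of a causal sliding-window attention mask in PyTorch. This is illustrative only, not Mistral's implementation; the function name and window size are assumptions.

```python
# A minimal sketch (not Mistral's code) of the causal sliding-window attention mask:
# each query position may attend only to itself and the previous `window - 1` positions.
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Return a boolean mask of shape (seq_len, seq_len); True = attention allowed."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i                          # never attend to future tokens
    in_window = (i - j) < window             # only the most recent `window` tokens
    return causal & in_window

mask = sliding_window_causal_mask(seq_len=8, window=4)
print(mask.int())
# Scores outside the mask are set to -inf before the softmax, so per-layer cost
# scales with seq_len * window rather than seq_len^2.
```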
More Readings:
OLMo: Accelerating the Science of Language Models
- https://arxiv.org/abs/2402.00838
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, this technical report details the first release of OLMo, a state-of-the-art, truly Open Language Model and its framework to build and study the science of language modeling. Unlike most prior efforts that have only released model weights and inference code, we release OLMo and the whole framework, including training data and training and evaluation code. We hope this release will empower and strengthen the open research community and inspire a new wave of innovation.
Mixtral of Experts
- https://arxiv.org/abs/2401.04088
- We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
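To illustrate the routing described above, here is a minimal sketch of a top-2 sparse mixture-of-experts layer: a router scores 8 expert MLPs per token and the output is the softmax-weighted sum of the two selected experts. Dimensions and names are illustrative assumptions, not Mixtral's code.

```python
# Minimal top-2 MoE routing sketch (illustrative, not Mixtral's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int = 64, d_ff: int = 256, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, d_model)
        logits = self.router(x)                           # (n_tokens, n_experts)
        top2_vals, top2_idx = logits.topk(2, dim=-1)      # choose 2 experts per token
        weights = F.softmax(top2_vals, dim=-1)            # renormalize over the chosen pair
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                sel = top2_idx[:, slot] == e              # tokens routed to expert e in this slot
                if sel.any():
                    out[sel] += weights[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out

layer = Top2MoELayer()
y = layer(torch.randn(10, 64))  # each token only runs through 2 of the 8 expert MLPs
```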
Llama 2: Open Foundation and Fine-Tuned Chat Models
- https://arxiv.org/abs/2307.09288
- In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- https://arxiv.org/abs/2101.00027
- Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present the Pile: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets – both existing and newly constructed – many of which derive from academic or professional sources. Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations. Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.
Posted: 06 Feb 2024
Alignment
In this session, our readings cover:
Required Readings:
Aligning Large Language Models with Human: A Survey
- https://arxiv.org/abs/2307.12966
- https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo
- https://huggingface.co/blog/stackllama
More readings
GitHub: Awesome-RLHF
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- https://arxiv.org/abs/2301.13688
- We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at this https URL.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)
- https://arxiv.org/abs/2305.18290
- https://huggingface.co/blog/dpo-trl
- While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss. The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.
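The DPO objective described above reduces to a simple pairwise classification loss over log-probabilities from the policy and a frozen reference model. Below is a minimal PyTorch sketch; variable names are illustrative assumptions, not the authors' code.

```python
# Minimal DPO loss sketch: maximize the margin between the implicit rewards of the
# human-preferred ("chosen") and dispreferred ("rejected") responses.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """All inputs are summed log-probs of full responses, shape (batch,)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-13.0]), torch.tensor([-14.2]))
```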
Training language models to follow instructions with human feedback
- https://arxiv.org/abs/2203.02155
- “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences
- https://openreview.net/forum?id=GisHNaleWiA
- “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”
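Both readings above rest on fitting a reward model from pairwise human preferences before any reinforcement learning step. Here is a minimal sketch of that Bradley-Terry style pairwise loss; it is illustrative only, not the papers' released code.

```python
# Minimal pairwise preference loss sketch, as used to fit reward models in RLHF
# pipelines such as InstructGPT: push the reward of the preferred response above
# the reward of the rejected one.
import torch
import torch.nn.functional as F

def reward_model_loss(reward_preferred: torch.Tensor,
                      reward_rejected: torch.Tensor) -> torch.Tensor:
    """Inputs are scalar rewards per comparison, shape (batch,)."""
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()

loss = reward_model_loss(torch.tensor([1.2, 0.4]), torch.tensor([0.3, 0.9]))
# The fitted reward model then provides the reward signal for PPO fine-tuning.
```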
Posted: 30 Jan 2024
LLMEvaluate
In this session, our readings cover:
Required Readings:
Holistic Evaluation of Text-To-Image Models
- https://arxiv.org/abs/2311.04287
- The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at this https URL and the code at this https URL, which is integrated with the HELM codebase.
Holistic Evaluation of Language Models
- https://arxiv.org/abs/2211.09110
More Readings:
Challenges in evaluating AI systems
- https://www.anthropic.com/news/evaluating-ai-systems
Evaluating Large Language Models: A Comprehensive Survey
- https://arxiv.org/abs/2310.19736
- This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation and safety evaluation. In addition to the comprehensive review on the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs’ performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability.
Evaluating Large Language Models Trained on Code
- https://arxiv.org/abs/2107.03374
chatbot-arena-leaderboard
- https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Leveraging Large Language Models for NLG Evaluation: A Survey
- https://arxiv.org/abs/2401.07103
Posted: 23 Jan 2024
BasicLLM
Required Readings:
Emergent Abilities of Large Language Models
- URL
- “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”
Language Models are Few-Shot Learners
- URL
- “GPT-3, a 175B autoregressive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”
A survey of Generative AI Applications
- https://arxiv.org/abs/2306.02781
- Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.
Generative AI: Perspectives from Stanford HAI
- https://hai.stanford.edu/generative-ai-perspectives-stanford-hai
Posted: 18 Jan 2024
BasicLLM
Readings:
Basics of ML and DL:
Basics of NLP
- URL
- Typical NLP tasks / Challenges / Pipeline
- f() on natural language
- Before deep NLP (pre-2012): BOW / LSI / topic modeling (LDA)
- Word2Vec (2013-2016): GloVe / FastText
- Recurrent NNs (2014-2016): LSTM
- Seq2Seq
- Attention
- Self-Attention (2016 – now)
- Transformer (attention-only Seq2Seq); a minimal self-attention sketch follows after this list
- BERT / RoBERTa / XLNet / GPT / …
- A good code walkthrough of the Transformer is available at URL
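As referenced in the list above, here is a minimal PyTorch sketch of scaled dot-product self-attention, the core operation of the Transformer. Shapes and parameter names are illustrative assumptions.

```python
# Minimal single-head self-attention sketch (illustrative, unbatched).
import math
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project tokens to queries/keys/values
    scores = q @ k.T / math.sqrt(k.shape[-1])      # similarity of every token pair, scaled
    weights = F.softmax(scores, dim=-1)            # attention distribution per query token
    return weights @ v                             # weighted sum of value vectors

d_model, d_head, seq_len = 16, 8, 5
x = torch.randn(seq_len, d_model)
out = self_attention(x, torch.randn(d_model, d_head),
                     torch.randn(d_model, d_head), torch.randn(d_model, d_head))
```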
Posted: 03 Dec 2022
RL, AGI, language model, Human Alignment
| Papers | Paper URL | Abstract |
| --- | --- | --- |
| Training language models to follow instructions with human feedback | URL | “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.” |
| Deep reinforcement learning from human preferences | URL | “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function” |
Posted: 01 Dec 2022
Diffusion, Image synthesis, Efficiency
Stable Diffusion
- URL
- “High-Resolution Image Synthesis with Latent Diffusion Models”
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- URL
- “personalization” of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject.”
LoRA: Low-Rank Adaptation of Large Language Models
- URL
- “propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.”
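To make the idea concrete, here is a minimal sketch of a LoRA-style linear layer: the pretrained weight is frozen and a trainable low-rank product B·A is added on top. Names and initialization details are illustrative assumptions, not the paper's released code.

```python
# Minimal LoRA sketch: only the rank-r factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)  # frozen pretrained W
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # trainable, rank r
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))         # zero init: update starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(d_in=64, d_out=64, r=8)
y = layer(torch.randn(4, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)  # only A and B count
```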
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- https://arxiv.org/abs/2208.01618
- Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
- Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks.
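Below is a minimal, framework-agnostic sketch of the textual inversion recipe described above: everything in the pretrained model stays frozen and only one new token embedding is optimized. `frozen_loss_given_embedding` is a hypothetical stand-in for the frozen text-to-image model's reconstruction loss on the user's 3-5 concept images; it is an assumption for illustration, not the authors' code.

```python
# Minimal textual-inversion sketch: a single new "word" embedding is the only trainable tensor.
import torch

d_embed = 768
new_token_embedding = torch.nn.Parameter(torch.randn(d_embed) * 0.02)
optimizer = torch.optim.Adam([new_token_embedding], lr=5e-3)

def frozen_loss_given_embedding(embedding: torch.Tensor) -> torch.Tensor:
    # Placeholder: in the real method this runs the frozen text encoder + diffusion model
    # on the concept images and returns the denoising/reconstruction loss.
    target = torch.ones_like(embedding)
    return ((embedding - target) ** 2).mean()

for step in range(100):
    optimizer.zero_grad()
    loss = frozen_loss_given_embedding(new_token_embedding)
    loss.backward()          # gradients flow only into the new embedding
    optimizer.step()
```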
Posted: 01 Oct 2022
language model
Emergent Abilities of Large Language Models
- URL
- “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”
Language Models are Few-Shot Learners
- URL
- “GPT-3, a 175B autoregressive LLM; show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”
On the Opportunities and Risks of Foundation Models
- URL
- “a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”
The Power of Scale for Parameter-Efficient Prompt Tuning
- https://arxiv.org/abs/2104.08691
- Brian Lester, Rami Al-Rfou, Noah Constant
- In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
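As a quick illustration of the soft-prompt mechanism described in the abstract, here is a minimal sketch in which trainable prompt embeddings are prepended to the frozen model's input embeddings and learned by backpropagation. Shapes and names are illustrative assumptions, not the paper's code.

```python
# Minimal prompt-tuning sketch: the soft prompt is the only trainable parameter.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_prompt_tokens: int = 20, d_model: int = 512):
        super().__init__()
        # One trainable embedding per soft-prompt token; the LM itself stays frozen.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        """input_embeds: (batch, seq_len, d_model) from the frozen model's embedding layer."""
        batch = input_embeds.shape[0]
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)   # the frozen LM consumes the concatenation

soft_prompt = SoftPrompt()
extended = soft_prompt(torch.randn(2, 10, 512))   # (2, 30, 512); only the 20x512 prompt is updated
```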
Here is a list of post names: