Dr. Yanjun Qi

Here are Papers I Reviewed. I am science curious.

Reviews Indexed



6Reinforcement

Recent Readings for RL and Deep RL (since 2017) (Index of Posts):

No.	Read Date	Title and Information	We Read @
1	2022, Dec, 3	RLHF + InstructGPT	2022-W6
2	2022, Jun, 3	Decision Transformers	2022-W3
3	2022, May, 3	A Generalist Agent + offline RL + UniMask	2022-W1
4	2020, Mar, 5	Deep Reinforcement Learning	2020-W3
5	2019, Dec, 8	deep2reproduce 2019 Fall - 6Reinforcement papers	2019-fall Students deep2reproduce
6	2018, Aug, 13	Application18- DNNs in a Few BioMedical Tasks	2018-team
7	2018, Aug, 3	Reliable18- Testing and Verifying DNNs	2018-team
8	2017, Nov, 30	RL IV - RL with varying structures	2017-W15
9	2017, Nov, 28	RL III - Basic tutorial RLSS17 (2)	2017-W14
10	2017, Nov, 21	RL II - Basic tutorial RLSS17	2017-W14
11	2017, Aug, 29	Reinforcement I - Pineau - RL Basic Concepts	2017-W2

Here is a detailed list of posts!

[1]: RLHF + InstructGPT

read on: - 03 Dec 2022
RL AGI language model Human Alignment

Papers	Paper URL	Abstract
Training language models to follow instructions with human feedback	URL	“further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.”
Deep reinforcement learning from human preferences	URL	“explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function”
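The second paper fits a reward model to pairwise human preferences between trajectory segments with a Bradley-Terry style model. The probability that segment a is preferred is exp(R_a) / (exp(R_a) + exp(R_b)), where R is the sum of predicted per-step rewards over the segment. Below is a minimal pure-Python sketch of that idea. It assumes a linear reward over hypothetical 2-D step features and uses a simulated labeler instead of humans; the paper uses a neural-network reward model, and InstructGPT trains its reward model on comparisons of model outputs rather than trajectory segments.

```python
import math
import random

def feature_sum(segment):
    """Sum the per-step feature vectors over one trajectory segment."""
    return [sum(step[i] for step in segment) for i in range(len(segment[0]))]

def pref_prob(w, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b,
    using a (hypothetical) linear per-step reward r(x) = w . x."""
    fa, fb = feature_sum(seg_a), feature_sum(seg_b)
    diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, fa, fb))
    # exp(R_a) / (exp(R_a) + exp(R_b)) == sigmoid(R_a - R_b), computed stably.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

def fit_reward(prefs, dim, lr=0.1, epochs=200):
    """Gradient descent on the cross-entropy preference loss.
    prefs: list of (seg_a, seg_b, label), label = 1.0 if seg_a was preferred."""
    w = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in prefs:
            p = pref_prob(w, seg_a, seg_b)
            fa, fb = feature_sum(seg_a), feature_sum(seg_b)
            for i in range(dim):
                # d(loss)/d(w_i) = (p - label) * (fa_i - fb_i)
                w[i] -= lr * (p - label) * (fa[i] - fb[i])
    return w

# Simulated labeler (an assumption): its hidden reward is just feature 0.
rng = random.Random(0)

def random_segment(length=3, dim=2):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(length)]

prefs = []
for _ in range(50):
    a, b = random_segment(), random_segment()
    label = 1.0 if feature_sum(a)[0] > feature_sum(b)[0] else 0.0
    prefs.append((a, b, label))

w = fit_reward(prefs, dim=2)
accuracy = sum(
    (pref_prob(w, a, b) > 0.5) == (label == 1.0) for a, b, label in prefs
) / len(prefs)
```

The learned reward would then stand in for the environment reward in an ordinary RL loop; InstructGPT uses PPO for that step.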

[2]: Decision Transformers

read on: - 03 Jun 2022

Decision Transformer: Reinforcement Learning via Sequence Modeling

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
https://arxiv.org/abs/2106.01345
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
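Decision Transformer conditions on the return-to-go (the undiscounted sum of rewards still to be collected from step t onward) and reads tokens in the order (return-to-go, state, action) for each timestep. The sketch below covers only this data preparation, not the Transformer itself; states and actions are placeholder strings, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def interleave(rtg: Sequence[float], states: Sequence[str],
               actions: Sequence[str]) -> List[Tuple[str, object]]:
    """Build the (R_1, s_1, a_1, R_2, s_2, a_2, ...) token order."""
    tokens: List[Tuple[str, object]] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("return", r), ("state", s), ("action", a)])
    return tokens

def update_target(target: float, reward: float) -> float:
    """At evaluation time, subtract each observed reward from the desired return."""
    return target - reward

rtg = returns_to_go([1.0, 0.0, 2.0])  # [3.0, 2.0, 2.0]
tokens = interleave(rtg, ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

Conditioning on a higher desired return at test time is how the model is asked for better-than-average behavior.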

Prompting Decision Transformer for Few-Shot Policy Generalization

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
https://arxiv.org/abs/2206.13499
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.
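Prompt-DT prepends a trajectory prompt, built from short segments of the few-shot demonstrations, to the usual Decision Transformer input. A minimal sketch of that input assembly follows; taking the first few steps of each demo, the (return-to-go, state, action) tuple layout, and the function names are assumptions for illustration, not the paper's exact segment-selection scheme.

```python
from typing import List, Tuple

# (return-to-go, state, action); strings stand in for real state/action vectors.
Step = Tuple[float, str, str]

def trajectory_prompt(demos: List[List[Step]], steps_per_demo: int) -> List[Step]:
    """Concatenate a short segment (here: the first few steps) of each demo."""
    prompt: List[Step] = []
    for demo in demos:
        prompt.extend(demo[:steps_per_demo])
    return prompt

def model_input(prompt: List[Step], history: List[Step], context_len: int) -> List[Step]:
    """Prompt steps first, then the most recent context_len steps of history."""
    start = max(0, len(history) - context_len)
    return prompt + history[start:]
```

Because task identity is carried only by the prompt, a new task can be handled by swapping in its demonstrations without any gradient update, which matches the "without any extra finetuning" claim above.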

[3]: A Generalist Agent + offline RL + UniMask

read on: - 03 May 2022
RL AGI

Papers	Paper URL	Abstract
A Generalist Agent	URL	Gato works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
Why should we prefer offline reinforcement learning over behavioral cloning? ICLR 2022	URL	It is natural to ask: when can an offline RL method outperform BC with an equal amount of expert data, even when BC is a natural choice?
Uni[MASK]: Unified Inference in Sequential Decision Problems	URL	Shows how sequential decision-making tasks can be thought of in terms of corresponding input maskings, enabling the training of a single model to perform all tasks at once. The approach applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns.
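Uni[MASK] frames each task as a choice of which tokens are visible. The simplified sketch below assumes a per-timestep boolean mask for each of the three modalities (True = given as input, False = hidden or to be predicted); these mask shapes are our reading of the task definitions, not the paper's exact masking schemes, and the function names are hypothetical.

```python
from typing import Dict, List

# modality -> per-timestep visibility (True = given to the model as input)
Mask = Dict[str, List[bool]]

def _blank(T: int) -> Mask:
    return {m: [False] * T for m in ("state", "action", "return")}

def behavior_cloning(T: int, t: int) -> Mask:
    """Predict a_t from s_0..s_t and a_0..a_{t-1}."""
    m = _blank(T)
    for i in range(t + 1):
        m["state"][i] = True
    for i in range(t):
        m["action"][i] = True
    return m

def offline_rl(T: int, t: int) -> Mask:
    """Return-conditioned variant: behavior cloning plus returns up to time t."""
    m = behavior_cloning(T, t)
    for i in range(t + 1):
        m["return"][i] = True
    return m

def inverse_dynamics(T: int, t: int) -> Mask:
    """Predict a_t from s_t and s_{t+1} only (requires t + 1 < T)."""
    m = _blank(T)
    m["state"][t] = m["state"][t + 1] = True
    return m

def waypoint(T: int, k: int) -> Mask:
    """Given the initial state and a waypoint state s_k, fill in everything between."""
    m = _blank(T)
    m["state"][0] = m["state"][k] = True
    return m
```

Training one model on many such masks is what lets a single network answer all of these queries at inference time.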

[4]: Deep Reinforcement Learning

read on: - 05 Mar 2020
RL Generalization

Index	Papers	Our Slides
1	Actor-Critic Methods for Control	Jake Survey
2	Generalization in Deep Reinforcement Learning	Jake Survey
3	Sample Efficient RL (Part 1)	Jake Survey
4	Sample Efficient RL (Part 2)	Jake Survey
5	Model-Free Value Methods in Deep RL	Jake Survey
6	Investigating Human Priors for Playing Video Games	Arsh Survey

[5]: deep2reproduce 2019 Fall - 6Reinforcement papers

read on: - 08 Dec 2019
verification RL

Team INDEX	Title & Link	Tags	Our Slide
T1	Safe Reinforcement Learning via Shielding	RL, safety, verification	OurSlide

[6]: Application18- DNNs in a Few BioMedical Tasks

read on: - 13 Aug 2018
brain RNA DNA Genomics generative

Presenter	Papers	Paper URL	Our Slides
Arshdeep	DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning.	PDF	PDF
Arshdeep	Solving the RNA design problem with reinforcement learning, PLOS Computational Biology ¹	PDF	PDF
Arshdeep	Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk ²	PDF	PDF
Arshdeep	Towards Gene Expression Convolutions using Gene Interaction Graphs, Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio ³	PDF	PDF
Brandon	Kipoi: Accelerating the Community Exchange and Reuse of Predictive Models for Genomics	PDF	PDF
Arshdeep	Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions ²	PDF	PDF

[7]: Reliable18- Testing and Verifying DNNs

read on: - 03 Aug 2018
RL Fuzzing Adversarial-Examples verification software-testing black-box white-box

Presenter	Papers	Paper URL	Our Slides
GaoJi	Deep Reinforcement Fuzzing, Konstantin Böttinger, Patrice Godefroid, Rishabh Singh	PDF	PDF
GaoJi	Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer	PDF	PDF
GaoJi	DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars, Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray	PDF	PDF
GaoJi	A few recent (2018) papers on black-box adversarial attacks, like Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors ¹	PDF	PDF
GaoJi	A few recent papers on adversarial attacks on reinforcement learning, like Adversarial Attacks on Neural Network Policies (Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel)	PDF	PDF
Testing	DeepXplore: Automated Whitebox Testing of Deep Learning Systems	PDF

[8]: RL IV - RL with varying structures

read on: - 30 Nov 2017
Auxiliary Sampling Value-Networks structured Imitation-Learning Hierarchical

Presenter	Papers	Paper URL	Our Slides
Ceyer	Reinforcement Learning with Unsupervised Auxiliary Tasks, ICLR17 ¹	PDF	PDF
Beilun	Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband, Benjamin Van Roy ²	PDF	PDF
Ji	Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, ICML17 ³	PDF	PDF
Xueying	End-to-End Differentiable Adversarial Imitation Learning, ICML17 ⁴	PDF	PDF
	Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, ICML17	PDF
	FeUdal Networks for Hierarchical Reinforcement Learning, ICML17 ⁵	PDF

[9]: RL III - Basic tutorial RLSS17 (2)

read on: - 28 Nov 2017
alphaGO Planning Temporal-Difference

Presenter	Papers	Paper URL	Our Slides
Anant	The Predictron: End-to-End Learning and Planning, ICLR17 ¹	PDF	PDF
ChaoJiang	Szepesvari - Theory of RL ²	RLSS.pdf + Video	PDF
GaoJi	Mastering the game of Go without human knowledge / Nature 2017 ³	PDF	PDF
	Thomas - Safe Reinforcement Learning	RLSS17.pdf + video
	Sutton - Temporal-Difference Learning	RLSS17.pdf + Video

[10]: RL II - Basic tutorial RLSS17

read on: - 21 Nov 2017
RL Multi-Task

Presenter	Papers	Paper URL	Our Slides
Jack	Hasselt - Deep Reinforcement Learning	RLSS17.pdf + video	PDF
Tianlu	Roux - RL in the Industry	RLSS17.pdf + video	PDF / PDF-Bandit
Xueying	Singh - Steps Towards Continual Learning	pdf + video	PDF
GaoJi	Distral: Robust Multitask Reinforcement Learning ¹	PDF	PDF

[11]: Reinforcement I - Pineau - RL Basic Concepts

read on: - 29 Aug 2017
RL

Pineau - RL Basic Concepts

Presenter	Papers	Paper URL	Our Slides
DLSS16	video
RLSS17	slideRaw + video + slide

Here is a name list of posts!
