| Papers | Paper URL | Abstract |
|---|---|---|
| A Generalist Agent | URL | Gato works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm, and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. |
| When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? (ICLR 2022) | URL | It is natural to ask: when can an offline RL method outperform BC with an equal amount of expert data, even when BC is a natural choice? |
| Uni[MASK]: Unified Inference in Sequential Decision Problems | URL | Shows how sequential decision-making tasks can be thought of in terms of corresponding input maskings, enabling a single model to be trained to perform all tasks at once. The masking idea applies naturally to sequential decision making, where many well-studied tasks such as behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different maskings over a sequence of states, actions, and returns. |
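
The Uni[MASK] entry describes tasks as different maskings over a sequence of states, actions, and returns. Below is a minimal sketch of that idea. It assumes each timestep is tokenized as (return-to-go, state, action) and that `True` in a mask marks a token given to the model as input, with the remaining tokens left for it to predict. The mask definitions are simplified illustrations, not the exact schemes from the paper. Treating offline RL as return conditioning is an assumption made here. All function names (`behavior_cloning`, `sample_task_mask`, etc.) are hypothetical.

```python
import numpy as np

# Hypothetical token layout per timestep: return-to-go, state, action.
FIELDS = ("R", "s", "a")


def col(name):
    """Column index of a token type in the (T, 3) mask."""
    return FIELDS.index(name)


def empty_mask(T):
    """A (T, 3) boolean mask; True means the token is given as input."""
    return np.zeros((T, len(FIELDS)), dtype=bool)


def behavior_cloning(T, t):
    """See past states/actions and the current state s_t; a_t is left to predict."""
    m = empty_mask(T)
    m[: t + 1, col("s")] = True
    m[:t, col("a")] = True
    return m


def return_conditioned(T, t):
    """BC mask plus the initial return-to-go (an assumed offline-RL-style mask)."""
    m = behavior_cloning(T, t)
    m[0, col("R")] = True
    return m


def inverse_dynamics(T, t):
    """See only s_t and s_{t+1}; the action a_t between them is left to predict."""
    m = empty_mask(T)
    m[t, col("s")] = True
    m[t + 1, col("s")] = True
    return m


def waypoint(T, t, k):
    """BC mask plus a future waypoint state s_k (k > t) to steer toward."""
    m = behavior_cloning(T, t)
    m[k, col("s")] = True
    return m


def prediction_targets(mask):
    """In this sketch, every token not given as input is a prediction target."""
    return ~mask


# One model, many tasks: each task is just a different mask over the same sequence.
TASKS = {
    "behavior_cloning": lambda T, t, k: behavior_cloning(T, t),
    "return_conditioned": lambda T, t, k: return_conditioned(T, t),
    "inverse_dynamics": lambda T, t, k: inverse_dynamics(T, t),
    "waypoint": lambda T, t, k: waypoint(T, t, k),
}


def sample_task_mask(T, rng):
    """Pick a task at random and build its mask (requires T >= 2)."""
    names = sorted(TASKS)
    name = names[int(rng.integers(len(names)))]
    t = int(rng.integers(0, T - 1))  # t in [0, T-2], so s_{t+1} exists
    k = int(rng.integers(t + 1, T))  # waypoint index in [t+1, T-1]
    return name, TASKS[name](T, t, k)
```

During training, a batch could call `sample_task_mask(T, np.random.default_rng(0))` to pick which tokens the model sees and then compute the loss only on `prediction_targets(mask)`. The point of the design is that the network and data stay fixed while only the mask changes between tasks.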