RLHF + InstructGPT
| Paper | Paper URL | Abstract |
|---|---|---|
| Training language models to follow instructions with human feedback | URL | “further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT.” |
| Deep reinforcement learning from human preferences | URL | “explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function” |
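
Both papers rest on the same core step: fit a reward model to pairwise human preferences, then optimize a policy against it with RL. Below is a minimal sketch of that reward-modeling step, assuming PyTorch and a Bradley–Terry-style logistic loss over preference pairs; the class and function names (`RewardModel`, `preference_loss`) and the fixed-size embeddings are illustrative, not from either paper.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size segment/response embedding to a scalar score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # (batch,) scalar rewards

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry / logistic loss: -log sigmoid(r_chosen - r_rejected).
    # Minimizing it pushes the preferred segment's reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# One training step on random stand-in embeddings for the two segments in each pair.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen = torch.randn(32, 128)    # embeddings of human-preferred segments (hypothetical)
rejected = torch.randn(32, 128)  # embeddings of dispreferred segments (hypothetical)
loss = preference_loss(model(chosen), model(rejected))
opt.zero_grad()
loss.backward()
opt.step()
```

The learned scalar reward then stands in for the true reward function when fine-tuning the policy with RL (PPO in InstructGPT), which is how these methods work “without access to the reward function”.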