- See Also
- Links
    - “Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data”, Tajwar et al 2024
    - “Dataset Reset Policy Optimization for RLHF”, Chang et al 2024
    - “Mastering Stacking of Diverse Shapes With Large-Scale Iterative Reinforcement Learning on Real Robots”, Lampe et al 2023
    - “Vision-Language Models As a Source of Rewards”, Baumli et al 2023
    - “Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
    - “Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
    - “Course Correcting Koopman Representations”, Fathi et al 2023
    - “Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions”, Chebotar et al 2023
    - “Subwords As Skills: Tokenization for Sparse-Reward Reinforcement Learning”, Yunis et al 2023
    - “What Are Dreams For? Converging Lines of Research Suggest That We Might Be Misunderstanding Something We Do Every Night of Our Lives”, Gefter 2023
    - “ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
    - “AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
    - “Learning to Model the World With Language”, Lin et al 2023
    - “Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior”, Block et al 2023
    - “PASTA: Pretrained Action-State Transformer Agents”, Boige et al 2023
    - “Fighting Uncertainty With Gradients: Offline Reinforcement Learning via Diffusion Score Matching”, Suh et al 2023
    - “Twitching in Sensorimotor Development from Sleeping Rats to Robots”, Blumberg et al 2023
    - “Survival Instinct in Offline Reinforcement Learning”, Li et al 2023
    - “BetaZero: Belief-State Planning for Long-Horizon POMDPs Using Learned Approximations”, Moss et al 2023
    - “Improving Language Models With Advantage-Based Offline Policy Gradients”, Baheti et al 2023
    - “Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
    - “Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
    - “Off-The-Grid MARL (OG-MARL): Datasets With Baselines for Offline Multi-Agent Reinforcement Learning”, Formanek et al 2023
    - “Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes”, Kumar et al 2022
    - “Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
    - “In-Context Reinforcement Learning With Algorithm Distillation”, Laskin et al 2022
    - “CORL: Research-Oriented Deep Offline Reinforcement Learning Library”, Tarasov et al 2022
    - “Diffusion-QL: Diffusion Policies As an Expressive Policy Class for Offline Reinforcement Learning”, Wang et al 2022
    - “Offline RL Policies Should Be Trained to Be Adaptive”, Ghosh et al 2022
    - “Prompting Decision Transformer for Few-Shot Policy Generalization”, Xu et al 2022
    - “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
    - “Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
    - “Offline RL for Natural Language Generation With Implicit Language Q Learning”, Snell et al 2022
    - “When Does Return-Conditioned Supervised Learning Work for Offline Reinforcement Learning?”, Brandfonbrener et al 2022
    - “Newton’s Method for Reinforcement Learning and Model Predictive Control”, Bertsekas 2022
    - “You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments”, Paster et al 2022
    - “Multi-Game Decision Transformers”, Lee et al 2022
    - “When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?”, Kumar et al 2022
    - “Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)”, Yarats et al 2022
    - “Offline Pre-Trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks”, Meng et al 2021
    - “A Workflow for Offline Model-Free Robotic Reinforcement Learning”, Kumar et al 2021
    - “Conservative Objective Models for Effective Offline Model-Based Optimization”, Trabucco et al 2021
    - “A Minimalist Approach to Offline Reinforcement Learning”, Fujimoto & Gu 2021
    - “Is Pessimism Provably Efficient for Offline RL?”, Jin et al 2020
    - “What Are the Statistical Limits of Offline RL With Linear Function Approximation?”, Wang et al 2020
    - “MOPO: Model-Based Offline Policy Optimization”, Yu et al 2020
    - “Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems”, Levine et al 2020
    - “D4RL: Datasets for Deep Data-Driven Reinforcement Learning”, Fu et al 2020
    - “Q✱ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison”, Xie & Jiang 2020
    - “Scaling Data-Driven Robotics With Reward Sketching and Batch Reinforcement Learning”, Cabi et al 2019
    - “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”, Kalashnikov et al 2018
    - “The Netflix Recommender System”, Gomez-Uribe & Hunt 2015
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Links
“Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data”, Tajwar et al 2024
“Dataset Reset Policy Optimization for RLHF”, Chang et al 2024
“Mastering Stacking of Diverse Shapes With Large-Scale Iterative Reinforcement Learning on Real Robots”, Lampe et al 2023
“Vision-Language Models As a Source of Rewards”, Baumli et al 2023
“Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
“Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
“Course Correcting Koopman Representations”, Fathi et al 2023
“Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions”, Chebotar et al 2023
“Subwords As Skills: Tokenization for Sparse-Reward Reinforcement Learning”, Yunis et al 2023
“What Are Dreams For? Converging Lines of Research Suggest That We Might Be Misunderstanding Something We Do Every Night of Our Lives”, Gefter 2023
“ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
“AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
“Learning to Model the World With Language”, Lin et al 2023
“Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior”, Block et al 2023
“PASTA: Pretrained Action-State Transformer Agents”, Boige et al 2023
“Fighting Uncertainty With Gradients: Offline Reinforcement Learning via Diffusion Score Matching”, Suh et al 2023
“Twitching in Sensorimotor Development from Sleeping Rats to Robots”, Blumberg et al 2023
“Survival Instinct in Offline Reinforcement Learning”, Li et al 2023
“BetaZero: Belief-State Planning for Long-Horizon POMDPs Using Learned Approximations”, Moss et al 2023
“Improving Language Models With Advantage-Based Offline Policy Gradients”, Baheti et al 2023
“Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
“Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
“Off-The-Grid MARL (OG-MARL): Datasets With Baselines for Offline Multi-Agent Reinforcement Learning”, Formanek et al 2023
“Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes”, Kumar et al 2022
“Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
“In-Context Reinforcement Learning With Algorithm Distillation”, Laskin et al 2022
“CORL: Research-Oriented Deep Offline Reinforcement Learning Library”, Tarasov et al 2022
“Diffusion-QL: Diffusion Policies As an Expressive Policy Class for Offline Reinforcement Learning”, Wang et al 2022
“Offline RL Policies Should Be Trained to Be Adaptive”, Ghosh et al 2022
“Prompting Decision Transformer for Few-Shot Policy Generalization”, Xu et al 2022
“Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
“Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
“Offline RL for Natural Language Generation With Implicit Language Q Learning”, Snell et al 2022
“When Does Return-Conditioned Supervised Learning Work for Offline Reinforcement Learning?”, Brandfonbrener et al 2022
“Newton’s Method for Reinforcement Learning and Model Predictive Control”, Bertsekas 2022
“You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments”, Paster et al 2022
“Multi-Game Decision Transformers”, Lee et al 2022
“When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?”, Kumar et al 2022
“Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)”, Yarats et al 2022
“Offline Pre-Trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks”, Meng et al 2021
“A Workflow for Offline Model-Free Robotic Reinforcement Learning”, Kumar et al 2021
“Conservative Objective Models for Effective Offline Model-Based Optimization”, Trabucco et al 2021
“A Minimalist Approach to Offline Reinforcement Learning”, Fujimoto & Gu 2021
“Is Pessimism Provably Efficient for Offline RL?”, Jin et al 2020
“What Are the Statistical Limits of Offline RL With Linear Function Approximation?”, Wang et al 2020
“MOPO: Model-Based Offline Policy Optimization”, Yu et al 2020
“Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems”, Levine et al 2020
“D4RL: Datasets for Deep Data-Driven Reinforcement Learning”, Fu et al 2020
“Q✱ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison”, Xie & Jiang 2020
“Scaling Data-Driven Robotics With Reward Sketching and Batch Reinforcement Learning”, Cabi et al 2019
“QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”, Kalashnikov et al 2018
“The Netflix Recommender System”, Gomez-Uribe & Hunt 2015
Wikipedia
Miscellaneous
- https://bair.berkeley.edu/blog/2022/04/25/rl-or-bc/
- https://jacobbuckman.com/2020-11-30-conceptual-fundamentals-of-offline-rl/
- https://netflixtechblog.com/learning-a-personalized-homepage-aa8ec670359a#1c3e
- https://proceedings.neurips.cc/paper/2014/file/8bb88f80d334b1869781beb89f7b73be-Paper.pdf
- https://sites.google.com/view/offlinerltutorial-neurips2020/home
Bibliography
- https://arxiv.org/abs/2404.08495: “Dataset Reset Policy Optimization for RLHF”, Chang et al 2024
- https://arxiv.org/abs/2312.06585#deepmind: “Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
- https://arxiv.org/abs/2305.09836: “Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
- https://arxiv.org/abs/2206.13499: “Prompting Decision Transformer for Few-Shot Policy Generalization”, Xu et al 2022
- https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
- https://arxiv.org/abs/2206.05314#deepmind: “Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
- https://arxiv.org/abs/2205.15241#google: “Multi-Game Decision Transformers”, Lee et al 2022
- 2015-gomezuribe.pdf: “The Netflix Recommender System”, Gomez-Uribe & Hunt 2015