See Also
Gwern
“GPT-3 Creative Fiction”, Gwern 2020
“The Scaling Hypothesis”, Gwern 2020
Links
“Data Scaling Laws in Imitation Learning for Robotic Manipulation”, Lin et al 2024
“Motor Physics: Safety Implications of Geared Motors”, Jang 2024
“GUI-WORLD: A Dataset for GUI-Oriented Multimodal LLM-Based Agents”, Chen et al 2024
“Earnings Call: Tesla Discusses Q1 2024 Challenges and AI Expansion”, Abdulkadir 2024
“Beyond A✱: Better Planning With Transformers via Search Dynamics Bootstrapping (Searchformer)”, Lehnert et al 2024
“Grandmaster-Level Chess Without Search”, Ruoss et al 2024
“Mobile ALOHA: Learning Bimanual Mobile Manipulation With Low-Cost Whole-Body Teleoperation”, Fu et al 2024
“Vision-Language Models As a Source of Rewards”, Baumli et al 2023
“Self-Supervised Behavior Cloned Transformers Are Path Crawlers for Text Games”, Wang & Jansen 2023
“Learning Few-Shot Imitation As Cultural Transmission”, Bhoopchand et al 2023
“Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
“Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero”, Schut et al 2023
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
“AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
“Getting from Generative AI to Trustworthy AI: What LLMs Might Learn from Cyc”, Lenat & Marcus 2023
“Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior”, Block et al 2023
“Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
“GKD: Generalized Knowledge Distillation for Auto-Regressive Sequence Models”, Agarwal et al 2023
“ChessGPT: Bridging Policy Learning and Language Modeling”, Feng et al 2023
“SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling With Backtracking”, Cundy & Ermon 2023
“Survival Instinct in Offline Reinforcement Learning”, Li et al 2023
“Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
“Let’s Verify Step by Step”, Lightman et al 2023
“The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
“LIMA: Less Is More for Alignment”, Zhou et al 2023
“Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
“ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
“MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, Wang et al 2023
“Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al 2023
“Conditioning Predictive Models: Risks and Strategies”, Hubinger et al 2023
“Imitating Human Behavior With Diffusion Models”, Pearce et al 2023
“Solving Math Word Problems With Process & Outcome-Based Feedback”, Uesato et al 2022
“CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Bakhtin et al 2022
“Token Turing Machines”, Ryoo et al 2022
“Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
“In-Context Reinforcement Learning With Algorithm Distillation”, Laskin et al 2022
“Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
“Human-AI Coordination via Human-Regularized Search and Learning”, Hu et al 2022
“Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
“Nearest Neighbor Non-Autoregressive Text Generation”, Niwa et al 2022
“Generative Personas That Behave and Experience Like Humans”, Barthet et al 2022
“Diffusion-QL: Diffusion Policies As an Expressive Policy Class for Offline Reinforcement Learning”, Wang et al 2022
“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
“Improved Policy Optimization for Online Imitation Learning”, Lavington et al 2022
“Watch and Match: Supercharging Imitation With Regularized Optimal Transport”, Haldar et al 2022
“Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
“Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
“Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
“Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
“When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?”, Kumar et al 2022
“Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ramrakhya et al 2022
“Imitating, Fast and Slow: Robust Learning from Demonstrations via Decision-Time Planning”, Qi et al 2022
“Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning”, Valassakis et al 2022
“Inferring Rewards from Language in Context”, Lin et al 2022
“Robot Peels Banana With Goal-Conditioned Dual-Action Deep Imitation Learning”, Kim et al 2022
“The Unsurprising Effectiveness of Pre-Trained Vision Models for Control”, Parisi et al 2022
“VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja-Diaz et al 2022
“LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
“Conditional Imitation Learning for Multi-Agent Games”, Shih et al 2022
“Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
“WebGPT: Browser-Assisted Question-Answering With Human Feedback”, Nakano et al 2021
“Modeling Strong and Human-Like Gameplay With KL-Regularized Search”, Jacob et al 2021
“JueWu-MC: Playing Minecraft With Sample-Efficient Hierarchical Reinforcement Learning”, Lin et al 2021
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, Lu et al 2021
“RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning”, Ramos et al 2021
“BC-Z: Zero-Shot Task Generalization With Robotic Imitation Learning”, Jang et al 2021
“Is Bang-Bang Control All You Need? Solving Continuous Control With Bernoulli Policies”, Seyde et al 2021
“SafetyNet: Safe Planning for Real-World Self-Driving Vehicles Using Machine-Learned Policies”, Vitelli et al 2021
“TrufLL: Learning Natural Language Generation from Scratch”, Donati et al 2021
“Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
“Learning to Navigate Sidewalks in Outdoor Environments”, Sorokin et al 2021
“PlaTe: Visually-Grounded Planning With Transformers in Procedural Tasks”, Sun et al 2021
“Implicit Behavioral Cloning”, Florence et al 2021
“DexMV: Imitation Learning for Dexterous Manipulation from Human Videos”, Qin et al 2021
“Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs”, Sonnerat et al 2021
“A Minimalist Approach to Offline Reinforcement Learning”, Fujimoto & Gu 2021
“Hyperparameter Selection for Imitation Learning”, Hussenot et al 2021
“From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021
“On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Vischer et al 2021
“Counter-Strike Deathmatch With Large-Scale Behavioral Cloning”, Pearce & Zhu 2021
“Fully General Online Imitation Learning”, Cohen et al 2021
“The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2021
“Meta Learning Backpropagation And Improving It”, Kirsch & Schmidhuber 2020
“SCC: an Efficient Deep Reinforcement Learning Agent Mastering the Game of StarCraft II”, Wang et al 2020
“Imitating Interactive Intelligence”, Abramson et al 2020
“TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”, Han et al 2020
“RetinaGAN: An Object-Aware Approach to Sim-To-Real Transfer”, Ho et al 2020
“Emergent Social Learning via Multi-Agent Reinforcement Learning”, Ndousse et al 2020
“Automatic Discovery of Interpretable Planning Strategies”, Skirzyński et al 2020
“Learning Agile Robotic Locomotion Skills by Imitating Animals”, Peng et al 2020
“Reinforcement Learning for Combinatorial Optimization: A Survey”, Mazyavkina et al 2020
“Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences”, Brown et al 2020
“AI Helps Warehouse Robots Pick Up New Tricks: Backed by Machine Learning Luminaries, Covariant.ai’s Bots Can Handle Jobs Previously Needing a Human Touch”, Knight 2020
“Deep Bayesian Reward Learning from Preferences”, Brown & Niekum 2019
“Learning Norms from Stories: A Prior for Value Aligned Agents”, Frazier et al 2019
“Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?”, Du et al 2019
“Learning to Reason in Large Theories without Imitation”, Bansal et al 2019
“The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2019
“Go-Explore: a New Approach for Hard-Exploration Problems”, Ecoffet et al 2019
“Hierarchical Reinforcement Learning for Multi-Agent MOBA Game”, Zhang et al 2019
“ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst”, Bansal et al 2018
“Reward Learning from Human Preferences and Demonstrations in Atari”, Ibarz et al 2018
“Language GANs Falling Short”, Caccia et al 2018
“Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, Peng et al 2018
“Human-Like Playtesting With Deep Learning”, Gudmundsson et al 2018
“Convergence of Value Aggregation for Imitation Learning”, Cheng & Boots 2018
“Policy Optimization by Genetic Distillation”, Gangwani & Peng 2017
“Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, Sabatelli 2017 (page 3)
“DropoutDAgger: A Bayesian Approach to Safe Imitation Learning”, Menda et al 2017
“One-Shot Visual Imitation Learning via Meta-Learning”, Finn et al 2017
“Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration”, Rahmatizadeh et al 2017
“Learning Human Behaviors from Motion Capture by Adversarial Imitation”, Merel et al 2017
“Grammatical Error Correction With Neural Reinforcement Learning”, Sakaguchi et al 2017
“Path Integral Networks: End-To-End Differentiable Optimal Control”, Okada et al 2017
“Gated-Attention Architectures for Task-Oriented Language Grounding”, Chaplot et al 2017
“Visual Semantic Planning Using Deep Successor Representations”, Zhu et al 2017
“A Deep Reinforced Model for Abstractive Summarization”, Paulus et al 2017
“One-Shot Imitation Learning”, Duan et al 2017
“Model-Based Adversarial Imitation Learning”, Baram et al 2016
“A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models”, Finn et al 2016
“SeqGAN: Sequence Generative Adversarial Nets With Policy Gradient”, Yu et al 2016
“Generative Adversarial Imitation Learning”, Ho & Ermon 2016
“Mastering the Game of Go With Deep Neural Networks and Tree Search”, Silver et al 2016
“An Invitation to Imitation”, Bagnell 2015
“DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Ross et al 2010
“The Hidden Structure of Overimitation”, Lyons et al 2007
“Google DeepMind’s Grandmaster-Level Chess Without Search”
“Language Models Model Us”
“Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)”
Miscellaneous
- https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
- https://bair.berkeley.edu/blog/2022/04/25/rl-or-bc/
- https://generallyintelligent.substack.com/p/fine-tuning-mistral-7b-on-magic-the
- https://mobile-aloha.github.io/
- https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AI&restrict_sr=on&sort=new
Bibliography
- https://arxiv.org/abs/2402.04494#deepmind: “Grandmaster-Level Chess Without Search”
- https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission”
- https://arxiv.org/abs/2310.16410#deepmind: “Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero”
- https://arxiv.org/abs/2308.04445: “Getting from Generative AI to Trustworthy AI: What LLMs Might Learn from Cyc”
- https://arxiv.org/abs/2306.05426: “SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling With Backtracking”
- https://arxiv.org/abs/2306.00323: “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”
- https://arxiv.org/abs/2305.20050#openai: “Let’s Verify Step by Step”
- https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”
- https://arxiv.org/abs/2305.09836: “Revisiting the Minimalist Approach to Offline Reinforcement Learning”
- https://arxiv.org/abs/2304.13705: “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”
- 2022-bakhtin.pdf: “CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”
- https://arxiv.org/abs/2210.10760#openai: “Scaling Laws for Reward Model Overoptimization”
- https://arxiv.org/abs/2210.01241: “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”
- https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”
- https://arxiv.org/abs/2206.05314#deepmind: “Large-Scale Retrieval for Reinforcement Learning”
- https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”
- https://arxiv.org/abs/2204.03514#facebook: “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”
- https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-Assisted Question-Answering With Human Feedback”
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”
- https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football”
- https://arxiv.org/abs/2101.11071: “The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors”
- https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence”
- https://arxiv.org/abs/2011.13729#tencent: “TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”
- https://arxiv.org/abs/1811.02549: “Language GANs Falling Short”
- 2018-gudmundsson.pdf: “Human-Like Playtesting With Deep Learning”
- 2017-sabatelli.pdf#page=3: “Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”