- See Also
-
Links
- “AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
- “Job Hunt As a PhD in RL: How It Actually Happens § Reinforcement Learning Reflections”, Lambert 2022
- “Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
- “Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
- “Stochastic MuZero: Planning in Stochastic Environments With a Learned Model”, Antonoglou et al 2022
- “Policy Improvement by Planning With Gumbel”, Danihelka et al 2022
- “MuZero With Self-Competition for Rate Control in VP9 Video Compression”, Mandhane et al 2022
- “Procedural Generalization by Planning With Self-Supervised World Models”, Anand et al 2021
- “Mastering Atari Games With Limited Data”, Ye et al 2021
- “Proper Value Equivalence”, Grimm et al 2021
- “Vector Quantized Models for Planning”, Ozair et al 2021
- “Muesli: Combining Improvements in Policy Optimization”, Hessel et al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021
- “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Schrittwieser et al 2021
- “Learning and Planning in Complex Action Spaces”, Hubert et al 2021
- “Scaling Scaling Laws With Board Games”, Jones 2021
- “Playing Nondeterministic Games through Planning With a Learned Model”, Willkens & Pollack 2021
- “Visualizing MuZero Models”, Vries et al 2021
- “Combining Off and On-Policy Training in Model-Based Reinforcement Learning”, Borges & Oliveira 2021
- “Improving Model-Based Reinforcement Learning With Internal State Representations through Self-Supervision”, Scholz et al 2021
- “On the Role of Planning in Model-Based Deep Reinforcement Learning”, Hamrick et al 2020
- “The Value Equivalence Principle for Model-Based Reinforcement Learning”, Grimm et al 2020
- “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous 2020
- “Monte-Carlo Tree Search As Regularized Policy Optimization”, Grill et al 2020
- “Continuous Control for Searching and Planning With a Learned Model”, Yang et al 2020
- “Agent57: Outperforming the Human Atari Benchmark”, Puigdomènech et al 2020
- “MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model”, Schrittwieser et al 2019
- “Surprising Negative Results for Generative Adversarial Tree Search”, Azizzadenesheli et al 2018
- “TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”, Farquhar et al 2017
- “Monte Carlo Tree Search in JAX”
- “A Clean Implementation of MuZero and AlphaZero following the AlphaZero General Framework. Train and Pit Both Algorithms against Each Other, and Investigate Reliability of Learned MuZero MDP Models.”
- “MuZero”
- “Learning to Search With MCTSnets”
- “MuZero Intuition”
- “Remaking EfficientZero (as Best I Can)”
- “EfficientZero: How It Works”
- “MuZero”
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Links
“AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
“Job Hunt As a PhD in RL: How It Actually Happens § Reinforcement Learning Reflections”, Lambert 2022
Job Hunt as a PhD in RL: How it Actually Happens § Reinforcement learning reflections
“Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
“Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
“Stochastic MuZero: Planning in Stochastic Environments With a Learned Model”, Antonoglou et al 2022
Stochastic MuZero: Planning in Stochastic Environments with a Learned Model
“Policy Improvement by Planning With Gumbel”, Danihelka et al 2022
“MuZero With Self-Competition for Rate Control in VP9 Video Compression”, Mandhane et al 2022
MuZero with Self-competition for Rate Control in VP9 Video Compression
“Procedural Generalization by Planning With Self-Supervised World Models”, Anand et al 2021
Procedural Generalization by Planning with Self-Supervised World Models
“Mastering Atari Games With Limited Data”, Ye et al 2021
“Proper Value Equivalence”, Grimm et al 2021
“Vector Quantized Models for Planning”, Ozair et al 2021
“Muesli: Combining Improvements in Policy Optimization”, Hessel et al 2021
“Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021
“MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Schrittwieser et al 2021
MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model
“Learning and Planning in Complex Action Spaces”, Hubert et al 2021
“Scaling Scaling Laws With Board Games”, Jones 2021
“Playing Nondeterministic Games through Planning With a Learned Model”, Willkens & Pollack 2021
Playing Nondeterministic Games through Planning with a Learned Model
“Visualizing MuZero Models”, Vries et al 2021
“Combining Off and On-Policy Training in Model-Based Reinforcement Learning”, Borges & Oliveira 2021
Combining Off and On-Policy Training in Model-Based Reinforcement Learning
“Improving Model-Based Reinforcement Learning With Internal State Representations through Self-Supervision”, Scholz et al 2021
“On the Role of Planning in Model-Based Deep Reinforcement Learning”, Hamrick et al 2020
On the role of planning in model-based deep reinforcement learning
“The Value Equivalence Principle for Model-Based Reinforcement Learning”, Grimm et al 2020
The Value Equivalence Principle for Model-Based Reinforcement Learning
“Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous 2020
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
“Monte-Carlo Tree Search As Regularized Policy Optimization”, Grill et al 2020
“Continuous Control for Searching and Planning With a Learned Model”, Yang et al 2020
Continuous Control for Searching and Planning with a Learned Model
“Agent57: Outperforming the Human Atari Benchmark”, Puigdomènech et al 2020
“MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model”, Schrittwieser et al 2019
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
“Surprising Negative Results for Generative Adversarial Tree Search”, Azizzadenesheli et al 2018
Surprising Negative Results for Generative Adversarial Tree Search
“TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”, Farquhar et al 2017
TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
“Monte Carlo Tree Search in JAX”
“A Clean Implementation of MuZero and AlphaZero following the AlphaZero General Framework. Train and Pit Both Algorithms against Each Other, and Investigate Reliability of Learned MuZero MDP Models.”
“MuZero”
“Learning to Search With MCTSnets”
“MuZero Intuition”
“Remaking EfficientZero (as Best I Can)”
“EfficientZero: How It Works”
View External Link:
https://www.lesswrong.com/posts/mRwJce3npmzbKfxws/efficientzero-how-it-works
“MuZero”
MuZero:
Wikipedia
Miscellaneous
Bibliography
-
https://arxiv.org/abs/2206.05314#deepmind
: “Large-Scale Retrieval for Reinforcement Learning”, -
https://openreview.net/forum?id=0ZbPmmB61g#google
: “Boosting Search Engines With Interactive Agents”, -
https://openreview.net/forum?id=bERaNdoegnO#deepmind
: “Policy Improvement by Planning With Gumbel”, -
https://arxiv.org/abs/2111.01587#deepmind
: “Procedural Generalization by Planning With Self-Supervised World Models”, -
https://arxiv.org/abs/2111.00210
: “Mastering Atari Games With Limited Data”, -
https://arxiv.org/abs/2106.10316#deepmind
: “Proper Value Equivalence”, -
https://arxiv.org/abs/2106.04615#deepmind
: “Vector Quantized Models for Planning”, -
https://arxiv.org/abs/2104.06272#deepmind
: “Podracer Architectures for Scalable Reinforcement Learning”, -
https://arxiv.org/abs/2104.06294#deepmind
: “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, -
https://arxiv.org/abs/2102.12924
: “Visualizing MuZero Models”, -
https://arxiv.org/abs/2011.03506#deepmind
: “The Value Equivalence Principle for Model-Based Reinforcement Learning”, -
https://arxiv.org/abs/2102.04881
: “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, -
https://arxiv.org/abs/2006.07430
: “Continuous Control for Searching and Planning With a Learned Model”, -
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/
: “Agent57: Outperforming the Human Atari Benchmark”, -
https://arxiv.org/abs/1710.11417
: “TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”,