- See Also
- Gwern
-
Links
- “Epistemic Calibration and Searching the Space of Truth”, Lee 2024
- “Predicting the Direction of Phenotypic Difference”, Gokhman et al 2024
- “Diffusion Model Alignment Using Direct Preference Optimization”, Wallace et al 2023
- “A General Theoretical Paradigm to Understand Learning from Human Preferences”, Azar et al 2023
- “On the Optimal Bounds for Noisy Computing”, Zhu et al 2023
- “Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023
- “Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems”, Feng et al 2023
- “Reputation Inflation”, Filippas et al 2022
- “Bayesian Inference of the Climbing Grade Scale”, Drummond & Popinga 2021
- “PiRank: Learning To Rank via Differentiable Sorting”, Swezey et al 2020
- “Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment”, Talebi et al 2020
- “Self-Play Learning Without a Reward Metric”, Schmidt et al 2019
- “Group Testing: An Information Theory Perspective”, Aldridge et al 2019
- “Top-K Off-Policy Correction for a REINFORCE Recommender System”, Chen et al 2018
- “Comparison Based Learning from Weak Oracles”, Kazemi et al 2018
- “OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Henderson et al 2017
- “Analogical-Based Bayesian Optimization”, Le et al 2017
- “Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking”, Chen et al 2017
- “The Competitiveness of Games in Professional Sports Leagues”, Wills 2017
- “Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
- “PBO: Preferential Bayesian Optimization”, Gonzalez et al 2017
- “D-TS: Double Thompson Sampling for Dueling Bandits”, Wu & Liu 2016
- “Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015
- “On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, Kaufmann et al 2014
- “Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011
- “Case Studies in Bayesian Computation Using INLA”, Martino & Rue 2010
- “Sorting from Noisy Information”, Braverman & Mossel 2009
- “Can People Distinguish Pâté From Dog Food? [Preprint]”, Bohannon et al 2009
- “Aggregating Inconsistent Information: Ranking and Clustering”, Ailon et al 2008
- “Pure Exploration for Multi-Armed Bandit Problems”, Bubeck et al 2008
- “Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings”, Goldstein et al 2008
- “Noisy Sorting Without Resampling”, Braverman & Mossel 2007
- “Noisy Binary Search and Its Applications”, Karp & Kleinberg 2007
- “Paired Comparison Models for Ranking National Soccer Teams”, Hallinan 2005
- “Bayesian Adaptive Exploration”, Loredo & Chernoff 2003
- “How Dangerous Are Drinking Drivers?”, Levitt & Porter 2001
- “Sympercents: Symmetric Percentage Differences on the 100 Loge Scale Simplify the Presentation of Log Transformed Data”, Cole 2000
- “Born Again Group Testing: Multiaccess Communications”, Wolf 1985
- “The Analysis of Sequential Experiments With Feedback to Subjects”, Diaconis & Graham 1981
- The Rating of Chessplayers, Past and Present (Second Edition), Elo 1978
- “Optimal Selection Based On Relative Rank (the ‘Secretary Problem’)”, Chow 1964
- “Inconsistencies in a Schedule of Paired Comparisons”
- “Metacritic Has A (File-Drawer) Problem”
- “Valuing Research Works by Eliciting Comparisons from EA Researchers”
- “Futurama Theorem”
- Sort By Magic
- Wikipedia
- Miscellaneous
See Also
Gwern
“Open Questions”, Gwern 2018
“GPT-2 Preference Learning for Music Generation”, Gwern 2019
“Resorting Media Ratings”, Gwern 2015
Links
“Epistemic Calibration and Searching the Space of Truth”, Lee 2024
“Predicting the Direction of Phenotypic Difference”, Gokhman et al 2024
“Diffusion Model Alignment Using Direct Preference Optimization”, Wallace et al 2023
Diffusion Model Alignment Using Direct Preference Optimization
“A General Theoretical Paradigm to Understand Learning from Human Preferences”, Azar et al 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences
“On the Optimal Bounds for Noisy Computing”, Zhu et al 2023
“Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023
Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model
“Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems”, Feng et al 2023
“Reputation Inflation”, Filippas et al 2022
“Bayesian Inference of the Climbing Grade Scale”, Drummond & Popinga 2021
“PiRank: Learning To Rank via Differentiable Sorting”, Swezey et al 2020
“Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment”, Talebi et al 2020
Rank-Smoothed Pairwise Learning In Perceptual Quality Assessment
“Self-Play Learning Without a Reward Metric”, Schmidt et al 2019
“Group Testing: An Information Theory Perspective”, Aldridge et al 2019
“Top-K Off-Policy Correction for a REINFORCE Recommender System”, Chen et al 2018
Top-K Off-Policy Correction for a REINFORCE Recommender System
“Comparison Based Learning from Weak Oracles”, Kazemi et al 2018
“OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Henderson et al 2017
“Analogical-Based Bayesian Optimization”, Le et al 2017
“Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking”, Chen et al 2017
Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
“The Competitiveness of Games in Professional Sports Leagues”, Wills 2017
“Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
“PBO: Preferential Bayesian Optimization”, Gonzalez et al 2017
“D-TS: Double Thompson Sampling for Dueling Bandits”, Wu & Liu 2016
“Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015
Just Sort It! A Simple and Effective Approach to Active Preference Learning
“On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, Kaufmann et al 2014
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models
“Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011
Bayesian Active Learning for Classification and Preference Learning
“Case Studies in Bayesian Computation Using INLA”, Martino & Rue 2010
“Sorting from Noisy Information”, Braverman & Mossel 2009
“Can People Distinguish Pâté From Dog Food? [Preprint]”, Bohannon et al 2009
“Aggregating Inconsistent Information: Ranking and Clustering”, Ailon et al 2008
Aggregating inconsistent information: Ranking and clustering
“Pure Exploration for Multi-Armed Bandit Problems”, Bubeck et al 2008
“Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings”, Goldstein et al 2008
Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings
“Noisy Sorting Without Resampling”, Braverman & Mossel 2007
“Noisy Binary Search and Its Applications”, Karp & Kleinberg 2007
“Paired Comparison Models for Ranking National Soccer Teams”, Hallinan 2005
“Bayesian Adaptive Exploration”, Loredo & Chernoff 2003
“How Dangerous Are Drinking Drivers?”, Levitt & Porter 2001
“Sympercents: Symmetric Percentage Differences on the 100 Loge Scale Simplify the Presentation of Log Transformed Data”, Cole 2000
“Born Again Group Testing: Multiaccess Communications”, Wolf 1985
“The Analysis of Sequential Experiments With Feedback to Subjects”, Diaconis & Graham 1981
The Analysis of Sequential Experiments with Feedback to Subjects
The Rating of Chessplayers, Past and Present (Second Edition), Elo 1978
The Rating of Chessplayers, Past and Present (Second Edition)
“Optimal Selection Based On Relative Rank (the ‘Secretary Problem’)”, Chow 1964
Optimal Selection Based On Relative Rank (the ‘Secretary Problem’):
“Inconsistencies in a Schedule of Paired Comparisons”
“Metacritic Has A (File-Drawer) Problem”
Metacritic Has A (File-Drawer) Problem:
View External Link:
“Valuing Research Works by Eliciting Comparisons from EA Researchers”
Valuing research works by eliciting comparisons from EA researchers:
“Futurama Theorem”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.