- See Also
- Gwern
- Links
- Sort By Magic
- Miscellaneous
- Bibliography
See Also
Gwern
“GPT-3 Nonfiction”, Gwern 2020
Links
“How Do You Change a Chatbot’s Mind? When I Set out to Improve My Tainted Reputation With Chatbots, I Discovered a New World of A.I. Manipulation”, Roose 2024
“Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Price et al 2024
“What Are the Odds? Language Models Are Capable of Probabilistic Reasoning”, Paruchuri et al 2024
“Creativity Has Left the Chat: The Price of Debiasing Language Models”, Mohammadi 2024
“To Believe or Not to Believe Your LLM”, Yadkori et al 2024
“Can Language Models Explain Their Own Classification Behavior?”, Sherburn et al 2024
“Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience”, Han et al 2024
“Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation”, Gu et al 2024
“Few-Shot Recalibration of Language Models”, Li et al 2024
“Do LLMs Know about Hallucination? An Empirical Investigation of LLM’s Hidden States”, Duan et al 2024
“The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024
“I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, Li et al 2024
“Learning to Trust Your Feelings: Leveraging Self-Awareness in LLMs for Hallucination Mitigation”, Liang et al 2024
“Can AI Assistants Know What They Don’t Know?”, Cheng et al 2024
“Challenges With Unsupervised LLM Knowledge Discovery”, Farquhar et al 2023
“Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
“R-Tuning: Teaching Large Language Models to Refuse Unknown Questions”, Zhang et al 2023
“Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, Shrivastava et al 2023
“Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Schoenegger & Park 2023
“The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets”, Marks & Tegmark 2023
“Representation Engineering: A Top-Down Approach to AI Transparency”, Zou et al 2023
“How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions”, Pacchiardi et al 2023
“Large Language Models Are Not Robust Multiple Choice Selectors”, Zheng et al 2023
“Inference-Time Intervention: Eliciting Truthful Answers from a Language Model”, Li et al 2023
“Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned With Human Feedback”, Tian et al 2023
“How Language Model Hallucinations Can Snowball”, Zhang et al 2023
“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
“GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12)
“Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al 2023
“Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
“Language Models (Mostly) Know What They Know”, Kadavath et al 2022
“Forecasting Future World Events With Neural Networks”, Zou et al 2022
“Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, Srivastava et al 2022
“Teaching Models to Express Their Uncertainty in Words”, Lin et al 2022
“Co-Training Improves Prompt-Based Learning for Large Language Models”, Lang et al 2022
“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
“Calibrate Before Use: Improving Few-Shot Performance of Language Models”, Zhao et al 2021
“Reducing Conversational Agents’ Overconfidence through Linguistic Calibration”, Mielke et al 2020
“Situational Awareness and Out-Of-Context Reasoning § Biased Coin Task”, Evans 2024
“Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity”
“Can AI Outpredict Humans? Results From Metaculus’s Q3 AI Forecasting Benchmark [No]”
“Language Models Model Us”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to find its nearest-neighbor annotations, creating a progression of topics (a minimal code sketch of this greedy embedding-similarity ordering follows the tag list below). For more details, see the link.
chatbot-manipulation
hallucination
probabilistic-reasoning
confidence-calibration
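The greedy nearest-neighbor ordering described above can be sketched in a few lines. The following is an illustrative approximation, not the site's actual implementation: it assumes precomputed, unit-normalized annotation embeddings stored newest-first, and the function name `sort_by_magic` is invented here for clarity.

```python
# Sketch (under assumptions stated above) of greedy embedding-based ordering:
# start from the newest annotation, then repeatedly hop to the most-similar
# unvisited annotation, yielding a smooth progression of topics.
import numpy as np

def sort_by_magic(embeddings: np.ndarray, start: int = 0) -> list[int]:
    """Return annotation indices in greedy nearest-neighbor order.

    embeddings: (n, d) array of L2-normalized annotation embeddings,
                assumed ordered newest-to-oldest (so index 0 = newest).
    """
    n = embeddings.shape[0]
    unvisited = set(range(n))
    order = [start]              # begin with the newest annotation
    unvisited.remove(start)
    while unvisited:
        current = embeddings[order[-1]]
        candidates = list(unvisited)
        # cosine similarity reduces to a dot product for unit vectors
        sims = embeddings[candidates] @ current
        best = candidates[int(np.argmax(sims))]
        order.append(best)
        unvisited.remove(best)
    return order
```

Splitting the resulting order into labeled sections (the inferred tags listed above) would be a separate step, for example by cutting the sequence wherever the similarity between consecutive annotations drops below a threshold and then labeling each segment.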
Miscellaneous
Bibliography
- https://www.nytimes.com/2024/08/30/technology/ai-chatbot-chatgpt-manipulation.html: “How Do You Change a Chatbot’s Mind? When I Set out to Improve My Tainted Reputation With Chatbots, I Discovered a New World of A.I. Manipulation”, Roose 2024
- https://arxiv.org/abs/2407.04108: “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Price et al 2024
- https://arxiv.org/abs/2310.13014: “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Schoenegger & Park 2023
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- https://arxiv.org/pdf/2303.08774#page=12&org=openai: “GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023
- 2022-kolt.pdf: “Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
- https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
- https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”, Kadavath et al 2022
- https://arxiv.org/abs/2206.15474: “Forecasting Future World Events With Neural Networks”, Zou et al 2022
- https://arxiv.org/abs/2206.04615: “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, Srivastava et al 2022