- See Also
- Links
- “Thoughts While Watching Myself Be Automated”, Dynomight 2024
- “Investigating the Ability of LLMs to Recognize Their Own Writing”, Ackerman & Panickssery 2024
- “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Laine et al 2024
- “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Price et al 2024
- “Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024
- “Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024
- “LLM Evaluators Recognize and Favor Their Own Generations”, Panickssery et al 2024
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023
- “Truesight”
- “Situational Awareness and Out-Of-Context Reasoning § GPT-4-Base Has Non-Zero Longform Performance”, Evans 2024
- “Situational Awareness in Large Language Models”
- “Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs”
- “Language Models Model Us”
- “The Case for More Ambitious Language Model Evals”
- “Early Situational Awareness and Its Implications, a Story”
- Miscellaneous
- Bibliography
See Also
Links
“Thoughts While Watching Myself Be Automated”, Dynomight 2024
“Investigating the Ability of LLMs to Recognize Their Own Writing”, Ackerman & Panickssery 2024
“Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Laine et al 2024
“Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Price et al 2024
“Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024
“Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024
“LLM Evaluators Recognize and Favor Their Own Generations”, Panickssery et al 2024
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023
“Truesight”
“Situational Awareness and Out-Of-Context Reasoning § GPT-4-Base Has Non-Zero Longform Performance”, Evans 2024
“Situational Awareness in Large Language Models”
“Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs”
“Language Models Model Us”
“The Case for More Ambitious Language Model Evals”
“Early Situational Awareness and Its Implications, a Story”
Miscellaneous
Bibliography
- https://dynomight.net/automated/ : “Thoughts While Watching Myself Be Automated”
- https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own : “Investigating the Ability of LLMs to Recognize Their Own Writing”
- https://arxiv.org/abs/2407.04694 : “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”
- https://arxiv.org/abs/2407.04108 : “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”
- https://arxiv.org/abs/2406.07882 : “Designing a Dashboard for Transparency and Control of Conversational AI”
- https://arxiv.org/abs/2404.13076 : “LLM Evaluators Recognize and Favor Their Own Generations”
- https://arxiv.org/abs/2309.00667 : “Taken out of Context: On Measuring Situational Awareness in LLMs”