‘truesight (stylometrics)’ tag

Gwern

Links

“Thoughts While Watching Myself Be Automated”, Dynomight 2024

“Investigating the Ability of LLMs to Recognize Their Own Writing”, Ackerman & Panickssery 2024

“Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Laine et al 2024

“Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Price et al 2024

“Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024

“Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024

“LLM Evaluators Recognize and Favor Their Own Generations”, Panickssery et al 2024

“Surfing the OCEAN: The Machine Learning Psycholexical Approach 2.0 to Detect Personality Traits in Texts”, Giannini et al 2024

“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023

“Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023

“PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits”, Jiang et al 2023

“Truesight”

/doc/www/cyborgism.wiki/0c3d40875321882d1663ab5b1b018f3fcd9fac8f.html

“Situational Awareness and Out-Of-Context Reasoning § GPT-4-Base Has Non-Zero Longform Performance”, Evans 2025

“Situational Awareness in Large Language Models”

/doc/www/www.greaterwrong.com/4eff43f02f9323a2b2a36c62661361cfab25b9e8.html

“Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs”

/doc/www/www.greaterwrong.com/ce9c8f71ad54707afd165ee5607750648a998a5a.html

“Language Models Model Us”

“The Case for More Ambitious Language Model Evals”

/doc/www/www.greaterwrong.com/a1db1647e9173aaacd1968b6f0fdd0b4eecc578a.html

“The Case for More Ambitious Language Model Evals”

/doc/www/www.greaterwrong.com/1241c140cfbe7e7f2478a11b1d7413c09055724c.html

“Early Situational Awareness and Its Implications, a Story”

/doc/www/www.greaterwrong.com/20ea9a879c0915ecfa2f2f87dba168dc160967cb.html

Bibliography

https://dynomight.net/automated/: “Thoughts While Watching Myself Be Automated”, Dynomight

https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own: “Investigating the Ability of LLMs to Recognize Their Own Writing”, Christopher Ackerman, Nina Panickssery

https://arxiv.org/abs/2407.04694: “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

https://arxiv.org/abs/2407.04108: “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Sara Price, Arjun Panickssery, Samuel R. Bowman, Asa Cooper Stickland

https://arxiv.org/abs/2406.07882: “Designing a Dashboard for Transparency and Control of Conversational AI”, Yida Chen, Aoyu Wu, Trevor DePodesta, Catherine Yeh, Kenneth Li, Nicholas Castillo Marin, Oam Patel, Jan Riecke, Shivam Raval, Olivia Seow, Martin Wattenberg, Fernanda Viégas

https://arxiv.org/abs/2404.13076: “LLM Evaluators Recognize and Favor Their Own Generations”, Arjun Panickssery, Samuel R. Bowman, Shi Feng

https://arxiv.org/abs/2309.00667: “Taken out of Context: On Measuring Situational Awareness in LLMs”, Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans
