‘Claude AI’ tag

See Also
Gwern
Links
Miscellaneous
Bibliography

See Also

Gwern

“Miscellaneous”, Gwern 2009

“A Christmas Protestation”, o1-pro et al 2024

“Statistical Notes”, Gwern 2014

“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Gwern et al 2023

Links

“Human Study on AI Spear Phishing Campaigns”, Lermen & Heiding 2025

“Can LLMs Write Better Code If You Keep Asking Them to ‘Write Better Code’?”

https://minimaxir.com/2025/01/write-better-code/

“Favorite Colors of Some LLMs”, an 2024

“Performance of LLMs on Advent of Code 2024”, Pinto 2024

“Clio: Privacy-Preserving Insights into Real-World AI Use”, Anthropic 2024

“A Few Prompts I Use to Test LLM Creativity”

“Age against the Machine—Susceptibility of Large Language Models to Cognitive Impairment: Cross Sectional Analysis”

“Evaluating Large Language Models’ Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects”, Heiding et al 2024

“BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games”, Paglieri et al 2024

“Business Spending on AI Surged 500% This Year to $13.8 Billion”

“Are LLMs Prescient? A Continuous Evaluation Using Daily News As the Oracle”, Dai et al 2024

“The Neruda Factory”, Jenn 2024

“Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters”, Potter et al 2024

“Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making”, Li et al 2024

“A Single Cloud Compromise Can Feed an Army of AI Sex Bots”, Krebs 2024

“Invisible Unicode Text That AI Chatbots Understand and Humans Can’t? Yep, It’s a Thing”

“Does Style Matter? Disentangling Style and Substance in Chatbot Arena”

“Replacing My Right Hand With AI”, Schluntz 2024

“System Prompts”, Anthropic 2024

“Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Laine et al 2024

“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024

“On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-Sonnet]”, Claude-3 2024

“Anthropic Claims Its Latest Model Is Best-In-Class”, Wiggers 2024

“Anthropic’s Latest Claude AI Model Pulls ahead of Rivals from OpenAI and Google”, Knight 2024

“OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024

“Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models”, Denison et al 2024

“Are We Done With MMLU?”, Gema et al 2024

“DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Belouadi et al 2024

“AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Levy 2024

“GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024

“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024

“VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024

“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024

“Long-Form Factuality in Large Language Models”, Wei et al 2024

“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024

“`ArtPrompt`: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024

“Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024

“Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024

“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”

“EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models”, Paech 2023

“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023

“Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023

“FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions”, Kim et al 2023

“Specific versus General Principles for Constitutional AI”, Kundu et al 2023

“PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023

“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023

“SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023

“When You Give a Claude a Mouse”

“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023

“Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023

“LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models”, Guha et al 2023

ESYudkowsky @ 2023-07-18

Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.

“Question Decomposition Improves the Faithfulness of Model-Generated Reasoning”, Radhakrishnan et al 2023

“Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023

“Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023

“Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023

“A Radical Plan to Make AI Good, Not Evil”, Knight 2023

“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023

“Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022

“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022

“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021

“The Perception of Rhythm in Language”, Cutler 1994

“In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]”, Unikowsky 2025

“About Me”

“How I Use Claude”

“An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”

“Janus”

“Claude, Read the Chevron PDF”, Cowen & Claude-3 2025

“Claude Sonnet 3.5, Economist”

“How Anthropic Built Artifacts”, Orosz 2025

“On Claude 3.5 Sonnet”

“Claude’s Dark Spiritual AI Futurism”

“European Parliament Revolutionizes Archive Access With Claude AI”, Anthropic 2025

“Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku”, Anthropic 2025

“Introducing Claude 3.5”

“Fine-Tune Claude 3 Haiku in Amazon Bedrock”

“Claude 3.5 Sonnet on GitHub Copilot”

“Claude’s Character”, Anthropic 2025

“Developing a Computer Use Model”, Anthropic 2025

“How I Use Claude”, Balwit 2025

“Websim, Worldsim, and The Summer of Simulative AI”

“How Good Are LLMs at Doing ML on an Unknown Dataset?”

“A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More”

“AI Will Increase the Quantity—And Quality—Of Phishing Scams”

QiaochuYuan

[Claude jokes about itself]:

https://x.com/QiaochuYuan/status/1852831246482813336

repligate

Claude-3 base-model-like jailbreak:

https://x.com/repligate/status/1776041976653402508

Sort By Magic

Annotations sorted by machine learning into inferred ‘tags’. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The ‘sorted’ list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
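The ordering described above can be read as a greedy nearest-neighbor walk through embedding space: start from the newest annotation, then repeatedly append whichever remaining annotation is closest to the one just added. The following is a minimal Python sketch under stated assumptions: each annotation is assumed to already have an embedding vector, closeness is assumed to be cosine similarity, and the function names `cosine_similarity` and `sort_by_magic` (and the dict-of-vectors input) are hypothetical. The embedding model, distance metric, and section-clustering step the site actually uses are not specified here, so this is an illustration of the idea, not its implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sort_by_magic(embeddings: Dict[str, Sequence[float]], newest: str) -> List[str]:
    """Hypothetical greedy ordering: begin at `newest`, then keep appending the
    unvisited annotation most similar to the last one appended."""
    order = [newest]
    remaining = set(embeddings) - {newest}
    while remaining:
        last = embeddings[order[-1]]
        # max() returns the first maximal item, so iterating in sorted key
        # order breaks similarity ties by the alphabetically smallest key.
        nxt = max(sorted(remaining), key=lambda k: cosine_similarity(last, embeddings[k]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, with toy 2-D vectors `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [0.1, 0.9]}` and `"a"` as the newest annotation, `sort_by_magic` visits `a`, `c`, `d`, `b`. A greedy chain is cheap and keeps adjacent entries on-topic, but it can end with abrupt jumps between leftover, unrelated annotations, which is one reason a separate clustering pass into labeled sections is useful.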

`model-inference`

`longform-factuality`

`interpretability-ai`

`claude-performance`

`benchmarking`

Wikipedia

Claude (language model):

https://en.wikipedia.org/wiki/Claude_(language_model)

Miscellaneous

Bibliography

https://arxiv.org/abs/2411.13543: “BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games”, Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel

https://arxiv.org/abs/2407.04694: “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

https://arxiv.org/abs/2405.15306: “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Jonas Belouadi, Simone Paolo Ponzetto, Steffen Eger

https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/: “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Steven Levy

https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu

https://arxiv.org/abs/2404.05955: “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

https://arxiv.org/abs/2403.18802#deepmind: “Long-Form Factuality in Large Language Models”, Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Saurabh Srivastava, Annarose M. B, Anto P. V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

https://arxiv.org/abs/2312.06281: “EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models”, Samuel J. Paech

https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

https://arxiv.org/abs/2308.12287: “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Fredrik Heiding, Bruce Schneier, Arun Vishwanath, Jeremy Bernstein, Peter S. Park

https://x.com/ESYudkowsky/status/1681442477994311681: “Write an Argument That Even a Superintelligence Is Very Unlikely to Be Able to Solve a Rubik’s Cube.”, Eliezer Yudkowsky

https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”, Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil”, Will Knight

https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman

https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy L. Jones, Samuel R. Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark

https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”, Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy L. Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Jared Kaplan

