- See Also
- Gwern
- Links
- “Mind Your Step (by Step): Chain-Of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse”, Liu et al 2024
- “Thinking LLMs: General Instruction Following With Thought Generation”, Wu et al 2024
- “When a Language Model Is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI O1”, McCoy et al 2024
- “Evaluation of OpenAI O1: Opportunities and Challenges of AGI”, Zhong et al 2024
- “LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s O1 on PlanBench”, Valmeekam et al 2024
- “To CoT or Not to CoT? Chain-Of-Thought Helps Mainly on Math and Symbolic Reasoning”, Sprague et al 2024
- “Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, Ye et al 2024
- “Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024
- “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Lee et al 2024
- “OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024
- “How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad”, Abbe et al 2024
- “OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision”, Luo et al 2024
- “MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark”, Wang et al 2024
- “A Theoretical Understanding of Self-Correction through In-Context Alignment”, Wang et al 2024
- “Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models”, Lu et al 2024
- “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, Deng et al 2024
- “Retrieval Head Mechanistically Explains Long-Context Factuality”, Wu et al 2024
- “Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models”, Pfau et al 2024
- “Autonomous LLM-Driven Research from Data to Human-Verifiable Research Papers”, Ifargan et al 2024
- “Missed Connections: Lateral Thinking Puzzles for Large Language Models”, Todd et al 2024
- “ChatGPT Can Predict the Future When It Tells Stories Set in the Future About the Past”, Pham & Cunningham 2024
- “Visualization-Of-Thought Elicits Spatial Reasoning in Large Language Models”, Wu et al 2024
- “Do Language Models Plan Ahead for Future Tokens?”, Wu et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “Re-Evaluating GPT-4’s Bar Exam Performance”, Martínez 2024
- “Long-Form Factuality in Large Language Models”, Wei et al 2024
- “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking”, Zelikman et al 2024
- “RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval”, Wen et al 2024
- “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
- “Chain-Of-Thought Empowers Transformers to Solve Inherently Serial Problems”, Li et al 2024
- “Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models”, Levy et al 2024
- “Why Are Sensitive Functions Hard for Transformers?”, Hahn & Rofin 2024
- “Chain-Of-Thought Reasoning Without Prompting”, Wang & Zhou 2024
- “V-STaR: Training Verifiers for Self-Taught Reasoners”, Hosseini et al 2024
- “More Agents Is All You Need”, Li et al 2024
- “The Impact of Reasoning Step Length on Large Language Models”, Jin et al 2024
- “Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach”, Ma et al 2023
- “Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
- “Tree of Attacks (TAP): Jailbreaking Black-Box LLMs Automatically”, Mehrotra et al 2023
- “Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023
- “Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine”, Nori et al 2023
- “Training Chain-Of-Thought via Latent-Variable Inference”, Phan et al 2023
- “Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks”, Ramesh et al 2023
- “On Measuring Faithfulness or Self-Consistency of Natural Language Explanations”, Parcalabescu & Frank 2023
- “Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
- “Large Language Models Can Strategically Deceive Their Users When Put Under Pressure”, Scheurer et al 2023
- “Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves”, Deng et al 2023
- “Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation”, Ding et al 2023
- “Implicit Chain-Of-Thought Reasoning via Knowledge Distillation”, Deng et al 2023
- “Preventing Language Models From Hiding Their Reasoning”, Roger & Greenblatt 2023
- “Branch-Solve-Merge Improves Large Language Model Evaluation and Generation”, Saha et al 2023
- “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
- “The Expressive Power of Transformers With Chain-Of-Thought”, Merrill & Sabharwal 2023
- “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
- “Large Language Models Cannot Self-Correct Reasoning Yet”, Huang et al 2023
- “Think Before You Speak: Training Language Models With Pause Tokens”, Goyal et al 2023
- “Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023
- “Contrastive Decoding Improves Reasoning in Large Language Models”, O’Brien & Lewis 2023
- “Re-Reading Improves Reasoning in Large Language Models”, Xu et al 2023
- “From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Adams et al 2023
- “Graph of Thoughts: Solving Elaborate Problems With Large Language Models”, Besta et al 2023
- “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, Zhou et al 2023
- “Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
- “LLMs As Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines With LLMs”, Wu et al 2023
- “TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT”, Zha et al 2023
- “Question Decomposition Improves the Faithfulness of Model-Generated Reasoning”, Radhakrishnan et al 2023
- “Measuring Faithfulness in Chain-Of-Thought Reasoning”, Lanham et al 2023
- “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023
- “Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
- “Teaching Arithmetic to Small Transformers”, Lee et al 2023
- “Language Models Are Weak Learners”, Manikandan et al 2023
- “Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning”, Ma et al 2023
- “GKD: Generalized Knowledge Distillation for Auto-Regressive Sequence Models”, Agarwal et al 2023
- “Large Language Models As Tax Attorneys: A Case Study in Legal Capabilities Emergence”, Nay et al 2023
- “Iterative Translation Refinement With Large Language Models”, Chen et al 2023
- “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
- “Let’s Verify Step by Step”, Lightman et al 2023
- “Towards Revealing the Mystery behind Chain-Of-Thought: A Theoretical Perspective”, Feng et al 2023
- “Improving Factuality and Reasoning in Language Models through Multiagent Debate”, Du et al 2023
- “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- “Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Yao et al 2023
- “Large Language Model Programs”, Schlag et al 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- “Distilling Step-By-Step! Outperforming Larger Language Models With Less Training Data and Smaller Model Sizes”, Hsieh et al 2023
- “Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
- “LLM+P: Empowering Large Language Models With Optimal Planning Proficiency”, Liu et al 2023
- “Boosting Theory-Of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023
- “Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
- “Language Models Can Solve Computer Tasks”, Kim et al 2023
- “Reflexion: Language Agents With Verbal Reinforcement Learning”, Shinn et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- “SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models”, Manakul et al 2023
- “Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
- “Multimodal Chain-Of-Thought Reasoning in Language Models”, Zhang et al 2023
- “Faithful Chain-Of-Thought Reasoning”, Lyu et al 2023
- “Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-Based Reasoning”, Ye et al 2023
- “ChatGPT Goes to Law School”, Choi et al 2023
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
- “Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
- “Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes”, Reppert et al 2023
- “Solving Math Word Problems With Process & Outcome-Based Feedback”, Uesato et al 2022
- “PAL: Program-Aided Language Models”, Gao et al 2022
- “Measuring Progress on Scalable Oversight for Large Language Models”, Bowman et al 2022
- “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
- “Large Language Models Can Self-Improve”, Huang et al 2022
- “Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them”, Suzgun et al 2022
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
- “Language Models Are Multilingual Chain-Of-Thought Reasoners”, Shi et al 2022
- “ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
- “Dynamic Prompt Learning via Policy Gradient for Semi-Structured Mathematical Reasoning”, Lu et al 2022
- “FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
- “Faithful Reasoning Using Large Language Models”, Creswell & Shanahan 2022
- “Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
- “Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
- “Language Model Cascades”, Dohan et al 2022
- “CodeT: Code Generation With Generated Tests”, Chen et al 2022
- “Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
- “Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
- “Exploring Length Generalization in Large Language Models”, Anil et al 2022
- “Language Models (Mostly) Know What They Know”, Kadavath et al 2022
- “Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
- “Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations”, Jung et al 2022
- “Large Language Models Are Zero-Shot Reasoners”, Kojima et al 2022
- “Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
- “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
- “Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
- “Unifying Language Learning Paradigms”, Tay et al 2022
- “Can Language Models Learn from Explanations in Context?”, Lampinen et al 2022
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
- “STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
- “A Conversational Paradigm for Program Synthesis”, Nijkamp et al 2022
- “Self-Consistency Improves Chain-Of-Thought Reasoning in Language Models”, Wang et al 2022
- “Learning-By-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension”, Zhao et al 2022
- “PromptChainer: Chaining Large Language Model Prompts through Visual Programming”, Wu et al 2022
- “Chain-Of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al 2022
- “Reasoning Like Program Executors”, Pi et al 2022
- “A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021
- “DREAM: Uncovering Mental Models behind Language Models”, Gu et al 2021
- “Reframing Human-AI Collaboration for Generating Free-Text Explanations”, Wiegreffe et al 2021
- “NeuroLogic A✱esque Decoding: Constrained Text Generation With Lookahead Heuristics”, Lu et al 2021
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
- “Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
- “Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
- “Unsupervised Neural Machine Translation With Generative Language Models Only”, Han et al 2021
- “Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Nye et al 2021
- “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
- “Teaching Autoregressive Language Models Complex Tasks By Demonstration”, Recchia 2021
- “Program Synthesis With Large Language Models”, Austin et al 2021
- “Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
- “Explainable Multi-Hop Verbal Reasoning Through Internal Monologue”, Liang et al 2021
- “A Simple Method to Keep GPT-3 Focused in a Conversation”, Mayne 2021
- “Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
- “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm”, Reynolds & McDonell 2021
- “How We Accidentally Gave Our Bots Their Personalities”, Latitude 2021
- “Word in Context: Agent and Agent Clarification (69% Dev)”, Brockman 2020
- “I Found That Getting GPT-3 to Add Its Own "Internal Monologue" in Parentheses to Be a Helpful Strategy…”, blixt 2020
- kleptid @ "2020-07-17"
- kleptid @ "2020-07-17"
- “Inducing Self-Explanation: a Meta-Analysis”, Bisra et al 2018
- “Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems”, Ling et al 2017
- “Why Do Humans Reason? Arguments for an Argumentative Theory”, Mercier & Sperber 2011
- “How to Dramatically Improve the Reasoning Ability of GPT-3”
- “A Preliminary Exploration into Factored Cognition With Language Models”
- “WiC_SelfContextStuffingImproved_Last10_stuft_examplesNV.ipynb”
- “Vincent-163/transformer-Arithmetic”
- “Magic ToDo List Creator”
- “Short Story on AI: ‘Forward Pass’”, Karpathy 2024
- “AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button.”
- “Solving Math Word Problems: We’ve trained a system that solves grade school math problems with nearly twice the accuracy of a fine-tuned GPT-3 model. It solves about 90% as many problems as real kids: a small sample of 9–12 year olds scored 60% on a test from our dataset, while our system scored 55% on those same problems. This is important because today’s AI is still quite weak at commonsense multistep reasoning, which is easy even for grade school kids. We achieved these results by training our model to recognize its mistakes, so that it can try repeatedly until it finds a solution that works”
- “Teaching a Neural Network to Use a Calculator”
- “Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data”
- “Preventing Language Models from Hiding Their Reasoning”
- “Steganography in Chain-Of-Thought Reasoning”
- “Visible Thoughts Project and Bounty Announcement”
- bucketofkets
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Gwern
“Free-Play Periods for RL Agents”, Gwern 2023
“It Looks Like You’re Trying To Take Over The World”, Gwern 2022
Links
“Mind Your Step (by Step): Chain-Of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse”, Liu et al 2024
“Thinking LLMs: General Instruction Following With Thought Generation”, Wu et al 2024
“When a Language Model Is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI O1”, McCoy et al 2024
“Evaluation of OpenAI O1: Opportunities and Challenges of AGI”, Zhong et al 2024
“LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s O1 on PlanBench”, Valmeekam et al 2024
“To CoT or Not to CoT? Chain-Of-Thought Helps Mainly on Math and Symbolic Reasoning”, Sprague et al 2024
“Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, Ye et al 2024
“Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024
“Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Lee et al 2024
“OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024
“How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad”, Abbe et al 2024
“OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision”, Luo et al 2024
“MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark”, Wang et al 2024
“A Theoretical Understanding of Self-Correction through In-Context Alignment”, Wang et al 2024
“Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models”, Lu et al 2024
“From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, Deng et al 2024
“Retrieval Head Mechanistically Explains Long-Context Factuality”, Wu et al 2024
“Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models”, Pfau et al 2024
“Autonomous LLM-Driven Research from Data to Human-Verifiable Research Papers”, Ifargan et al 2024
“Missed Connections: Lateral Thinking Puzzles for Large Language Models”, Todd et al 2024
“ChatGPT Can Predict the Future When It Tells Stories Set in the Future About the Past”, Pham & Cunningham 2024
“Visualization-Of-Thought Elicits Spatial Reasoning in Large Language Models”, Wu et al 2024
“Do Language Models Plan Ahead for Future Tokens?”, Wu et al 2024
“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
“Re-Evaluating GPT-4’s Bar Exam Performance”, Martínez 2024
“Long-Form Factuality in Large Language Models”, Wei et al 2024
“Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking”, Zelikman et al 2024
“RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval”, Wen et al 2024
“Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
“Chain-Of-Thought Empowers Transformers to Solve Inherently Serial Problems”, Li et al 2024
“Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models”, Levy et al 2024
“Why Are Sensitive Functions Hard for Transformers?”, Hahn & Rofin 2024
“Chain-Of-Thought Reasoning Without Prompting”, Wang & Zhou 2024
“V-STaR: Training Verifiers for Self-Taught Reasoners”, Hosseini et al 2024
“More Agents Is All You Need”, Li et al 2024
“The Impact of Reasoning Step Length on Large Language Models”, Jin et al 2024
“Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach”, Ma et al 2023
“Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
“Tree of Attacks (TAP): Jailbreaking Black-Box LLMs Automatically”, Mehrotra et al 2023
“Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023
“Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine”, Nori et al 2023
“Training Chain-Of-Thought via Latent-Variable Inference”, Phan et al 2023
“Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks”, Ramesh et al 2023
“On Measuring Faithfulness or Self-Consistency of Natural Language Explanations”, Parcalabescu & Frank 2023
“Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
“Large Language Models Can Strategically Deceive Their Users When Put Under Pressure”, Scheurer et al 2023
“Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves”, Deng et al 2023
“Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation”, Ding et al 2023
“Implicit Chain-Of-Thought Reasoning via Knowledge Distillation”, Deng et al 2023
“Preventing Language Models From Hiding Their Reasoning”, Roger & Greenblatt 2023
“Branch-Solve-Merge Improves Large Language Model Evaluation and Generation”, Saha et al 2023
“Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
“The Expressive Power of Transformers With Chain-Of-Thought”, Merrill & Sabharwal 2023
“Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
“Large Language Models Cannot Self-Correct Reasoning Yet”, Huang et al 2023
“Think Before You Speak: Training Language Models With Pause Tokens”, Goyal et al 2023
“Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023
“Contrastive Decoding Improves Reasoning in Large Language Models”, O’Brien & Lewis 2023
“Re-Reading Improves Reasoning in Large Language Models”, Xu et al 2023
“From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Adams et al 2023
“Graph of Thoughts: Solving Elaborate Problems With Large Language Models”, Besta et al 2023
“Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, Zhou et al 2023
“Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
“LLMs As Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines With LLMs”, Wu et al 2023
“TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT”, Zha et al 2023
“Question Decomposition Improves the Faithfulness of Model-Generated Reasoning”, Radhakrishnan et al 2023
“Measuring Faithfulness in Chain-Of-Thought Reasoning”, Lanham et al 2023
“Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023
“Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
“Teaching Arithmetic to Small Transformers”, Lee et al 2023
“Language Models Are Weak Learners”, Manikandan et al 2023
“Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning”, Ma et al 2023
“GKD: Generalized Knowledge Distillation for Auto-Regressive Sequence Models”, Agarwal et al 2023
“Large Language Models As Tax Attorneys: A Case Study in Legal Capabilities Emergence”, Nay et al 2023
“Iterative Translation Refinement With Large Language Models”, Chen et al 2023
“Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
“Let’s Verify Step by Step”, Lightman et al 2023
“Towards Revealing the Mystery behind Chain-Of-Thought: A Theoretical Perspective”, Feng et al 2023
“Improving Factuality and Reasoning in Language Models through Multiagent Debate”, Du et al 2023
“How Language Model Hallucinations Can Snowball”, Zhang et al 2023
“Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Yao et al 2023
“Large Language Model Programs”, Schlag et al 2023
“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
“Distilling Step-By-Step! Outperforming Larger Language Models With Less Training Data and Smaller Model Sizes”, Hsieh et al 2023
“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
“LLM+P: Empowering Large Language Models With Optimal Planning Proficiency”, Liu et al 2023
“Boosting Theory-Of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023
“Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
“Language Models Can Solve Computer Tasks”, Kim et al 2023
“Reflexion: Language Agents With Verbal Reinforcement Learning”, Shinn et al 2023
“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
“SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models”, Manakul et al 2023
“Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
“Multimodal Chain-Of-Thought Reasoning in Language Models”, Zhang et al 2023
“Faithful Chain-Of-Thought Reasoning”, Lyu et al 2023
“Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-Based Reasoning”, Ye et al 2023
“ChatGPT Goes to Law School”, Choi et al 2023
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
“Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
“Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes”, Reppert et al 2023
“Solving Math Word Problems With Process & Outcome-Based Feedback”, Uesato et al 2022
“PAL: Program-Aided Language Models”, Gao et al 2022
“Measuring Progress on Scalable Oversight for Large Language Models”, Bowman et al 2022
“U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
“Large Language Models Can Self-Improve”, Huang et al 2022
“Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them”, Suzgun et al 2022
“Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
“Language Models Are Multilingual Chain-Of-Thought Reasoners”, Shi et al 2022
“ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
“Dynamic Prompt Learning via Policy Gradient for Semi-Structured Mathematical Reasoning”, Lu et al 2022
“FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
“Faithful Reasoning Using Large Language Models”, Creswell & Shanahan 2022
“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
“Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
“Language Model Cascades”, Dohan et al 2022
“CodeT: Code Generation With Generated Tests”, Chen et al 2022
“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
“Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
“Exploring Length Generalization in Large Language Models”, Anil et al 2022
“Language Models (Mostly) Know What They Know”, Kadavath et al 2022
“Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
“Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations”, Jung et al 2022
“Large Language Models Are Zero-Shot Reasoners”, Kojima et al 2022
“Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
“Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
“Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
“Unifying Language Learning Paradigms”, Tay et al 2022
“Can Language Models Learn from Explanations in Context?”, Lampinen et al 2022
“Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
“STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
“A Conversational Paradigm for Program Synthesis”, Nijkamp et al 2022
“Self-Consistency Improves Chain-Of-Thought Reasoning in Language Models”, Wang et al 2022
“Learning-By-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension”, Zhao et al 2022
“PromptChainer: Chaining Large Language Model Prompts through Visual Programming”, Wu et al 2022
“Chain-Of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al 2022
“Reasoning Like Program Executors”, Pi et al 2022
“A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021
“DREAM: Uncovering Mental Models behind Language Models”, Gu et al 2021
“Reframing Human-AI Collaboration for Generating Free-Text Explanations”, Wiegreffe et al 2021
“NeuroLogic A✱esque Decoding: Constrained Text Generation With Lookahead Heuristics”, Lu et al 2021
“WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
“Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
“Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
“Unsupervised Neural Machine Translation With Generative Language Models Only”, Han et al 2021
“Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Nye et al 2021
“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
“Teaching Autoregressive Language Models Complex Tasks By Demonstration”, Recchia 2021
“Program Synthesis With Large Language Models”, Austin et al 2021
“Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
“Explainable Multi-Hop Verbal Reasoning Through Internal Monologue”, Liang et al 2021
“A Simple Method to Keep GPT-3 Focused in a Conversation”, Mayne 2021
“Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
“Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm”, Reynolds & McDonell 2021
“How We Accidentally Gave Our Bots Their Personalities”, Latitude 2021
“Word in Context: Agent and Agent Clarification (69% Dev)”, Brockman 2020
“I Found That Getting GPT-3 to Add Its Own "Internal Monologue" in Parentheses to Be a Helpful Strategy…”, blixt 2020
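The trick blixt describes is purely a prompt-format one: few-shot examples demonstrate answers that open with a parenthetical “thought”, so the model’s continuation does the same. A minimal sketch of that pattern, assuming a hypothetical `complete()` text-completion wrapper; the few-shot examples are illustrative, not blixt’s originals:

```python
# Parenthetical internal-monologue prompting (after blixt's HN comment):
# each few-shot answer opens with a "(thought)" before the final answer,
# so the model's own answer does too. `complete()` is a hypothetical
# text-completion function (prompt -> continuation).

FEW_SHOT = """\
Q: I had 5 marbles and lost 2. How many are left?
A: (5 minus 2 is 3.) 3 marbles.

Q: Anna is twice Ben's age; Ben is 4. How old is Anna?
A: (Twice 4 is 8.) Anna is 8.

Q: {question}
A: ("""

def ask_with_monologue(question: str, complete) -> str:
    # Re-attach the "(" that the prompt already opened.
    return "(" + complete(FEW_SHOT.format(question=question), stop=["\nQ:"])
```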
kleptid @ "2020-07-17": “Seems to work”
kleptid @ "2020-07-17": “Teaching GPT-3 to do a brute force 'for loop' checking answers also seems to work”
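A minimal sketch of that brute-force loop, assuming a hypothetical `llm()` completion function: instead of asking for the answer directly, enumerate candidates and have the model check each one (the prompt wording here is invented, not kleptid’s):

```python
# Brute-force "for loop" answer checking: generation is replaced by
# verification over an enumerated candidate set. `llm()` is a hypothetical
# text-completion function (prompt -> continuation).

def brute_force_answer(question: str, candidates: list[str], llm) -> str | None:
    for candidate in candidates:
        verdict = llm(
            f"Question: {question}\n"
            f"Proposed answer: {candidate}\n"
            "Check the proposed answer step by step, then reply 'yes' or 'no':"
        )
        if verdict.strip().lower().startswith("yes"):
            return candidate  # first candidate the model verifies
    return None  # no candidate passed the check

# e.g.: brute_force_answer("What is 17 * 3?", [str(n) for n in range(40, 60)], llm)
```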
“Inducing Self-Explanation: a Meta-Analysis”, Bisra et al 2018
“Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems”, Ling et al 2017
“Why Do Humans Reason? Arguments for an Argumentative Theory”, Mercier & Sperber 2011
“How to Dramatically Improve the Reasoning Ability of GPT-3”
“A Preliminary Exploration into Factored Cognition With Language Models”
“WiC_SelfContextStuffingImproved_Last10_stuft_examplesNV.ipynb”
“Vincent-163/transformer-Arithmetic”
“Magic ToDo List Creator”
“Short Story on AI: ‘Forward Pass’”, Karpathy 2024
“AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button.”
“Solving Math Word Problems: We’ve trained a system that solves grade school math problems with nearly twice the accuracy of a fine-tuned GPT-3 model. It solves about 90% as many problems as real kids: a small sample of 9–12 year olds scored 60% on a test from our dataset, while our system scored 55% on those same problems. This is important because today’s AI is still quite weak at commonsense multistep reasoning, which is easy even for grade school kids. We achieved these results by training our model to recognize its mistakes, so that it can try repeatedly until it finds a solution that works”
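The “try repeatedly until it finds a solution that works” scheme is best-of-N sampling reranked by a trained verifier, as in the Cobbe et al 2021 paper above. A minimal sketch, with `sample_solution` and `verifier_score` as hypothetical stand-ins for the generator and verifier models:

```python
# Best-of-N with a verifier (Cobbe et al 2021): sample many candidate
# solutions from the generator, score each with the trained verifier,
# and return the highest-scoring one.

def best_of_n(problem: str, sample_solution, verifier_score, n: int = 100) -> str:
    candidates = [sample_solution(problem) for _ in range(n)]
    return max(candidates, key=lambda sol: verifier_score(problem, sol))
```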
“Teaching a Neural Network to Use a Calculator”
“Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data”
“Preventing Language Models from Hiding Their Reasoning”
“Steganography in Chain-Of-Thought Reasoning”
“Visible Thoughts Project and Bounty Announcement”
bucketofkets
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
self-explanation
chain-of-thought
math-reasoning
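A minimal sketch of the greedy nearest-neighbor ordering described above, assuming unit-normalized annotation embeddings; the actual Gwern.net clustering and auto-labeling pipeline may differ:

```python
import numpy as np

def sort_by_magic(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor ordering: start from the newest annotation
    (row 0 of the (n, d) embedding matrix) and repeatedly append the most
    similar unvisited annotation, yielding a progression of topics."""
    order, remaining = [0], set(range(1, len(embeddings)))
    while remaining:
        last = embeddings[order[-1]]
        # cosine similarity reduces to a dot product for unit vectors
        nearest = max(remaining, key=lambda i: float(embeddings[i] @ last))
        order.append(nearest)
        remaining.remove(nearest)
    return order
```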
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/inner-monologue/2023-lee-figure1-numberformattingforgpt2arithmetic.jpg
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-05-28-gpt3user-thinkingisallyouneed.html
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-tay-ul2-innermonologueresults.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-wei-figure8-lamdavsgpt3.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-zeng-figure2-socraticmodelsworkflowoverview.png
- https://builtin.com/job/customer-success/expert-ai-teacher-contract/1267315
- https://generative.ink/posts/methods-of-prompt-programming/#serializing-reasoning
- https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
- https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/
- https://platform.openai.com/docs/guides/reasoning/how-reasoning-works
- https://research.google/blog/google-research-2022-beyond-language-vision-and-generative-models/
- https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/
- https://statmodeling.stat.columbia.edu/2023/08/30/chatgpt-4-can-do-3-digit-multiplication/
- https://towardsdatascience.com/1-1-3-wait-no-1-1-2-how-to-have-gpt-sanity-check-itself-136e846987bf
- https://www.fhi.ox.ac.uk/wp-content/uploads/2021/08/QNRs_FHI-TR-2021-3.0.pdf
- https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/
- https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/
- https://www.reddit.com/r/ChatGPT/comments/10zavbv/extending_chatgpt_with_some_additional_internal/
- https://www.reddit.com/r/ChatGPT/comments/11anct1/its_easy_to_give_chatgpt_a_bonafide_consciousness/
- https://www.reddit.com/r/LocalLLaMA/comments/1fuxw8d/just_for_kicks_i_looked_at_the_newly_released/
- https://www.reddit.com/r/OpenAI/comments/1fxa6d6/two_purported_instances_of_o1preview_and_o1mini/
- https://www.reddit.com/r/OpenAI/comments/1gjj430/o1_preview_got_weird_today/
- https://www.reddit.com/r/PromptEngineering/comments/1fj6h13/hallucinations_in_o1preview_reasoning/
- https://www.waluigipurple.com/post/revising-poetry-with-gpt-4
Bibliography
- https://arxiv.org/abs/2410.21333: “Mind Your Step (by Step): Chain-Of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse”, Liu et al 2024
- https://arxiv.org/abs/2406.13121#google: “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Lee et al 2024
- https://arxiv.org/abs/2405.15143: “Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models”, Lu et al 2024
- https://arxiv.org/abs/2405.14838: “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, Deng et al 2024
- https://arxiv.org/abs/2404.15574: “Retrieval Head Mechanistically Explains Long-Context Factuality”, Wu et al 2024
- https://arxiv.org/abs/2404.15758: “Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models”, Pfau et al 2024
- https://link.springer.com/article/10.1007/s10506-024-09396-9: “Re-Evaluating GPT-4’s Bar Exam Performance”, Martínez 2024
- https://arxiv.org/abs/2403.18802#deepmind: “Long-Form Factuality in Large Language Models”, Wei et al 2024
- https://arxiv.org/abs/2403.09629: “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking”, Zelikman et al 2024
- https://arxiv.org/abs/2402.14903: “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
- https://arxiv.org/abs/2402.09963: “Why Are Sensitive Functions Hard for Transformers?”, Hahn & Rofin 2024
- https://arxiv.org/abs/2402.05120#tencent: “More Agents Is All You Need”, Li et al 2024
- https://arxiv.org/abs/2312.06585#deepmind: “Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
- https://arxiv.org/abs/2311.16452#microsoft: “Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine”, Nori et al 2023
- https://arxiv.org/abs/2312.02179: “Training Chain-Of-Thought via Latent-Variable Inference”, Phan et al 2023
- https://arxiv.org/abs/2310.08678: “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
- https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
- https://arxiv.org/abs/2310.02226: “Think Before You Speak: Training Language Models With Pause Tokens”, Goyal et al 2023
- https://arxiv.org/abs/2309.09117#facebook: “Contrastive Decoding Improves Reasoning in Large Language Models”, O’Brien & Lewis 2023
- https://arxiv.org/abs/2309.06275: “Re-Reading Improves Reasoning in Large Language Models”, Xu et al 2023
- https://arxiv.org/abs/2309.04269: “From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Adams et al 2023
- https://arxiv.org/abs/2308.07921: “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, Zhou et al 2023
- https://arxiv.org/abs/2307.05300#microsoft: “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023
- https://arxiv.org/abs/2307.03381: “Teaching Arithmetic to Small Transformers”, Lee et al 2023
- https://arxiv.org/abs/2306.14308#google: “Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning”, Ma et al 2023
- https://arxiv.org/abs/2306.00323: “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
- https://arxiv.org/abs/2305.20050#openai: “Let’s Verify Step by Step”, Lightman et al 2023
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- https://arxiv.org/abs/2305.10601#deepmind: “Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Yao et al 2023
- https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- https://arxiv.org/abs/2305.02301#google: “Distilling Step-By-Step! Outperforming Larger Language Models With Less Training Data and Smaller Model Sizes”, Hsieh et al 2023
- https://arxiv.org/abs/2304.11490: “Boosting Theory-Of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023
- https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335905: “ChatGPT Goes to Law School”, Choi et al 2023
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
- https://arxiv.org/abs/2301.01751#elicit: “Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes”, Reppert et al 2023
- https://arxiv.org/abs/2210.11399#google: “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
- https://arxiv.org/abs/2210.11610#google: “Large Language Models Can Self-Improve”, Huang et al 2022
- https://arxiv.org/abs/2210.09261#google: “Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them”, Suzgun et al 2022
- https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
- https://arxiv.org/abs/2210.03057#google: “Language Models Are Multilingual Chain-Of-Thought Reasoners”, Shi et al 2022
- https://arxiv.org/abs/2209.00840: “FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
- https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
- https://arxiv.org/abs/2207.05608#google: “Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
- https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”, Kadavath et al 2022
- https://arxiv.org/abs/2205.10625#google: “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
- https://arxiv.org/abs/2205.09073#google: “Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
- https://arxiv.org/abs/2205.05131#google: “Unifying Language Learning Paradigms”, Tay et al 2022
- https://arxiv.org/abs/2204.00598#google: “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
- https://arxiv.org/abs/2203.11171#google: “Self-Consistency Improves Chain-Of-Thought Reasoning in Language Models”, Wang et al 2022
- https://arxiv.org/abs/2201.11903#google: “Chain-Of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al 2022
- https://arxiv.org/abs/2201.11473#microsoft: “Reasoning Like Program Executors”, Pi et al 2022
- https://arxiv.org/abs/2112.15594: “A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021
- https://openai.com/research/webgpt: “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
- https://arxiv.org/abs/2110.14168#openai: “Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
- https://sites.google.com/berkeley.edu/decision-transformer: “Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
- https://gptprompts.wikidot.com/linguistics:word-in-context#toc3: “Word in Context: Agent and Agent Clarification (69% Dev)”, Brockman 2020
- https://news.ycombinator.com/item?id=23990902: “I Found That Getting GPT-3 to Add Its Own "Internal Monologue" in Parentheses to Be a Helpful Strategy…”, blixt 2020
- https://x.com/kleptid/status/1284069270603866113: “Seems to Work”, kleptid 2020
- https://x.com/kleptid/status/1284098635689611264: “Teaching GPT-3 to Do a Brute Force 'For Loop' Checking Answers Also Seems to Work”, kleptid 2020
- 2018-bisra.pdf: “Inducing Self-Explanation: a Meta-Analysis”, Bisra et al 2018