Inner Monologue (by analogy to human inner-monologue) is a family of prompt-engineering tricks for large language models which induce them to solve problems in a verbalized, ‘step by step’ way; it is particularly effective on multi-step tasks with ‘one right answer’, such as math word & programming problems.
It can be induced by few-shot examples of several solved problems, finetuning on a suitable corpus (eg. InstructGPT), or with a carefully-chosen prompt inducing a ‘dialogue’ (the original discovery) or instructions (eg. “let’s think step by step”). It can be combined with better sampling strategies like best-of ranking, majority voting, or a critic; self-distillation on its own monologue outputs (possibly repeatedly); additional data like unit tests or retrieval results; & access to oracles like REPLs or humans.
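For concreteness, here is a minimal sketch (in Python) of zero-shot inner-monologue prompting combined with majority voting over sampled answers; `sample_completion` is a hypothetical stand-in for whatever LLM sampling call is available, and extracting the final number as the answer is an assumption suited to arithmetic-style problems:

```python
# Minimal sketch: zero-shot inner-monologue prompting + majority voting over samples.
# `sample_completion` is a hypothetical stand-in for an LLM API call.
import re
from collections import Counter

def sample_completion(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical LLM sampling call; replace with the API of your choice."""
    raise NotImplementedError

def solve_with_monologue(question: str, n_samples: int = 10):
    # The instruction suffix steers the model into verbalizing its reasoning
    # instead of guessing the final answer in a single forward pass.
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        monologue = sample_completion(prompt)
        # Assumption: the last number in the completion is the proposed answer.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", monologue)
        if numbers:
            answers.append(numbers[-1])
    # Majority vote across the sampled monologues' final answers.
    return Counter(answers).most_common(1)[0][0] if answers else None
```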
It was discovered in July 2020 by early OA API & AI Dungeon 2 users, who found that GPT-3/‘Dragon’ would fail to solve most simple arithmetic problems like multiplication (as found by the GPT-3 paper), but could be coaxed into solving them by setting up a fictional dialogue in which a ‘character’ works through the problem with the player step by step. This discovery was widely discussed among GPT-3 enthusiasts, and highlighted on my GPT-3 page as a remarkable emergent capability of GPT-3 unlike GPT-2 or earlier models. It has been ‘rediscovered’ repeatedly since (eg. as “scratchpad” or “chain-of-thought”).
Inner-monologue is interesting because it: is a simple prompting technique which dramatically improves benchmark performance (“sampling can show the presence of knowledge but not the absence”), was not predicted but discovered empirically after model release, appears to emerge only in large language models (>80b dense parameters), can have increasing returns to scale, can scale performance even when naive prompting has flat scaling (“hidden scaling”), adds an RNN-esque flavor to feedforward language models, and involves planning (cf. Socratic models/SayCan). As of 2023, training on inner-monologue-generated datasets has become standard, and is responsible for large capability gains; the limits of self-training & exploration are unknown.
A toy-model for how inner-monologue works is that such problems are sequential: when calculating out an arithmetic problem, an error in any step causes all following steps to be wrong. Such a process is a multiplicative pipeline, where failure rates multiply: ie. a per-step success rate P over n steps compounds to an overall correctness rate of P^n, which rapidly shrinks as P falls or n grows. So inner-monologue makes the meta-learning task easier by being more specific and reducing the problem to easier sub-tasks, potentially increasing the success rate far more than alternatives like scaling the model a few times (eg. a 5-step problem with P = 90% vs P = 99% yields ≈59% vs ≈95% overall accuracy, an improvement which, via pure scaling of naive prompts, might require >10× more parameters). Small models then aren’t smart enough to ‘get it’ from the instructions, and their baseline error rate is too high to execute each step reliably enough to see much gain.
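The toy model can be checked with a few lines of arithmetic:

```python
# Toy model of the multiplicative pipeline: a per-step success rate P over n
# sequential steps compounds to P**n. Reproduces the 5-step example above.
def pipeline_success(p: float, n: int) -> float:
    return p ** n

for p in (0.90, 0.99):
    print(f"P = {p:.0%}, n = 5 -> overall {pipeline_success(p, 5):.0%}")
# P = 90%, n = 5 -> overall 59%
# P = 99%, n = 5 -> overall 95%
```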
I speculate the reason inner-monologue is not the models’ default behavior, even though it predicts the answer so much more accurately, may be the lack of any implicit memory or adaptive-computation mechanism which would let a model spend extra computation on predicting a hard next token. Because models like GPT-3 or PaLM have no recurrent state, they must fake it by reusing their predicted output as a working memory. However, such a ‘show-your-work’ writing style is highly unusual in the original natural-language distribution they are trained to imitate, so they will not adopt it by default without a prompt steering them towards it; they instead try to emit the answer immediately, which their fixed feedforward computation cannot do for hard problems, and so they guess incorrectly.
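A sketch of that faked recurrence, under the same assumptions as the earlier example (`sample_completion` is again a hypothetical LLM call, and the stopping phrase is an assumption): each stateless forward pass appends its output to the transcript, and the transcript itself serves as the working memory carried into the next pass.

```python
# Sketch: a feedforward LM 'faking' recurrent state by feeding its own output
# back in as context. `sample_completion` is the same hypothetical stand-in.
def sample_completion(prompt: str) -> str:
    """Hypothetical LLM sampling call; replace with the API of your choice."""
    raise NotImplementedError

def solve_iteratively(question: str, max_steps: int = 8) -> str:
    context = f"Q: {question}\nA: Let's think step by step.\n"
    for _ in range(max_steps):
        step = sample_completion(context)    # stateless forward pass
        context += step + "\n"               # transcript doubles as working memory
        if "the answer is" in step.lower():  # assumed stopping phrase
            break
    return context
```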