Zero-Shot CoT — laranevans.com

Zero-shot chain of thought is the practice of eliciting multi-step reasoning by appending a single instruction to the user's question, with no worked examples. The canonical instruction is "Let's think step by step." Kojima et al. (2022), in "Large Language Models are Zero-Shot Reasoners", showed that this cue alone improved performance on arithmetic, symbolic, and other logical-reasoning benchmarks across multiple model families.

The result was surprising because it suggested that the capability for multi-step reasoning was already present in the pretrained model. The few-shot exemplars of chain-of-thought prompting (Wei et al. 2022) were sufficient but not necessary: a short natural-language nudge elicited the same kind of step-by-step reasoning on the tasks the paper measured.

The mechanism

The recipe is one line. The model is asked the question, then told to reason step by step before giving its final answer. The original paper used a two-stage form: the first prompt elicits the reasoning chain, and the second prompt appends that chain plus an answer-extraction cue (for example, "Therefore, the answer is") to pull the final answer out of the chain.
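The two-stage form can be sketched in a few lines. This is a minimal illustration, not the paper's code: `complete` stands for any function that sends a prompt to a model and returns its text, and `fake_complete` is a hypothetical stub used here so the sketch runs without a model. The "Q:/A:" layout and the exact wording of the answer cue are assumptions that may need adjusting per task.

```python
from typing import Callable, Tuple

REASONING_CUE = "Let's think step by step."
ANSWER_CUE = "Therefore, the answer is"  # assumed wording; adjust per task


def build_reasoning_prompt(question: str) -> str:
    """Stage 1: the question followed by the zero-shot CoT cue."""
    return f"Q: {question}\nA: {REASONING_CUE}"


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2: replay stage 1 and its reasoning, then add the answer cue."""
    return f"{build_reasoning_prompt(question)} {reasoning.strip()}\n{ANSWER_CUE}"


def zero_shot_cot(question: str, complete: Callable[[str], str]) -> Tuple[str, str]:
    """Run both stages and return (reasoning, answer)."""
    reasoning = complete(build_reasoning_prompt(question))
    answer = complete(build_answer_prompt(question, reasoning))
    return reasoning.strip(), answer.strip()


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, for demonstration only."""
    if prompt.rstrip().endswith(ANSWER_CUE):
        return " 8."
    return " There are 3 + 5 apples, which makes 8."


if __name__ == "__main__":
    reasoning, answer = zero_shot_cot(
        "I have 3 apples and buy 5 more. How many apples do I have?",
        fake_complete,
    )
    print(reasoning)
    print(answer)
```

Keeping the two prompt builders separate from the model call makes it easy to swap in a real client, or to run only stage 1 when the raw reasoning trace is all that is needed.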

In production, modern instruction-tuned models often produce step-by-step reasoning when the task warrants it without any explicit cue. The cue still helps on borderline cases and on tasks where the model's default is to answer directly. It is also useful for evaluation: the same instruction across models reduces variance in the format of the reasoning trace.

When it helps

Zero-shot CoT shows the largest gains where multi-step reasoning is required and the model is capable enough to maintain coherence across the steps. The Kojima paper reports substantial improvements on arithmetic benchmarks (MultiArith accuracy rose from 17.7% to 78.7% with text-davinci-002) and on symbolic and other logical-reasoning tasks; its results on commonsense benchmarks were weaker. It helps less on tasks where the bottleneck is knowledge rather than reasoning, the same limit that applies to few-shot CoT.

When it fails

The faithfulness concern from Turpin et al. (2023) applies to zero-shot CoT as much as to the few-shot variant. A fluent step-by-step rationale does not guarantee that the stated reasoning is the model's actual decision pathway. The rationale can instead rationalize a biased or incorrect answer.

Zero-shot CoT also degrades less gracefully than the few-shot form when the model's output format matters downstream. With exemplars, the prompt shows the desired structure directly. Without exemplars, the model picks its own structure, and downstream parsers must accommodate the variation.
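To make the parsing burden concrete, here is a rough sketch of a tolerant extractor for numeric answers. The function name, the regular expressions, and the fallback rule (take the last number in the trace) are all illustrative assumptions; a real parser would need patterns tuned to the traces a given model actually produces.

```python
import re
from typing import Optional

# A number with optional sign, thousands separators, and decimal part.
_NUMBER = r"-?\d[\d,]*(?:\.\d+)?"

# Explicit phrasings such as "the answer is 8" or "Final answer: $1,200".
_EXPLICIT = re.compile(
    rf"answer(?:\s+is)?\s*:?\s*\$?({_NUMBER})", re.IGNORECASE
)
_ANY_NUMBER = re.compile(_NUMBER)


def extract_numeric_answer(trace: str) -> Optional[float]:
    """Pull a numeric answer out of a free-form reasoning trace.

    Prefers the last explicit "answer is ..." statement; otherwise falls
    back to the last number that appears anywhere in the trace.
    """
    explicit = _EXPLICIT.findall(trace)
    candidates = explicit if explicit else _ANY_NUMBER.findall(trace)
    if not candidates:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(candidates[-1].replace(",", ""))


if __name__ == "__main__":
    print(extract_numeric_answer("3 + 5 = 8. The answer is 8."))
    print(extract_numeric_answer("12 * 100 = 1,200.\nFinal answer: $1,200"))
    print(extract_numeric_answer("I cannot tell."))
```

The fallback is a heuristic and can pick up the wrong number, which is one reason to constrain the output format rather than parse around it.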

Practical guidance

A few rules of thumb:

Try the task without the cue first on a recent instruction-tuned model. The model often reasons when warranted.
Use the cue when the task involves multi-step inference and the model's default is to answer directly.
Pair with structured outputs when the downstream system needs a parseable final answer. Have the model reason in prose, then return the final answer in a JSON field.
Use self-consistency over a single zero-shot CoT chain when accuracy matters more than cost.
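The last two rules combine naturally, and the following sketch shows one way to do it under stated assumptions. The prompt wording, the `{"answer": ...}` convention on the final line, and the names `parse_final_answer` and `self_consistent_answer` are hypothetical. `sample` stands for any function that returns one sampled completion per call, typically at a nonzero temperature so that the chains differ.

```python
import json
from collections import Counter
from typing import Callable, Optional, Tuple

# Assumed prompt wording: prose reasoning first, JSON answer on the last line.
PROMPT_TEMPLATE = (
    "{question}\n\n"
    "Let's think step by step. Reason in plain prose first. "
    'Then, on the last line, output only a JSON object of the form {{"answer": <value>}}.'
)


def parse_final_answer(response: str) -> Optional[str]:
    """Read the "answer" field from the last non-empty line, else None."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        return None
    try:
        payload = json.loads(lines[-1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or "answer" not in payload:
        return None
    return str(payload["answer"])


def self_consistent_answer(
    question: str, sample: Callable[[str], str], n: int = 5
) -> Tuple[Optional[str], Counter]:
    """Sample n chains and majority-vote on the parsed final answers."""
    prompt = PROMPT_TEMPLATE.format(question=question)
    votes: Counter = Counter()
    for _ in range(n):
        answer = parse_final_answer(sample(prompt))
        if answer is not None:  # unparseable chains simply do not vote
            votes[answer] += 1
    if not votes:
        return None, votes
    # On a tie, the answer that was seen first wins.
    return votes.most_common(1)[0][0], votes
```

Returning the vote counts alongside the winner lets the caller treat a narrow margin as a signal of low confidence. The cost scales linearly with `n`, which is the trade-off the rule of thumb refers to.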

Related

Chain of Thought: the few-shot form this builds on.
Self-Consistency: sampling multiple chains, voting on the answer.
Prompt Engineering: the broader cluster.