laranevans.com
Zero-Shot CoT

Zero-shot CoT (zero-shot chain of thought) elicits multi-step reasoning by appending one instruction to the question, with no worked examples. The canonical instruction is "Let's think step by step." Kojima et al. (2022), in "Large Language Models are Zero-Shot Reasoners," showed that this cue alone improved performance on arithmetic, symbolic, and other logical-reasoning benchmarks across multiple model families.

The whole technique follows from one finding: the multi-step reasoning capability already lived inside the pretrained model. Chain-of-thought prompting (Wei et al. 2022) had elicited that capability through few-shot exemplars, worked examples of reasoning shown in the prompt. Zero-shot CoT removed the exemplars and still produced step-by-step reasoning, recovering much of the accuracy gain on the tasks Kojima measured, although few-shot CoT generally stayed ahead. The exemplars were sufficient but not necessary to trigger the reasoning. A short natural-language nudge reached the reasoning the model had already learned.

That reframes what a chain-of-thought prompt does. The prompt does not teach the model to reason; it selects a reasoning mode the model already holds. With that framing, the two forms of CoT separate along a few axes, and the rest of this page reads off those axes.

Few-shot CoT and zero-shot CoT differ on where the reasoning comes from

Both forms produce a step-by-step rationale before the answer. They differ on what the prompt carries and what the model must supply on its own.

The reference spine

| Axis | Few-shot CoT | Zero-shot CoT |
|---|---|---|
| What the prompt carries | Worked exemplars of reasoning | One instruction, no examples |
| Where the reasoning comes from | Demonstrated in the prompt, imitated | Already in the pretrained model, triggered by the cue |
| Output-format control | Exemplars fix the structure directly | Model picks its own structure |
| Prompt length | Longer, scales with exemplar count | One line |
| Canonical cue | A few solved examples | "Let's think step by step." |

The spine is the contrast. The sections below read down its rows.
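To make the prompt-content and prompt-length rows concrete, here is a minimal sketch that builds both prompt forms as plain strings. The `Q:`/`A:` layout, the `Exemplar` type, and both function names are illustrative assumptions, not a format either paper mandates; the sample exemplar is written in the style of Wei et al. (2022) for demonstration only.

```python
from dataclasses import dataclass
from typing import List

# The canonical zero-shot CoT cue from Kojima et al. (2022).
ZERO_SHOT_COT_CUE = "Let's think step by step."


@dataclass
class Exemplar:
    """One worked example for few-shot CoT (illustrative structure)."""
    question: str
    rationale: str
    answer: str


def few_shot_cot_prompt(question: str, exemplars: List[Exemplar]) -> str:
    # Few-shot CoT: every exemplar carries a full worked rationale,
    # so prompt length grows with the number of exemplars.
    blocks = [
        f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def zero_shot_cot_prompt(question: str) -> str:
    # Zero-shot CoT: no exemplars, just the cue opening the answer slot.
    return f"Q: {question}\nA: {ZERO_SHOT_COT_CUE}"


if __name__ == "__main__":
    # Hypothetical exemplar and target question, for demonstration only.
    demo = Exemplar(
        question="Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        rationale="Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11.",
        answer="11",
    )
    target = "A baker has 12 rolls and sells 5. How many rolls are left?"
    print(few_shot_cot_prompt(target, [demo]))
    print("---")
    print(zero_shot_cot_prompt(target))
```

The few-shot prompt has to carry the reasoning it wants imitated; the zero-shot prompt carries only the trigger and leaves the reasoning to the model.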

The mechanism is one line, applied in two stages

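The recipe is one line. In Kojima et al.'s setup the cue opens the answer: the prompt reads "Q: [question] A: Let's think step by step." The method then runs in two stages. The first stage (reasoning extraction) lets the model generate its chain from that prompt. The second stage (answer extraction) feeds the prompt and the generated chain back with an answer trigger such as "Therefore, the answer (arabic numerals) is" and reads the final answer from the completion. Among the trigger sentences the paper compared, "Let's think step by step." scored highest.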
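The two-stage flow above can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical `complete(prompt) -> str` function that wraps whatever model client you use; no real provider API is implied. The answer trigger is the arithmetic one quoted above, and other task types would need a different phrase.

```python
from typing import Callable, Tuple

# Hypothetical model interface: takes a prompt string, returns the completion.
# Wrap whatever client you actually use behind this signature.
Complete = Callable[[str], str]

REASONING_TRIGGER = "Let's think step by step."
# Answer trigger Kojima et al. used for arithmetic tasks; other task types
# (e.g. multiple choice) need a different phrase.
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def zero_shot_cot(question: str, complete: Complete) -> Tuple[str, str]:
    """Two-stage zero-shot CoT: extract a reasoning chain, then an answer."""
    # Stage 1: reasoning extraction. The cue opens the answer slot.
    stage1 = f"Q: {question}\nA: {REASONING_TRIGGER}"
    reasoning = complete(stage1).strip()

    # Stage 2: answer extraction. Feed the prompt and the generated chain
    # back, followed by the answer trigger, and read the completion.
    stage2 = f"{stage1} {reasoning}\n{ANSWER_TRIGGER}"
    answer = complete(stage2).strip()
    return reasoning, answer
```

The stage-2 completion is returned raw here. The paper additionally cleaned it by keeping only the first span that matched the expected answer format (for example, the first number).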

In production, modern instruction-tuned models often produce step-by-step reasoning without any explicit cue when the task warrants it. The cue still helps on borderline cases and on tasks where the model's default is to answer directly. It also helps in evaluation: using the same instruction across models reduces variance in the format of the reasoning trace.

Zero-shot CoT helps where reasoning is the bottleneck

Zero-shot CoT shows the largest gains where multi-step reasoning is required and the model is capable enough to hold coherence across the steps. Kojima et al. report that, with InstructGPT (text-davinci-002), the cue raised accuracy on the MultiArith arithmetic benchmark from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7%, with gains on symbolic and other logical-reasoning tasks as well. It helps less where the bottleneck is knowledge rather than reasoning, the same limit that applies to few-shot CoT.

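Two failure modes follow from model-generated reasoning

Both failure modes concern the chain the model writes for itself. The first applies to any chain of thought; the second is specific to zero-shot CoT, where the prompt supplies no reasoning or structure of its own.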

Unfaithful rationales. The faithfulness concern raised by Turpin et al. (2023) applies to zero-shot CoT as much as to the few-shot variant. A fluent step-by-step rationale does not guarantee that the stated reasoning is the model's actual decision pathway; the rationale can rationalize a biased or incorrect answer. This is a property of the chain itself, so neither form escapes it.

Format drift. Zero-shot CoT degrades less gracefully than the few-shot form when the output format matters downstream. With exemplars, the prompt shows the desired structure directly. Without exemplars, the model picks its own structure, and downstream parsers must accommodate the variation. This is the output-format row of the spine, seen from the cost side.
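One way downstream code copes with that variation is a tolerant extractor that looks for an explicitly marked answer and otherwise falls back to the last number in the reply. This is a minimal sketch, assuming a task with a numeric final answer; the marker phrases and the function name `extract_final_number` are illustrative assumptions, not an exhaustive or standard scheme.

```python
import re
from typing import Optional

# A signed integer or decimal, allowing thousands separators like 1,250.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"
# Marker phrases are assumptions about common phrasings, not an exhaustive list.
MARKED_ANSWER = re.compile(r"answer(?: is|:)\s*\$?(" + NUMBER + ")", re.IGNORECASE)
ANY_NUMBER = re.compile(NUMBER)


def extract_final_number(reply: str) -> Optional[str]:
    """Pull a numeric final answer out of free-form step-by-step text."""
    # Prefer an explicitly marked answer; take the last one in case the model
    # restates or corrects itself partway through.
    marked = MARKED_ANSWER.findall(reply)
    if marked:
        candidate = marked[-1]
    else:
        # Fall back to the last number mentioned anywhere in the reply.
        numbers = ANY_NUMBER.findall(reply)
        if not numbers:
            return None
        candidate = numbers[-1]
    # Normalize thousands separators so "1,250" and "1250" compare equal.
    return candidate.replace(",", "")
```

The fallback is a heuristic: a reply that ends by mentioning an unrelated number will fool it. That fragility is the cost of letting the model choose its own structure, and requesting a fixed final-answer field removes most of the guessing.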

Practical guidance

A few rules of thumb:

  • Try the task without the cue first on a recent instruction-tuned model. Such models often reason step by step unprompted when the task warrants it.
  • Use the cue when the task involves multi-step inference and the model's default is to answer directly.
  • Pair with structured outputs when the downstream system needs a parseable final answer. Have the model reason in prose, then return the final answer in a JSON field.
  • Use self-consistency (sample several zero-shot CoT chains and take the majority final answer) instead of a single chain when accuracy matters more than cost.
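The last two rules combine naturally: ask for prose reasoning that ends in a small JSON object, sample several chains, and majority-vote the parsed answers. This is a minimal sketch, assuming a hypothetical `sample(prompt) -> str` function that draws one completion at temperature above zero. The prompt wording, the `{"answer": ...}` schema, and the default `n=5` are illustrative assumptions.

```python
import json
import re
from collections import Counter
from typing import Callable, Optional

# Hypothetical sampler: one completion per call, drawn at temperature > 0
# so that repeated calls can follow different reasoning paths.
Sample = Callable[[str], str]

# A flat JSON object at the very end of the reply, e.g. {"answer": "11"}.
TRAILING_JSON = re.compile(r"\{[^{}]*\}\s*$")


def build_prompt(question: str) -> str:
    # Zero-shot CoT cue plus an instruction to finish with a parseable field.
    return (
        f"{question}\n"
        "Let's think step by step. After your reasoning, end your reply with a "
        'JSON object of the form {"answer": "..."} and nothing after it.'
    )


def final_answer(reply: str) -> Optional[str]:
    """Read the "answer" field from the trailing JSON object, if present."""
    match = TRAILING_JSON.search(reply)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    answer = payload.get("answer")
    return None if answer is None else str(answer).strip()


def self_consistent_answer(question: str, sample: Sample, n: int = 5) -> Optional[str]:
    """Sample n zero-shot CoT chains and majority-vote their final answers."""
    prompt = build_prompt(question)
    answers = [final_answer(sample(prompt)) for _ in range(n)]
    # Chains that never produced a parseable answer simply do not vote.
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    # most_common orders equal counts by first occurrence.
    return votes.most_common(1)[0][0]
```

Many model APIs also offer a native structured-output or JSON mode that can replace the trailing-object convention; its interface varies by provider, so it is left out of this sketch.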