Reasoning and Acting
Reasoning and Acting (ReAct, by Yao et al. 2022) is an agentic prompting pattern that interleaves natural-language reasoning with environment actions. The model alternates between Thought, Action, and Observation steps, updating its reasoning after every action's result before deciding the next move. The pattern's filename uses the unfurled form because the abbreviation collides with the React JavaScript framework. The two are unrelated.
Yao et al. (2022), in ReAct: Synergizing Reasoning and Acting in Language Models, introduced the pattern and showed it outperformed reasoning-only (chain-of-thought) and action-only baselines on multi-hop question answering, fact verification, and interactive environment tasks. ReAct sits at the intersection of prompt engineering and agentic workflows: it is a prompt template that produces an agent loop.
The mechanism
A ReAct trajectory cycles through three roles until the task is solved or an iteration cap fires:
- Thought. A natural-language reasoning step. The model thinks about the task, plans what to do next, updates its belief state, or decides whether external information is needed.
- Action. A task-specific tool or environment command. For knowledge tasks this is often
Search[query]orLookup[entity]. For environment tasks it islook,open,take,goto. The action space is finite and defined by the host harness. - Observation. The result the environment or tool returns. The observation is appended to the conversation and read by the next Thought.
A concrete example from the question-answering setting:
Thought: I should find the person's birthplace.
Action: Search[Olivia Wilde birthplace]
Observation: Olivia Wilde was born in New York City...
Thought: The result lists New York City but I should verify the borough.
Action: Lookup[borough]
Observation: ...in Manhattan, New York...
Thought: Manhattan is the answer.
Action: Finish[Manhattan]
The key idea is that the model does not do all reasoning first and act once at the end. It updates its reasoning with fresh evidence after every action. Each Observation grounds the next Thought.
What the paper found
The paper evaluated ReAct on four benchmarks across two task families.
Knowledge tasks. HotpotQA (multi-hop question answering) and FEVER (fact verification). ReAct was compared against chain-of-thought (reasoning only), action-only baselines (no explicit reasoning traces), and a CoT + ReAct hybrid. On these tasks ReAct was competitive with CoT but produced more grounded, less hallucination-prone trajectories. The hybrid (CoT for internal reasoning, ReAct for external retrieval) often performed best.
Interactive tasks. ALFWorld (text-based household tasks) and WebShop (web-based shopping/navigation). ReAct was compared against imitation-learning and reinforcement-learning baselines trained on the same tasks. ReAct outperformed the trained baselines in the low-data prompting regime, with reported absolute success-rate gains of about 34% on ALFWorld and 10% on WebShop.
The interactive-task gains were the more striking result: a few-shot prompted model outperformed agents specifically trained for the environment.
What ReAct helps with
The paper identifies several failure modes that ReAct's interleaving mitigates:
- Hallucination. Pure CoT confidently invents facts. ReAct grounds reasoning in observations returned by tools.
- Error propagation. In linear reasoning, an early mistake cascades through later steps. ReAct corrects course after each observation.
- Exception handling. When a plan breaks, reasoning-only models often persist with the bad plan. ReAct detects the broken plan in the next Thought after the unexpected Observation.
- Missing grounding. The model might need external information but fail to seek it. The explicit Action step encourages fetching evidence.
- Getting stuck. In interactive environments, action-only agents repeat unhelpful actions. ReAct's Thought tied to each Action reduces this.
Where ReAct adds less value
ReAct is not uniformly superior to simpler patterns:
- Tasks the model handles in one pass. If the question has a direct answer the model knows, the loop adds latency and tokens without improving accuracy.
- Tasks where the tool interface is weak. A noisy or unreliable Observation channel poisons the Thought that follows. The pattern is only as good as the harness's tool implementation.
- Tasks with no external state. Pure math, pure code reasoning, or pure prose generation does not benefit from Observations. The Action step has nothing useful to do.
- Latency-bound user interactions. Each loop iteration is a model call plus a tool call. A user-facing chat with a 5-second budget cannot afford many iterations.
ReAct, plain agents, and other patterns
Agentic Workflows discusses the broader spectrum between predefined workflows and fully autonomous agent loops. ReAct is one specific shape inside that spectrum:
- An agent loop without ReAct emits a tool call, reads the result, emits the next call. The "reasoning" is implicit in the model's internal state.
- A ReAct loop makes the reasoning explicit in the prompt as a Thought step before each Action.
- A CoT-only prompt does all reasoning up front and emits one final action or answer.
- A plan-and-execute pattern produces a full plan first, then executes the plan. ReAct interleaves planning and execution at finer granularity.
The explicit Thought step has two practical effects beyond the paper's empirical gains. It makes the agent's behavior interpretable (the trajectory is human-readable). And it makes failure debuggable (the broken step's Thought usually reveals the wrong belief that led to the wrong Action).
Failure modes specific to ReAct
A few patterns worth carrying:
- Reasoning rationalization. The Thought step rationalizes a wrong Action rather than catching it. The chain-of-thought faithfulness concern from Turpin et al. 2023 extends to ReAct trajectories.
- Tool-result poisoning. An Observation containing attacker-controlled content (web search result, scraped page) is treated as evidence and changes the next Thought. The defense is the same as for any untrusted input: mark Observations as data, never as instructions.
- Loop without progress. The model produces plausible Thoughts and Actions for many iterations without converging. Iteration caps in the harness are the defense. See Agentic Systems for the harness-level concerns.
- Over-reliance on retrieval. ReAct produces a Search for every Thought even when the model already knows the answer. Wastes tokens and latency. Mitigation: prompt the model to use its prior knowledge when confident, retrieve when uncertain.
Practical guidance
A few rules of thumb:
- Use ReAct when the task involves external information or environment interaction and benefits from interleaved reasoning. Pure knowledge-from-weights tasks rarely need it.
- Define the action space tightly. A small set of well-named tools produces cleaner trajectories than a large catch-all toolbox.
- Treat the trajectory as the artifact, not only the final answer. A successful ReAct run leaves an inspectable reasoning trail.
- Cap iterations at the harness level. A model that does not converge in N steps usually does not converge at N+10 either.
- Pair with verification at the final step (run code, check a calculator, verify against a known-good source) when correctness matters more than the rationale.
Related
- Agentic Workflows — the broader pattern spectrum ReAct sits inside.
- Agentic Systems — the production-deployment surface for ReAct agents.
- Chain of Thought — the reasoning-only baseline ReAct extends.
- Tool Calling — the message-shape mechanic underneath the Action step.
- Prompt Injection — the threat model for Observation content.