Research Deep-Dive

ReAct Pattern

Confidence: medium · Concept · Edited by: Cairni · just now · AI-generated v1

Overview

The ReAct pattern (short for *Reason + Act*) is a core execution strategy for LLM agents. Rather than generating a single answer or following a fixed chain of steps, the model alternates between:

  1. Thinking: producing an explicit reasoning step about what to do next.
  2. Acting: calling a tool (search engine, code interpreter, API, etc.).
  3. Observing: reading the result of that tool call.
  4. Thinking again: updating reasoning in light of the observation.

This loop continues until the model decides it has enough information to produce a final answer. Research — LLM Agents.md
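The think, act, observe cycle can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `scripted_model` is a hypothetical stand-in for an LLM call, `calculator` is a toy tool, and the `Thought:` / `Action:` / `Observation:` labels are an assumed trace format.

```python
from typing import Callable, Optional

def calculator(expression: str) -> str:
    """Toy tool: adds integers written as '17+25'. Stands in for a real tool."""
    return str(sum(int(part) for part in expression.split("+")))

def scripted_model(history: list[str]) -> tuple[str, str, str]:
    """Hypothetical stand-in for an LLM call; returns (thought, action, argument)."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return ("I should compute the sum with the calculator.", "calculator", "17+25")
    result = observations[-1][len("Observation: "):]
    return ("The observation gives me the answer.", "finish", result)

def react_loop(
    question: str,
    model: Callable[[list[str]], tuple[str, str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Run think/act/observe until the model finishes or max_steps is hit."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = model(history)       # think
        history.append(f"Thought: {thought}")
        if action == "finish":                            # model decides it is done
            return argument, history
        history.append(f"Action: {action}[{argument}]")  # act
        observation = tools[action](argument)
        history.append(f"Observation: {observation}")    # observe
    return None, history  # step budget exhausted without a final answer

answer, trace = react_loop("What is 17 + 25?", scripted_model, {"calculator": calculator})
print(answer)  # 42
```

The `max_steps` cap is a choice made for this sketch; without some such bound, the loop ends only when the model itself emits `finish`.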

How it differs from Chain-of-Thought

Chain-of-Thought (CoT) reasoning also makes intermediate thinking steps explicit, but those steps remain entirely internal — the model never checks its reasoning against external reality. ReAct extends CoT by grounding each reasoning step in a real observation returned by a tool. This means errors introduced by hallucination or stale training data can, in principle, be corrected mid-run. Research — LLM Agents.md

*Figure: the ReAct loop, which can repeat many times before resolving. The red node represents the key failure mode described below.*

Strengths

  • Grounded reasoning — each thought step can be corrected by real observations, reducing compounding errors compared to pure CoT. Research — LLM Agents.md
  • Flexible tool integration — works naturally with tool use, since the act-observe cycle is designed around external function calls.
  • Adaptability — the model can change its plan mid-task based on what it observes, unlike rigid explicit planners.
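To make the tool-integration point concrete, here is one possible way to turn a model's action text into a function call. The `name[argument]` syntax, the `dispatch` helper, and the example tools are all assumptions made for illustration; real agent frameworks define their own call formats.

```python
import re
from typing import Callable

# Assumed action syntax for this sketch: tool_name[argument]
ACTION_RE = re.compile(r"^(\w+)\[(.*)\]$", re.DOTALL)

def dispatch(action_text: str, tools: dict[str, Callable[[str], str]]) -> str:
    """Parse an action string and call the matching tool.

    Failures come back as observation text rather than exceptions, so the
    model can read the error and adjust on its next reasoning step.
    """
    match = ACTION_RE.match(action_text.strip())
    if match is None:
        return f"Error: could not parse action {action_text!r}"
    name, argument = match.groups()
    tool = tools.get(name)
    if tool is None:
        return f"Error: unknown tool {name!r}"
    try:
        return tool(argument)
    except Exception as exc:  # surface tool failures as observations
        return f"Error: {name} failed: {exc}"

# Hypothetical tools, used only to exercise the dispatcher.
example_tools = {
    "upper": lambda text: text.upper(),
    "divide": lambda text: str(1 / int(text)),
}

print(dispatch("upper[hello]", example_tools))  # HELLO
```

Returning errors as observations is one design option; it keeps the act-observe cycle running, so a malformed call becomes something the model can react to instead of a crash.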

Weaknesses

  • Looping / stuck behavior — the model can repeat a failing action or oscillate between the same few steps without making progress. This is the primary weakness noted in the source. Research — LLM Agents.md
  • No guaranteed termination — without an external loop-count limit or a separate termination condition, a ReAct agent may never stop.
  • Tool reliability dependency — grounding only helps if the tools return accurate, well-structured results. Malformed tool calls (a risk documented on the Tool Use page) can send reasoning in the wrong direction.
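The looping and termination weaknesses above are commonly handled with an external guard. The sketch below shows one simple option, assuming each step can be identified by an (action, argument) pair; `LoopGuard` and its thresholds are hypothetical names and values, not part of any particular framework.

```python
from collections import Counter
from typing import Optional

class LoopGuard:
    """Stops an agent that exceeds a step budget or keeps repeating a call."""

    def __init__(self, max_steps: int = 10, max_repeats: int = 2) -> None:
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen: Counter = Counter()

    def check(self, action: str, argument: str) -> Optional[str]:
        """Return a stop reason, or None if the agent may continue."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "step limit reached"
        self.seen[(action, argument)] += 1
        if self.seen[(action, argument)] > self.max_repeats:
            return "repeated action"
        return None

guard = LoopGuard(max_steps=10, max_repeats=2)
reasons = [guard.check("search", "same query") for _ in range(3)]
print(reasons)  # [None, None, 'repeated action']
```

Because the counts are cumulative rather than consecutive, the same guard also catches oscillation between a small set of steps. It does not catch a model that rephrases the same failing call slightly each time; that would need a fuzzier notion of "same action".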

Relation to Memory

ReAct's "observation" steps are effectively short-term, in-context memory — each result gets appended to the growing context window. This is the simplest form of agent memory, but it is limited by context length. For longer tasks, observations may need to be summarized or offloaded to a retrieval store (RAG). Research — LLM Agents.md
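As a rough illustration of the summarization idea, the sketch below collapses older history entries once a character budget is exceeded and keeps the most recent ones verbatim. The budget in characters, the `compact_history` name, and the truncating `truncate_summary` placeholder are assumptions; a real system would more likely count tokens and ask a model to write the summary, or move old entries to a retrieval store.

```python
from typing import Callable

def truncate_summary(entries: list[str]) -> str:
    """Placeholder summarizer: keeps the first 20 characters of each entry."""
    heads = " | ".join(entry[:20] for entry in entries)
    return f"[summary of {len(entries)} earlier entries: {heads}]"

def compact_history(
    entries: list[str],
    budget_chars: int,
    keep_recent: int = 2,
    summarize: Callable[[list[str]], str] = truncate_summary,
) -> list[str]:
    """Replace older entries with one summary line when over budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(len(entry) for entry in entries)
    if total <= budget_chars or len(entries) <= keep_recent:
        return list(entries)  # nothing to compact
    older, recent = entries[:-keep_recent], entries[-keep_recent:]
    return [summarize(older)] + recent

history = [
    "Observation: " + "a" * 100,
    "Observation: " + "b" * 100,
    "Observation: c",
    "Observation: d",
]
compacted = compact_history(history, budget_chars=50)
print(len(compacted))  # 3
```

This sketch makes a single pass and does not guarantee the result fits the budget; it only shows where summarization would slot into the loop.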

Relation to the Planner vs Reactive Debate

ReAct is the canonical example of a reactive architecture: it does not decompose the full task upfront but instead adapts step by step. The Planner vs Reactive page covers the trade-offs between this approach and explicit task-decomposition planners. Research — LLM Agents.md
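The contrast can be shown with two toy control loops. Everything here is hypothetical (`run_planned`, `run_reactive`, the cache example); the point is only that the planner fixes its steps before seeing any result, while the reactive loop picks each step from the observations so far.

```python
from typing import Callable, Optional

def run_planned(
    task: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str], str],
) -> list[str]:
    """Planner style: decompose the whole task first, then run every step."""
    steps = plan(task)
    return [execute(step) for step in steps]

def run_reactive(
    task: str,
    next_step: Callable[[str, list[str]], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> list[str]:
    """Reactive style: choose each step after seeing earlier observations."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = next_step(task, observations)
        if step is None:  # nothing more to do
            break
        observations.append(execute(step))
    return observations

# Toy environment in which the cache lookup succeeds.
def execute(step: str) -> str:
    return "hit" if step == "check cache" else "fetched"

def fixed_plan(task: str) -> list[str]:
    return ["check cache", "fetch"]

def choose_next(task: str, observations: list[str]) -> Optional[str]:
    if not observations:
        return "check cache"
    if observations[-1] == "miss":
        return "fetch"
    return None  # cache hit (or fetch done): stop

print(run_planned("get data", fixed_plan, execute))    # ['hit', 'fetched']
print(run_reactive("get data", choose_next, execute))  # ['hit']
```

In this toy case the fixed plan performs a fetch it did not need, while the reactive loop stops after the cache hit. The example is deliberately one-sided; it says nothing about tasks where upfront decomposition helps.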

Open Questions

See the Open Questions page for unresolved issues, including when to prefer a reactive ReAct loop versus an explicit planner.