Chain-of-Thought Reasoning
What Is Chain-of-Thought Reasoning?
Chain-of-thought (CoT) reasoning is a prompting technique in which a language model is guided — or trained — to emit explicit intermediate reasoning steps before producing a final answer. Rather than jumping directly from input to output, the model "thinks out loud," producing a sequence of logical steps that build toward the conclusion.
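A minimal few-shot sketch of the prompting side of CoT. It assumes a hypothetical worked exemplar (`EXEMPLAR`) and a convention that the model ends its reasoning with "The answer is X."; no real model or API is called, and `build_cot_prompt` and `extract_final_answer` are illustrative names only.

```python
import re
from typing import Optional

# Hypothetical worked exemplar: the reasoning is spelled out step by step and
# ends with a fixed "The answer is X." line that the parser below relies on.
EXEMPLAR = (
    "Q: A shop has 23 apples. It sells 20 and then buys 6 more. "
    "How many apples does it have?\n"
    "A: The shop starts with 23 apples. After selling 20 it has 23 - 20 = 3. "
    "After buying 6 more it has 3 + 6 = 9. The answer is 9.\n"
)


def build_cot_prompt(question: str) -> str:
    """Prepend the worked example so the model imitates step-by-step reasoning."""
    return f"{EXEMPLAR}\nQ: {question}\nA:"


def extract_final_answer(completion: str) -> Optional[str]:
    """Return X from a reasoning chain whose last line ends in 'The answer is X.'"""
    match = re.search(r"The answer is\s+([^\n]+?)\.?\s*$", completion.strip())
    return match.group(1) if match else None


prompt = build_cot_prompt("A train has 5 cars and adds 3 more. How many cars does it have?")
# Stand-in for a model completion; a real system would send `prompt` to an LLM.
reply = "The train starts with 5 cars. Adding 3 gives 5 + 3 = 8. The answer is 8."
print(extract_final_answer(reply))  # 8
```

The intermediate steps ("23 - 20 = 3", "3 + 6 = 9") are the chain of thought; only the final line is parsed as the answer, which is why a fixed answer format matters in practice.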
CoT is widely regarded as a precursor to, and building block for, more advanced LLM agent behavior: it was among the first techniques to show that large language models could carry out multi-step problem solving within a single generated response, without any external tools. Research — LLM Agents.md
Chain-of-Thought vs. Grounded Reasoning
Pure CoT operates entirely within the model's context — the reasoning chain is generated from the model's parametric knowledge alone, with no external observations injected mid-sequence. This is both its strength and its key limitation:
- Strength: No external dependencies; the model can reason over anything in its training distribution.
- Weakness: Reasoning can "hallucinate" — produce plausible-looking chains that are factually wrong, because there is no mechanism to check intermediate conclusions against the real world.
This limitation motivates the ReAct pattern, which directly extends CoT by interleaving reasoning steps with tool calls and real observations. As noted in the research notes, ReAct "improves on pure chain-of-thought by grounding reasoning in real observations." Research — LLM Agents.md
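A compact sketch of how ReAct interleaves CoT with grounding. It assumes a hypothetical plain-text protocol (`Thought:`, `Action: tool[arg]`, `Observation:`, `Final Answer:`) similar in spirit to the ReAct format; `react_loop`, `toy_model`, and the `add` tool are illustrative stand-ins, not a library API.

```python
import re
from typing import Callable, Dict

# Assumed text protocol: each model step contains a "Thought:" line followed by
# either "Action: tool_name[argument]" or "Final Answer: ...".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)")


def react_loop(model: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)              # reasoning (CoT) phase
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            observation = "no valid action found"
        else:
            name, arg = action.group(1), action.group(2)
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name!r}"
        # Grounding: the real tool result is appended before the next thought.
        transcript += f"Observation: {observation}\n"
    return "no answer within step budget"


# Hypothetical tool and a scripted stand-in for the LLM.
tools = {"add": lambda arg: str(sum(int(x) for x in arg.split(",")))}


def toy_model(transcript: str) -> str:
    if "Observation:" not in transcript:
        return "Thought: I should add the numbers with a tool.\nAction: add[17,25]"
    last_obs = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: The tool returned {last_obs}.\nFinal Answer: {last_obs}"


print(react_loop(toy_model, tools, "What is 17 + 25?"))  # 42
```

The only difference from pure CoT is the `Observation:` line: each new thought is conditioned on a result that came from outside the model rather than from its parametric knowledge.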
Role in Agent Architectures
Chain-of-thought is more than a prompting trick. It is the reasoning layer that many LLM agent designs build on:
- ReAct uses CoT as its reasoning phase between actions.
- Explicit planners (see Planner vs Reactive Agent Architectures) use CoT-style decomposition to break a goal into sub-tasks before execution begins.
- Tool use decisions are often made within a CoT step, where the model reasons about *which* tool to call and *why*.
- Memory summarization (condensing past context so it can be retrieved later) often relies on CoT-style generation to compress that context while keeping the details that matter.
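The explicit-planner case, where decomposition happens before any execution, can be sketched as follows. The planning prompt, the numbered-list output format, and the names `parse_plan` and `run_plan` are assumptions for illustration; the plan text stands in for what a model might return.

```python
import re
from typing import Callable, List

# Hypothetical planning prompt asking for CoT-style decomposition up front.
PLAN_PROMPT = (
    "Goal: {goal}\n"
    "Think step by step and write a numbered plan, one sub-task per line.\n"
)

# Matches lines such as "1. Do X" or "2) Do Y".
STEP_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_plan(plan_text: str) -> List[str]:
    """Extract numbered sub-tasks in order, ignoring any surrounding prose."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(2))
    return steps


def run_plan(plan_text: str, execute: Callable[[str], str]) -> List[str]:
    """Execute sub-tasks only after the whole plan has been produced."""
    return [execute(step) for step in parse_plan(plan_text)]


# `prompt` would be sent to an LLM; `plan_text` stands in for its reply.
prompt = PLAN_PROMPT.format(goal="Summarize sales by region")
plan_text = (
    "Let me break the goal down.\n"
    "1. Load the sales file\n"
    "2) Group rows by region\n"
    "3. Write a summary table\n"
)
print(parse_plan(plan_text))
# ['Load the sales file', 'Group rows by region', 'Write a summary table']
```

Because the plan is fixed before `run_plan` starts, execution is predictable, but nothing here revisits the plan if a step fails, which is the brittleness noted for explicit planning.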
Known Failure Modes
The research notes flag one important weakness directly relevant to CoT: when used in a loop (as in ReAct), the model "can loop or get stuck repeating a failing action." This suggests that an ungrounded or poorly checked reasoning chain can compound errors across turns rather than self-correct. Research — LLM Agents.md
More broadly, pure CoT has no built-in way to break such loops: if a reasoning chain leads to a dead end, the model receives no external signal that would let it detect this.
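One way an agent harness could supply the missing external signal is a simple repetition check on the action history. This is an illustrative assumption rather than a technique from the research notes: the hypothetical `is_stuck` helper flags a run of identical actions that all failed, and the `(action, succeeded)` history format is invented for the example.

```python
from typing import List, Tuple

# Each history entry is (action, succeeded), e.g. ("lookup[ReAct]", False).
Step = Tuple[str, bool]


def is_stuck(history: List[Step], window: int = 3) -> bool:
    """True if the last `window` steps repeat one action and every one failed."""
    if len(history) < window:
        return False
    recent = history[-window:]
    same_action = len({action for action, _ in recent}) == 1
    all_failed = not any(ok for _, ok in recent)
    return same_action and all_failed


history = [("lookup[ReAct]", False)] * 3
print(is_stuck(history))  # True
```

A check like this sits outside the model, so it can interrupt a loop that the reasoning chain itself would not notice; what the harness does next (re-prompt, switch tools, or stop) is a separate design choice.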
Relationship to Other Concepts
| Concept | Relationship to CoT |
|---|---|
| ReAct | Extends CoT by adding tool calls and observations between reasoning steps |
| Explicit planning | Uses CoT-style decomposition upfront; more predictable but brittle |
| Tool use | Tool selection decisions are made within CoT reasoning steps |
| Agent memory | Summarization of memory often relies on CoT-style generation |
| RAG | Retrieval can ground CoT steps with external documents |
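The RAG row can be sketched as a prompt builder that places retrieved passages ahead of the step-by-step instruction. This assumes a deliberately naive word-overlap retriever in place of a real search index or vector store; `retrieve` and `build_grounded_cot_prompt` are hypothetical names.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Naive retriever: rank documents by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d)]


def build_grounded_cot_prompt(question: str, docs: List[str], k: int = 2) -> str:
    """Place retrieved passages ahead of the step-by-step reasoning instruction."""
    passages = retrieve(question, docs, k)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Reason step by step, cite passages by number, then state the answer."
    )


docs = [
    "ReAct interleaves reasoning steps with tool calls.",
    "Bananas are yellow when ripe.",
    "Chain-of-thought prompting elicits intermediate reasoning steps.",
]
question = "How does ReAct use reasoning?"
prompt = build_grounded_cot_prompt(question, docs)
print(prompt)
```

Asking the model to cite passages by number gives each reasoning step something checkable, which is the grounding that pure CoT lacks.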