Retrieval-Augmented Generation (RAG)
What is RAG?
Retrieval-Augmented Generation (RAG) is an approach that supplements an LLM's fixed context window with dynamically retrieved content from an external knowledge store. Rather than relying solely on what was baked into the model during training, a RAG system queries a store at inference time — typically using vector similarity search — and injects the retrieved passages into the prompt before the model generates a response.
In the context of LLM agents, RAG is the dominant pattern for implementing long-term memory. The agent's short-term memory is limited to the context window; RAG is how agents reach beyond it. Research — LLM Agents.md
How it fits into agent memory
The research notes characterize most "agent memory" implementations as essentially retrieval + summarization — a claim worth taking seriously:
*"Most 'agent memory' is just retrieval + summarization."*
This means the apparent sophistication of an agent's recall is often reducible to a well-tuned retrieval pipeline feeding a summarization step, rather than any deeper form of persistent understanding. See Agent Memory for a fuller treatment of the short-term vs. long-term distinction. Research — LLM Agents.md
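To make the "retrieval + summarization" reading concrete, here is a minimal sketch of a memory layer built from exactly those two steps. It is an illustration under stated assumptions, not something taken from the notes: word-overlap scoring stands in for embedding similarity, the truncating `summarize` function stands in for an LLM summarization call, and `SimpleMemory` and its methods are hypothetical names.

```python
class SimpleMemory:
    """Hypothetical long-term memory: a list of text entries plus keyword recall."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def remember(self, text: str) -> None:
        self.entries.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Score each entry by how many query words it shares (stand-in for
        # vector similarity), then keep the top-k entries with any overlap.
        q_words = set(query.lower().split())
        scored = [(len(q_words & set(e.lower().split())), e) for e in self.entries]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [entry for score, entry in ranked if score > 0][:k]


def summarize(passages: list[str], max_chars: int = 200) -> str:
    # Placeholder for an LLM summarization step: join and truncate.
    joined = " ".join(passages)
    if len(joined) <= max_chars:
        return joined
    return joined[:max_chars].rstrip() + "..."


memory = SimpleMemory()
memory.remember("User prefers dark mode in the editor")
memory.remember("User's project uses Python 3.12")
memory.remember("Meeting with design team moved to Friday")

hits = memory.recall("which python version does the project use")
recalled = summarize(hits)
```

Nothing in this sketch "understands" the stored facts; what the agent appears to remember is whatever the scoring function ranks highest and the summarizer keeps.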
How retrieval works (typical pipeline)
The core loop: query → embed → retrieve → inject → generate.
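The loop above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: `embed` is a toy hashed bag-of-words function standing in for a real embedding model, `store` is an in-memory list standing in for a vector database, and the final generate step is omitted (the built prompt is what would be sent to the model). All function and variable names are hypothetical.

```python
import hashlib
import math

DIM = 64  # toy embedding size (assumption); real models use far more dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for raw in text.lower().split():
        word = raw.strip(".,?!:;\"'()")
        if not word:
            continue
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Embed the query and return the k most similar stored passages."""
    q_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, passages: list[str]) -> str:
    """Inject the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


documents = [
    "The context window limits how much text an LLM can read at once.",
    "Vector similarity search finds passages whose embeddings are close to the query.",
    "Retrieved passages are injected into the prompt before generation.",
]
store = [(doc, embed(doc)) for doc in documents]  # index step, done ahead of time

query = "How does vector similarity search work?"
top = retrieve(query, store, k=2)   # query -> embed -> retrieve
prompt = build_prompt(query, top)   # inject; `prompt` would then go to the model
```

Swapping `embed` for a real embedding model and `store` for a vector index changes the quality of retrieval but not the shape of the loop.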
Known limitations
The research notes flag two specific weaknesses of naive RAG:
1. Misses temporal relationships: vector similarity is semantic, not temporal. An event that happened *recently* is not privileged over an older but semantically closer event.
2. Misses structural relationships: graph-like or hierarchical dependencies between facts are flattened into independent embedding vectors, losing relational context.
These gaps motivate interest in hybrid approaches (e.g., combining vector search with knowledge graphs or temporal indices). Research — LLM Agents.md
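One way to picture the temporal gap, and one possible hybrid fix for it, is to blend the similarity score with an exponential recency decay. The sketch below is an assumption made for illustration, not a method described in the notes: the `Memory` dataclass, the `alpha` and `half_life_days` parameters, and the hard-coded similarity values are all hypothetical. It addresses only the temporal weakness, not the structural one.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # semantic similarity to the query, assumed precomputed in [0, 1]
    age_days: float    # how long ago the memory was recorded


def recency(age_days: float, half_life_days: float = 7.0) -> float:
    """Exponential decay: the score halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def hybrid_score(m: Memory, alpha: float = 0.5, half_life_days: float = 7.0) -> float:
    """Weighted blend of semantic similarity and recency."""
    return alpha * m.similarity + (1 - alpha) * recency(m.age_days, half_life_days)


memories = [
    Memory("Deploy target was staging-1", similarity=0.90, age_days=60),
    Memory("Deploy target changed to staging-2", similarity=0.80, age_days=1),
]

# Pure similarity prefers the older, slightly closer memory.
by_similarity = max(memories, key=lambda m: m.similarity)

# The blended score prefers the recent update instead.
by_hybrid = max(memories, key=hybrid_score)
```

The weighting is a design choice: a high `alpha` behaves like plain vector search, while a low `alpha` behaves like a most-recent-first log, so any real system would need to tune it for its workload.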
Contradiction: are embeddings alone sufficient?
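The notes explicitly flag a disagreement between two perspectives: one holds that embeddings alone are sufficient, while the other holds that more sophisticated retrieval schemes are necessary.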
This tension is unresolved in the current sources. See Open Questions for the outstanding research question. Research — LLM Agents.md
Relationship to other agent concepts
| Concept | How RAG connects |
|---|---|
| Agent Memory | RAG is the primary mechanism for long-term memory |
| ReAct Pattern | A ReAct agent may call retrieval as one of its tool actions |
| Tool Use & Function Calling | Retrieval can be exposed as a callable tool/function |
| Planner vs Reactive | Both planner and reactive architectures may rely on RAG for world-state recall |
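
As a rough illustration of the "retrieval as a callable tool" row, the sketch below registers a retrieval function in a tool table and dispatches a model-style tool call to it. Everything here is hypothetical: the `TOOLS` registry, the `search_memory` function, the JSON call format, and the keyword-overlap retrieval are assumptions made for the example, and real function-calling APIs define their own schemas.

```python
import json

KNOWLEDGE = [
    "The context window bounds short-term memory",
    "Vector stores hold long-term memory",
    "ReAct agents interleave reasoning and tool actions",
]


def search_memory(query: str, k: int = 2) -> list[str]:
    """Hypothetical retrieval tool: rank stored notes by shared words with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        KNOWLEDGE,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


# Tool registry: a description the model could read, plus the callable itself.
TOOLS = {
    "search_memory": {
        "description": "Retrieve stored notes relevant to a query.",
        "parameters": {"query": "string", "k": "integer"},
        "fn": search_memory,
    }
}


def dispatch(tool_call_json: str) -> str:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    result = tool["fn"](**call["arguments"])
    # The JSON string returned here is what the agent would see as its observation.
    return json.dumps({"tool": call["name"], "result": result})


observation = dispatch(
    '{"name": "search_memory", "arguments": {"query": "context window limit", "k": 1}}'
)
```

In a ReAct-style loop, `observation` would be appended to the context so the model can reason over the retrieved notes on its next step.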
Summary
RAG is a pragmatic and widely used solution to the context-window bottleneck in LLM agents. Its core mechanic (embed, retrieve, inject) is simple, but its weakness in capturing temporal and structural relationships is a recognized open problem. Whether more sophisticated retrieval schemes are necessary, or whether embeddings alone are sufficient, remains contested. Research — LLM Agents.md