Research Deep-Dive

Retrieval-Augmented Generation (RAG)

Confidence: medium · Concept · Edited by: Cairni · just now · AI-generated · v1

What is RAG?

Retrieval-Augmented Generation (RAG) is an approach that supplements an LLM's training-time knowledge and finite context window with content retrieved dynamically from an external knowledge store. Rather than relying solely on what was baked into the model during training, a RAG system queries the store at inference time, typically using vector similarity search, and injects the retrieved passages into the prompt before the model generates a response.

In the context of LLM agents, RAG is the dominant pattern for implementing long-term memory. The agent's short-term memory is limited to the context window; RAG is how agents reach beyond it. Research — LLM Agents.md

How it fits into agent memory

The research notes characterize most "agent memory" implementations as essentially retrieval + summarization — a claim worth taking seriously:

*"Most 'agent memory' is just retrieval + summarization."*

This means the apparent sophistication of an agent's recall is often reducible to a well-tuned retrieval pipeline feeding a summarization step, rather than any deeper form of persistent understanding. See Agent Memory for a fuller treatment of the short-term vs. long-term distinction. Research — LLM Agents.md
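To make the "retrieval + summarization" reading concrete, here is a minimal sketch of such a memory. The `AgentMemory` class, its method names, the keyword-overlap scoring, and the truncation-based `summarize` are all hypothetical stand-ins chosen so the example runs offline; a real system would more likely use embedding search and an LLM summarizer.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class AgentMemory:
    """Hypothetical long-term memory: stored notes + retrieval + summarization."""

    def __init__(self, max_summary_chars: int = 200) -> None:
        self.notes: list[str] = []
        self.max_summary_chars = max_summary_chars

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Score each note by word overlap with the query; drop notes with no overlap.
        q = tokens(query)
        hits = [(len(q & tokens(n)), n) for n in self.notes]
        hits = [(score, n) for score, n in hits if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in hits[:k]]

    def summarize(self, notes: list[str]) -> str:
        # Placeholder: joins and truncates. A real system would likely call an LLM here.
        return " ".join(notes)[: self.max_summary_chars]

    def recall(self, query: str, k: int = 3) -> str:
        # "Memory" is nothing more than retrieval followed by summarization.
        return self.summarize(self.retrieve(query, k))


memory = AgentMemory()
memory.remember("User prefers dark mode")
memory.remember("Build failed on Tuesday because of a missing dependency")
recalled = memory.recall("why did the build fail")
```

Nothing in `recall` models understanding: swapping the retriever or the summarizer changes what the agent appears to "remember", which is the point the quoted note is making.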

How retrieval works (typical pipeline)

The core loop: query → embed → retrieve → inject → generate.
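The loop above can be sketched end to end in a few lines. This is an illustration under stated assumptions, not a reference implementation: `embed` is a toy bag-of-words counter standing in for a real embedding model, `generate` is a placeholder for an LLM call, and the function names and sample `STORE` contents are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word-count vector (assumption; real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Embed the query and return the k most similar passages."""
    q = embed(query)
    return sorted(store, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Inject retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Placeholder for the model call so the sketch runs offline."""
    return f"[model output for a {len(prompt)}-character prompt]"


def rag_answer(query: str, store: list[str], k: int = 2) -> str:
    # query -> embed -> retrieve -> inject -> generate
    return generate(build_prompt(query, retrieve(query, store, k)))


STORE = [
    "The deploy script lives in tools/deploy.sh",
    "The user prefers answers in metric units",
    "Vector search ranks passages by embedding similarity",
]
top = retrieve("which units does the user prefer", STORE, k=1)
```

In practice the store would be pre-embedded and indexed rather than re-embedded on every query; the sketch recomputes embeddings only to stay short.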

Known limitations

The research notes flag two specific weaknesses of naive RAG:

  1. Misses temporal relationships: vector similarity is semantic, not temporal. An event that happened *recently* is not privileged over an older but semantically closer event.
  2. Misses structural relationships: graph-like or hierarchical dependencies between facts are flattened into independent embedding vectors, losing relational context.

These gaps motivate interest in hybrid approaches (e.g., combining vector search with knowledge graphs or temporal indices). Research — LLM Agents.md
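One simple way to picture a hybrid of this kind is to blend the similarity score with a recency term. The sketch below addresses only the temporal weakness, and everything in it is an assumption for illustration: the `Memory` record, the exponential half-life decay, the `alpha` weight, and the sample similarity values are not taken from the research notes.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # cosine similarity to the query, assumed precomputed
    age_hours: float   # time since the memory was written


def recency(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def hybrid_score(m: Memory, alpha: float = 0.7, half_life_hours: float = 24.0) -> float:
    """Weighted blend of semantic similarity and recency (alpha=1.0 is pure similarity)."""
    return alpha * m.similarity + (1 - alpha) * recency(m.age_hours, half_life_hours)


def rank(memories: list[Memory], alpha: float = 0.7) -> list[Memory]:
    return sorted(memories, key=lambda m: hybrid_score(m, alpha), reverse=True)


# Made-up example: the older memory is semantically closer but stale.
old = Memory("User's office is in Berlin", similarity=0.82, age_hours=24 * 90)
new = Memory("User moved to the Lisbon office", similarity=0.74, age_hours=2)

pure_similarity_top = rank([old, new], alpha=1.0)[0]
hybrid_top = rank([old, new], alpha=0.7)[0]
```

With pure similarity the stale Berlin memory ranks first; with the blended score the recent Lisbon memory does. The weight and half-life would need tuning per application, and a structural fix (for example, expanding results along knowledge-graph edges) would need a separate mechanism.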

Contradiction: are embeddings alone sufficient?

The notes explicitly flag a disagreement between two perspectives:

Contradiction / conflict (AI-flagged)
Claim: Naive vector search (embeddings alone) misses temporal and structural relationships in agent memory.
Conflict: One note claims embeddings alone suffice; another argues they miss temporal and structural context. The notes flag this as an open disagreement to resolve.
Source: Research — LLM Agents.md

This tension is unresolved in the current sources. See Open Questions for the outstanding research question. Research — LLM Agents.md

Relationship to other agent concepts

| Concept | How RAG connects |
| --- | --- |
| Agent Memory | RAG is the primary mechanism for long-term memory |
| ReAct Pattern | A ReAct agent may call retrieval as one of its tool actions |
| Tool Use & Function Calling | Retrieval can be exposed as a callable tool/function |
| Planner vs Reactive | Both planner and reactive architectures may rely on RAG for world-state recall |
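As a rough illustration of retrieval exposed as a callable tool, the sketch below registers a search function and dispatches a JSON tool call to it. The tool name `search_memory`, the spec layout, the `dispatch` helper, and the sample documents are all hypothetical; they mimic the general shape of function-calling interfaces without reproducing any particular vendor's API.

```python
import json

# Made-up sample documents standing in for an agent's long-term store.
DOCS = [
    "The staging database host is db-staging.internal",
    "Release notes are drafted every Friday",
]


def search_memory(query: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy stand-in for vector search)."""
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:k]


# Hypothetical tool description an agent framework might show to the model.
SEARCH_TOOL_SPEC = {
    "name": "search_memory",
    "description": "Retrieve the k stored passages most relevant to a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "k": {"type": "integer"},
        },
        "required": ["query"],
    },
}

TOOLS = {"search_memory": search_memory}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a JSON call and return its result as JSON text."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps(result)


observation = dispatch(
    '{"name": "search_memory", "arguments": {"query": "staging database host", "k": 1}}'
)
```

In a ReAct-style loop, `observation` would be appended to the context as the result of the agent's action, and the model would decide whether to answer or retrieve again.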

Summary

RAG is a pragmatic and widely used solution to the context-window bottleneck in LLM agents. Its core mechanic (embed, retrieve, inject) is simple, but its limitations around temporal and structural reasoning are a recognized open problem. Whether more sophisticated retrieval schemes are necessary, or whether embeddings alone are sufficient, remains contested. Research — LLM Agents.md