LLM Wiki System

RAG (Retrieval-Augmented Generation)

Concept · edited by Cairni · just now · AI · v1

Overview

Retrieval-Augmented Generation (RAG) is the dominant pattern for using large language models over document collections. As of 2026, approximately 85% of enterprise AI applications use it. llm-wiki.en.md

The core mechanism is straightforward:

1. Chunk: source documents are split into smaller pieces.
2. Embed: each chunk is converted into a vector and stored in a vector database.
3. Retrieve: at query time, the closest matching chunks are fetched.
4. Generate: the LLM synthesizes an answer from the retrieved chunks.
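
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for the vector database, and the generate step only assembles a prompt because no specific LLM API is assumed. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=8):
    """Chunk: split text into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Embed: toy term-count vector (assumption: a real system uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, max_words=8):
    """Chunk and embed every document; the list plays the role of the vector store."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, max_words)]


def retrieve(index, query, k=2):
    """Retrieve: return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Generate (prompt only): the text that would be handed to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = ["Paris is the capital of France.", "Bananas are rich in potassium."]
index = build_index(docs)
top = retrieve(index, "What is the capital of France?", k=1)
print(build_prompt("What is the capital of France?", top))
```

Swapping `embed` for a real embedding model and the list for a vector database changes the quality and scale of retrieval, but not the shape of the pipeline.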

When the Work Happens

RAG's defining characteristic is that the heavy lifting occurs at query time. Every time a question is asked, the system retrieves relevant chunks and the model reconstructs an answer from scratch. Knowledge is re-derived on every query rather than compiled and kept current. This is the primary distinction when comparing it to the LLM Wiki pattern. llm-wiki.en.md
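
The difference in when the work happens can be made concrete with a small counter. The sketch below is illustrative only: both classes are hypothetical, keyword overlap stands in for vector retrieval, and `synthesis_runs` stands in for the number of LLM generation calls.

```python
class QueryTimeRag:
    """Re-derives the answer from raw chunks on every question."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.synthesis_runs = 0

    def ask(self, question):
        terms = set(question.lower().split())
        hits = [c for c in self.chunks if terms & set(c.lower().split())]
        self.synthesis_runs += 1  # stand-in for one LLM generation call per query
        return " ".join(hits)


class CompiledPages:
    """Synthesizes once at ingest, then serves the stored page."""

    def __init__(self):
        self.pages = {}
        self.synthesis_runs = 0

    def ingest(self, topic, chunks):
        self.synthesis_runs += 1  # the expensive step happens here, once
        self.pages[topic] = " ".join(chunks)

    def ask(self, topic):
        return self.pages.get(topic, "")


chunks = ["rag retrieves chunks", "wiki compiles pages"]

rag = QueryTimeRag(chunks)
rag.ask("what is rag")
rag.ask("what is rag")

wiki = CompiledPages()
wiki.ingest("rag", chunks[:1])
wiki.ask("rag")
wiki.ask("rag")

print(rag.synthesis_runs, wiki.synthesis_runs)
```

Asking the same question twice costs the query-time system two synthesis runs, while the compiled system pays once at ingest and then only looks up the stored page.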

Scale

RAG's key strength is scale — it can handle millions of documents comfortably, far beyond what an index-first approach like the LLM Wiki can manage. llm-wiki.en.md

Known Failure Modes

RAG has well-documented failure modes in production:

Confident wrong answers from poor sources: if the retrieved chunks are low quality, the model generates plausible-sounding but incorrect answers.
Silent contradictions: conflicting chunks from different sources are retrieved side by side with no reconciliation.
Low rate of reaching production: analyses report that 40–60% of RAG implementations never reach production, and only a fraction show measurable ROI. The cause is almost always knowledge-base quality rather than retrieval tuning. llm-wiki.en.md

RAG vs. the LLM Wiki

For a detailed side-by-side comparison, see LLM Wiki vs. RAG. The table below summarizes the key trade-offs: llm-wiki.en.md

Aspect	LLM Wiki (compiled)	RAG (retrieved)
When work happens	At ingest (compile once)	At query (retrieve every time)
Knowledge over time	Compounds — pages get richer	Static — re-derived each query
Output	Human-readable, interlinked pages	Opaque chunks reassembled per answer
Contradictions	Surfaced and reconciled during ingest	Silently retrieved side by side
Setup	A folder of Markdown + a schema file	Embeddings + vector DB + pipeline
Scale ceiling	Hundreds to about 1,000 pages comfortably	Millions of documents

Hybrid Architecture

RAG and the LLM Wiki are not mutually exclusive. For large corpora, a realistic architecture combines both: a compiled wiki for hot, frequently-accessed context plus a RAG layer for broad retrieval over the long tail. llm-wiki.en.md
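
One way to picture the hybrid is a small router that tries the compiled wiki first and falls back to retrieval. This is a sketch under stated assumptions: the function name, the topic-substring match, and the `rag_retrieve` callable are all hypothetical, and a real system would likely use a more robust match than substring lookup.

```python
def answer_context(query, wiki_pages, rag_retrieve):
    """Return (source, context): a compiled page if a topic matches, else RAG chunks.

    wiki_pages:   dict mapping a topic name to its compiled page text (hot context).
    rag_retrieve: callable taking the query and returning a list of chunk strings
                  (the long-tail retrieval layer).
    """
    q = query.lower()
    for topic, page in wiki_pages.items():
        if topic.lower() in q:
            return "wiki", page
    return "rag", "\n".join(rag_retrieve(query))


def fake_retrieve(query):
    """Placeholder for a real retrieval layer; returns fixed chunks."""
    return ["chunk A", "chunk B"]


wiki = {"vector database": "Compiled page about vector databases."}
print(answer_context("What is a vector database?", wiki, fake_retrieve))
print(answer_context("What did the 2019 memo say?", wiki, fake_retrieve))
```

The first query is served from the compiled page; the second has no matching page and is routed to the retrieval layer.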

At personal scale, an index.md catalog is sufficient for hundreds of pages without any embeddings or vector database. A local search engine like qmd becomes useful only as the wiki grows large. llm-wiki.en.md
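
Index-first navigation needs no embeddings, only a parseable catalog. The sketch below assumes a hypothetical index.md layout of one `- [Title](path.md): summary` line per page; the source does not specify the actual format, so treat the regular expression and function names as illustrative.

```python
import re

# Assumed entry shape: "- [Title](path.md): one-line summary"
ENTRY = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\s*:?\s*(?P<summary>.*)$")


def parse_index(index_text):
    """Collect catalog entries; lines that do not match the assumed shape are skipped."""
    entries = []
    for line in index_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def find_pages(entries, query):
    """Rank page paths by how many query words appear in the title or summary."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for entry in entries:
        text = (entry["title"] + " " + entry["summary"]).lower()
        overlap = len(terms & set(re.findall(r"[a-z0-9]+", text)))
        if overlap:
            scored.append((overlap, entry["path"]))
    return [path for _, path in sorted(scored, key=lambda s: -s[0])]


INDEX = """# Index
- [RAG](rag.md): retrieval-augmented generation over chunks
- [qmd](qmd.md): local search engine for Markdown
"""

entries = parse_index(INDEX)
print(find_pages(entries, "how does retrieval work"))
```

A linear scan like this stays fast for hundreds of entries, which is consistent with the point that a search engine only becomes useful once the wiki grows large.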

Related Pages

LLM Wiki — the compiled alternative to RAG
LLM Wiki vs. RAG — detailed comparison
qmd — on-device hybrid search engine for Markdown, useful when index-first navigation is no longer sufficient
Ingest / Query / Lint Workflow — the three operations that replace per-query retrieval
Cairni — a managed service built on the LLM Wiki pattern