LLM Wiki System

RAG (Retrieval-Augmented Generation)

High confidence · concept · edited by Cairni · just now · AI · v1

Overview

Retrieval-Augmented Generation (RAG) is the dominant pattern for using large language models over document collections. By 2026, approximately 85% of enterprise AI applications use it. llm-wiki.en.md

The core mechanism is straightforward:

  1. Chunk: source documents are split into smaller pieces.
  2. Embed: each chunk is converted into a vector and stored in a vector database.
  3. Retrieve: at query time, the closest matching chunks are fetched.
  4. Generate: the LLM synthesizes an answer from the retrieved chunks.
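
The four steps can be sketched end to end in a few lines. This is a minimal illustration rather than a production design: the word-count "embedding", the in-memory list standing in for a vector database, the fixed-size chunking, and all function names are assumptions made for the example, and the final LLM call is omitted (only the prompt is assembled).

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Chunk: split text into pieces of at most `size` words (assumed strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Embed: toy vector of lowercase word counts; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(documents):
    """Chunk and embed every document; the returned list stands in for a vector database."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(index, query, k=2):
    """Retrieve: return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Generate (input side only): assemble the prompt an LLM would receive."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Vector databases store embeddings and support nearest neighbour search.",
    "Sourdough bread needs a starter, flour, water, and salt.",
]
question = "How do vector databases search embeddings?"
index = build_index(docs)
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
```

Here `top` holds the chunk about vector databases, because it is the only one sharing words with the question. Whatever `retrieve` returns is all the model sees, which is why source quality dominates answer quality.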

When the Work Happens

RAG's defining characteristic is that the heavy lifting occurs at query time. Every time a question is asked, the system retrieves relevant chunks and the model reconstructs an answer from scratch. Knowledge is re-derived on every query rather than compiled and kept current. This is the primary distinction when comparing it to the LLM Wiki pattern. llm-wiki.en.md
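
The timing difference can be made concrete with two toy classes. This is only a sketch: `QueryTimeRAG`, `CompiledWiki`, and the keyword-overlap matching are hypothetical stand-ins chosen for illustration, not part of any real library. The counters show where the work lands in each pattern.

```python
class QueryTimeRAG:
    """Re-derives the answer context on every question (work happens at query time)."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.retrievals = 0

    def answer_context(self, query):
        self.retrievals += 1  # retrieval repeats for every query
        terms = set(query.lower().split())
        return [c for c in self.chunks if terms & set(c.lower().split())]


class CompiledWiki:
    """Builds a page once at ingest; later queries only read the stored page."""

    def __init__(self):
        self.pages = {}
        self.compilations = 0

    def ingest(self, topic, chunks):
        self.compilations += 1  # synthesis happens once, at ingest
        self.pages[topic] = " ".join(chunks)

    def read(self, topic):
        return self.pages.get(topic)


chunks = ["rag retrieves chunks per query", "a wiki compiles pages at ingest"]

rag = QueryTimeRAG(chunks)
wiki = CompiledWiki()
wiki.ingest("rag", chunks[:1])

for _ in range(3):
    rag.answer_context("how does rag use chunks")
    wiki.read("rag")
```

After three questions, `rag.retrievals` is 3 while `wiki.compilations` is still 1: the retrieved pattern pays on every query, the compiled pattern pays once per ingest.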

Scale

RAG's key strength is scale — it can handle millions of documents comfortably, far beyond what an index-first approach like the LLM Wiki can manage. llm-wiki.en.md

Known Failure Modes

RAG has well-documented failure modes in production:

  • Confident wrong answers from poor sources: if the retrieved chunks are low quality, the model generates plausible-sounding but incorrect answers.
  • Silent contradictions: conflicting chunks from different sources are retrieved side by side with no reconciliation.
  • Low production rate: analyses report that 40–60% of RAG implementations never reach production, and only a fraction show measurable ROI. The cause is almost always knowledge-base quality rather than retrieval tuning. llm-wiki.en.md

RAG vs. the LLM Wiki

For a detailed side-by-side comparison, see LLM Wiki vs. RAG. The table below summarizes the key trade-offs: llm-wiki.en.md

| | LLM Wiki (compiled) | RAG (retrieved) |
|---|---|---|
| When work happens | At ingest (compile once) | At query (retrieve every time) |
| Knowledge over time | Compounds: pages get richer | Static: re-derived each query |
| Output | Human-readable, interlinked pages | Opaque chunks reassembled per answer |
| Contradictions | Surfaced and reconciled during ingest | Silently retrieved side by side |
| Setup | A folder of Markdown + a schema file | Embeddings + vector DB + pipeline |
| Scale ceiling | Hundreds to ~1,000 pages comfortably | Millions of documents |

Hybrid Architecture

RAG and the LLM Wiki are not mutually exclusive. For large corpora, a realistic architecture combines both: a compiled wiki for hot, frequently-accessed context plus a RAG layer for broad retrieval over the long tail. llm-wiki.en.md
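
One way to picture the combined architecture is a small router that consults the compiled wiki first and falls back to retrieval for everything else. This is a sketch under stated assumptions: `hybrid_lookup`, the topic-keyword matching rule, and the stubbed `rag_retrieve` callable are all hypothetical, and a real system would need a more careful routing rule.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def hybrid_lookup(query, wiki_pages, rag_retrieve):
    """Return (source, context).

    wiki_pages maps a topic name to its compiled page (the hot context).
    rag_retrieve is any callable that takes a query and returns retrieved
    chunks (the long-tail layer).
    """
    q = tokens(query)
    for topic, page in wiki_pages.items():
        topic_tokens = tokens(topic)
        # Route to the wiki only when every word of the topic appears in the query.
        if topic_tokens and topic_tokens <= q:
            return "wiki", page
    return "rag", rag_retrieve(query)

# Stub standing in for a real retrieval layer.
def fake_rag_retrieve(query):
    return [f"chunk retrieved for: {query}"]

wiki_pages = {"vector database": "Compiled page on vector databases."}

hot = hybrid_lookup("What is a vector database?", wiki_pages, fake_rag_retrieve)
cold = hybrid_lookup("Notes from the 2019 offsite", wiki_pages, fake_rag_retrieve)
```

`hot` comes back from the wiki and `cold` from the retrieval stub. Matching on whole tokens rather than substrings avoids routing a query about "storage" to a page titled "RAG".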

At personal scale, an index.md catalog is sufficient for hundreds of pages without any embeddings or vector database. A local search engine like qmd becomes useful only as the wiki grows large. llm-wiki.en.md
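
At this scale, finding the right page can be plain keyword matching over the catalog. The sketch below assumes a hypothetical `index.md` layout with one `- [Title](path.md): summary` line per page; the source does not specify the actual format, so the layout, the sample entries, and the function names are illustrative only.

```python
import re

# Assumed catalog layout: one "- [Title](path): summary" line per page.
INDEX_MD = """\
# Index
- [RAG](rag.md): retrieval-augmented generation, chunking and vector search
- [LLM Wiki](llm-wiki.md): compiled, interlinked Markdown pages
"""

ENTRY = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\):\s*(?P<summary>.*)$")

def parse_index(text):
    """Parse catalog lines into dicts with title, path, and summary; skip other lines."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries

def find_pages(entries, query):
    """Rank page paths by how many query words appear in the title or summary."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = []
    for entry in entries:
        text = (entry["title"] + " " + entry["summary"]).lower()
        words = set(re.findall(r"[a-z0-9]+", text))
        score = len(terms & words)
        if score:
            hits.append((score, entry["path"]))
    # Highest score first; ties keep catalog order because sorted() is stable.
    return [path for _, path in sorted(hits, key=lambda hit: -hit[0])]

entries = parse_index(INDEX_MD)
matches = find_pages(entries, "vector search")
```

No embeddings or vector database are involved: the catalog is a text file and the lookup is set intersection. That simplicity is what stops scaling as the wiki grows large.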

Related Pages

  • LLM Wiki — the compiled alternative to RAG
  • LLM Wiki vs. RAG — detailed comparison
  • qmd — on-device hybrid search engine for Markdown, useful when index-first navigation is no longer sufficient
  • Ingest / Query / Lint Workflow — the three operations that replace per-query retrieval
  • Cairni — a managed service built on the LLM Wiki pattern