RAG Patterns¶
RAG = Retrieval-Augmented Generation. The LLM doesn't rely only on its training data — at query time, the system retrieves relevant context from your own data and injects it into the prompt.
This page catalogs every RAG variant used in the portfolio plus a few comparison patterns that aren't (yet) used. For where the embeddings live, see Vector-Databases.md. For the embedding-side primitives (chunking, reranking, BM25), see AI-Concepts.md.
Quick portfolio map¶
| RAG Type | App count | Most common stack |
|---|---|---|
| HyDE RAG | ~17 apps | pgvector + text-embedding-3-small |
| Basic / Pipeline RAG | ~5 apps | pgvector or Qdrant + text-embedding-3-small |
| Agentic RAG | 3 apps | Mixed; tool-calling LLM |
| Adaptive 3-Tier | 1 app (FamilyChat) | pgvector + cross-encoder |
| Graph RAG (3-tier) | 1 app (GoGreen-DOC-AI) | Qdrant + LangChain + entity graph |
| Multi-Type RAG | 1 app (MyPollingApp) | pgvector × 5 indexes |
| TF-IDF RAG | 2 apps | scikit-learn / in-memory |
| BM25 RAG | 1 app (WP-Plugin) | MySQL custom |
| Enterprise RAG (9 techniques) | 1 app (OpenSentinel) | pgvector |
| No RAG (intentional) | 5 apps | n/a |
Naive RAG (a.k.a. Basic / Standard RAG)¶
What it is. The textbook three-step pipeline: embed the user query → look up the top-K most similar chunks in a vector DB → stuff those chunks into the LLM prompt as context.
Why we use it. Cheapest baseline. No reranker, no query rewriting, no graph. Good for narrow data sets where naive retrieval is "good enough."
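A minimal sketch of the three steps, assuming an in-memory store of pre-embedded chunks (the portfolio default is pgvector, but the shape is identical; gpt-4o-mini is an illustrative model choice, not what any specific app ships):

```python
# Minimal naive RAG: embed query -> top-K by cosine -> stuff chunks into the prompt.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 5) -> list[str]:
    q = embed(query)
    # OpenAI embeddings are unit-length, so dot product equals cosine similarity.
    scores = chunk_vecs @ q
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str, chunks: list[str], chunk_vecs: np.ndarray) -> str:
    context = "\n\n".join(retrieve(query, chunks, chunk_vecs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content
```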
Used in:
- GoGreenPaperlessInitiative — pgvector + text-embedding-3-small. Embedding-status tracking and document-scoped search; no reranker.
- MangyDogCoffee — pgvector + text-embedding-3-small. Knowledge-base indexing for menu / schedule / FAQ context fed into a voice agent.
- Realestate-all-docker — MySQL JSON columns as the vector store + text-embedding-3-small. 11 distinct property RAG types, cosine similarity in SQL.
- GoGreen-AI-Concierge — Qdrant + text-embedding-3-small. Pipeline-style: Crawlee web ingestion → batch embedding → citation tracking. (Listed below as "Pipeline RAG" because of the ingestion emphasis.)
Tradeoffs. Misses synonyms ("invoice" vs "bill"). Misses paraphrased queries. Top-K is a blunt instrument — the right answer can be at rank 11. Upgrade paths in this doc: HyDE, hybrid search, reranking, agentic.
Pipeline RAG¶
What it is. A naming convention some teams use for naive RAG that emphasizes the ingestion pipeline (crawl → parse → chunk → embed → store) rather than the retrieval pipeline. Functionally still naive on the retrieval side.
Used in:
- GoGreen-AI-Concierge — Crawlee for crawling, Qdrant for the vectors, batch embedding, citation tracking on responses.
See also: Document-Processing.md for the ingestion side (Docling, PyMuPDF, Crawlee, etc.).
HyDE RAG (Hypothetical Document Embeddings)¶
What it is. Instead of embedding the user's question, you ask the LLM to write a hypothetical answer and embed that. The hypothetical answer has the same shape and register as the documents you want to retrieve, so cosine similarity works much better.
Why we use it. Cheap (one extra LLM call), no model training, and dramatically improves recall when the query and the documents are written in different registers — e.g. "How do I cancel?" vs documents that say "Subscription termination procedure."
The trick. The hypothetical answer can be wrong; that doesn't matter, because you're only using its embedding. Final answers come from real retrieved docs.
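A sketch of the extra step, reusing the embed/retrieve helpers and client from the naive sketch above; the prompt wording is illustrative:

```python
# HyDE: retrieve with the embedding of a *hypothetical answer*, not the question.
def hyde_retrieve(question: str, chunks: list[str], chunk_vecs, k: int = 5) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short passage that plausibly answers:\n{question}",
        }],
    )
    hypothetical = resp.choices[0].message.content
    # The passage may be factually wrong; only its embedding is used for lookup.
    # Final answers are generated from the real retrieved chunks.
    return retrieve(hypothetical, chunks, chunk_vecs, k=k)
```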
Used in (most popular pattern in the portfolio):
- Boomer_AI-Docker — pgvector + text-embedding-3-small. HyDE + LRU cache + 8 source types + metadata reranking.
- Ecom-Sales / GogreenSellerAI — pgvector + text-embedding-3-small. HyDE + Claude-as-reranker + 17 source types + auto-indexing.
- GoGreen-SmartForms — in-memory + text-embedding-3-small. HyDE + 4 source types (FORM_TEMPLATE, FIELD_MAPPING, EXTRACTED_DATA, CLASSIFICATION_RULE) + reranking.
- GoGreenMarketing — in-memory + text-embedding-3-small. HyDE + query rewriting + agent-scoped retrieval.
- GoGreenSourcingAI — pgvector + text-embedding-3-small. HyDE + reranking; sources: contract / supplier / RFQ.
- NaggingWifeAI — pgvector + text-embedding-3-small. HyDE + 5 source types (FAMILY_MEMBER, REMINDER, PREFERENCE, CONVERSATION_HISTORY, IMPORTANT_DATE) + reranking.
- Recruiting_AI-Docker — pgvector + text-embedding-3-small. HyDE + 15 source types + recruiting-specific reranker boosts (RESUME, JOB_POSTING, INTERVIEW_NOTES, …).
- Sales_AI_App — pgvector + text-embedding-3-small. HyDE for sales-context retrieval.
- Salon-Digital-Assistant — pgvector + text-embedding-3-small. HyDE + 5 source types (SERVICE, STYLIST, CLIENT_PREFERENCE, FAQ, BOOKING_HISTORY) + reranking.
- SCO-Digital-Assistant — SQLite (JSON) + text-embedding-3-small. HyDE + 5 source types (ORGANIZATION_INFO, PROGRAM, FAQ, CALENDAR_EVENT, VOLUNTEER) + JS cosine similarity.
- SellMeACar-Docker — SQLite (JSON) + text-embedding-3-small. HyDE + 9 source types (vehicle, sales_script, objection_response, …) + JS cosine.
- SellMeAPen_CLCD-1 — Postgres (JSON) + text-embedding-3-small. HyDE + 9 source types + JS cosine.
- SellMe_PRT-Docker — SQLite (JSON) + text-embedding-3-small. HyDE + 8 source types + JS cosine.
- TimeSheetAI — pgvector + text-embedding-3-small. HyDE + timesheet intelligence.
- Voting_NewAndImproved — MySQL (embeddings column) + text-embedding-3-small. HyDE used as the 9th of 9 retrieval strategies; 5 source types (POLL, VOTE_RESULT, FAQ, POLICY_DOCUMENT, DISCUSSION_THREAD) + reranking.
Tradeoffs. Adds latency (the HyDE call itself) and a small token cost. The Sell-Me family ships with the lowest-friction stack: SQLite (JSON) + JS cosine — works without Postgres.
Hybrid Search RAG¶
What it is. Run both dense vector search and sparse keyword search (BM25 / TF-IDF), then merge the results. The classic merger is Reciprocal Rank Fusion (RRF) but custom weighted schemes are fine.
Why we use it. Dense embeddings miss exact strings (SKUs, error codes, names). Sparse search misses semantic matches. Hybrid fixes both blind spots. This is the de facto best-practice baseline at production scale in 2026.
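RRF itself is a few lines of pure Python; k = 60 is the constant from the original RRF paper:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# merged = rrf([dense_result_ids, bm25_result_ids])[:10]
```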
Used in:
- AI-Wordpress_Business-Directory — MySQL JSON + text-embedding-3-small. BM25 + vector hybrid, classified as Agentic RAG (8 tools) but the retriever underneath is hybrid.
- GoGreen-DOC-AI — Qdrant + text-embedding-3-large (3072d). Hybrid BM25 + vector at the lowest tier (see Graph RAG below for the full 3-tier layout).
- OpenSentinel — pgvector + text-embedding-3-small. Hybrid is one of nine retrieval techniques.
Reranked RAG¶
What it is. Retrieve a wide top-K (50–100) cheaply, then run a smaller, more accurate reranker model — a cross-encoder, Cohere Rerank, BGE-reranker, or even an LLM with a "rate this doc" prompt — to re-order them and pass only the top 5–10 to the generator.
Why we use it. Bi-encoder embeddings are fast but lossy. Cross-encoders score (query, document) pairs jointly, which is much more accurate but too slow to run over millions of docs. The two-stage retrieve-then-rerank pattern gets near-cross-encoder quality at near-bi-encoder speed.
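A two-stage sketch using the sentence-transformers CrossEncoder; the model name is a common public reranker, not necessarily what any app here uses:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 8) -> list[str]:
    # Score (query, doc) pairs jointly: accurate, but only feasible over the
    # small candidate set from the first stage, never the whole corpus.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

# candidates = the wide, cheap top-50..100 from the bi-encoder stage
```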
Variants in the portfolio:
- LLM-as-reranker — Ecom-Sales / GogreenSellerAI use Claude to rerank candidate chunks.
- Metadata reranker — Boomer_AI scores by metadata fit on top of cosine.
- Cross-encoder reranker — FamilyChat (see Adaptive 3-Tier below).
- Domain-tuned reranker — Recruiting_AI applies recruiting-specific boosts: years_of_experience, skill match, etc.
Used in (apps that have explicit reranking): Boomer_AI, Ecom-Sales, FamilyChat, GoGreen-DOC-AI, GoGreen-SmartForms, GoGreenSourcingAI, NaggingWifeAI, OpenSentinel, Recruiting_AI, Salon-Digital-Assistant, Voting_NewAndImproved.
Tradeoffs. Adds a latency hop. LLM-as-reranker is the simplest to ship but expensive over many candidates — limit to top 20.
Adaptive / Tiered RAG¶
What it is. The system decides which retrieval strategy to use per query, instead of running the same pipeline for every question.
Used in:
- FamilyChat — Adaptive 3-Tier RAG. A query classifier picks one of three retrieval paths: shallow (recent messages only), mid (HyDE + vector), deep (full HyDE + cross-encoder reranking). pgvector + text-embedding-3-small.
Tradeoffs. Best for chat where most queries are trivial ("what time is dinner?") and only some need deep retrieval. The classifier itself has to be cheap — usually a small model or rules.
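A rules-first router keeps the classification step nearly free; these heuristics are illustrative, not FamilyChat's actual rules:

```python
def pick_tier(query: str, has_recent_history: bool) -> str:
    """Route each query to the cheapest tier that can plausibly answer it."""
    q = query.lower()
    if has_recent_history and len(q.split()) <= 6:
        return "shallow"   # recent messages only, no retrieval
    if any(w in q for w in ("why", "compare", "explain", "history of")):
        return "deep"      # full HyDE + cross-encoder reranking
    return "mid"           # HyDE + vector search
```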
Graph RAG¶
What it is. Augment vector search with a knowledge graph. Entities and relationships extracted from documents become graph nodes/edges; retrieval traverses the graph to find connected facts, not just nearest neighbors. Strongly associated with Microsoft's GraphRAG paper (community detection + hierarchical summaries) but used here in a simpler 3-tier form.
Why we use it. When answering requires connecting facts across documents (multi-hop reasoning), pure vector search loses. Example: "Which forms reference the 2024 W-9 update?" needs the form ↔ regulation edge.
Used in:
- GoGreen-DOC-AI — Graph RAG (3-tier). Tier 1: basic vector retrieval. Tier 2: LangChain RetrievalQA. Tier 3: entity-extracted knowledge graph traversal. Uses text-embedding-3-large (3072d) instead of the portfolio default 3-small (1536d) — paying for higher recall on dense legal/document text. Hybrid BM25 + vector at the base layer. 5-turn conversational memory.
Tradeoffs. Heaviest setup. Requires entity extraction (typically an LLM pass over every chunk). Worth it for document-heavy domains.
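A sketch of the graph tier with networkx, assuming an LLM pass has already produced an entity list per chunk; the node/edge layout is illustrative, not GoGreen-DOC-AI's actual schema:

```python
import networkx as nx

G = nx.Graph()

def add_chunk(chunk_id: str, entities: list[str]) -> None:
    """Ingestion: entities become nodes; co-occurrence in a chunk becomes an
    edge that remembers which chunk IDs created it."""
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if G.has_edge(a, b):
                G[a][b]["chunks"].add(chunk_id)
            else:
                G.add_edge(a, b, chunks={chunk_id})

def connected_chunks(seed_entities: list[str], hops: int = 2) -> set[str]:
    """Retrieval: collect every chunk reachable within N hops of the query's entities."""
    found: set[str] = set()
    for seed in seed_entities:
        if seed not in G:
            continue
        for node in nx.ego_graph(G, seed, radius=hops):
            for _, _, data in G.edges(node, data=True):
                found.update(data["chunks"])
    return found
```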
Agentic RAG¶
What it is. The LLM doesn't get a single retrieval — it gets tools (search, lookup, filter, fetch URL, query DB) and decides at runtime which to call, how often, and when it has enough to answer. Built on top of tool / function calling (see AI-Concepts.md).
Why we use it. Hard queries that need decomposition: "Compare the inspection requirements for these three vehicles in Pennsylvania" — needs three separate retrievals plus filtering, which a static pipeline can't do.
Used in:
- AI-Wordpress_Business-Directory — MySQL JSON + text-embedding-3-small. 8 tools, BM25 + vector hybrid, cosine similarity.
- Tutor_AI-Docker — pgvector + text-embedding-3-small. 10 tools, SM-2 spaced repetition, cognitive profiling, 3-iteration tool loop (LLM gets up to 3 reasoning rounds before final answer).
- Recruiting_AI-Docker (also has HyDE) — uses agentic patterns when matching candidates to roles.
Tradeoffs. Latency (multiple LLM calls per query) and tokens. Can loop forever if you don't cap iterations — the 3-iteration cap in Tutor_AI is the right pattern.
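A sketch of that capped loop against the OpenAI tool-calling API, reusing the client from the naive sketch; the tool schemas and dispatch table are placeholders you would define per app:

```python
import json

def agentic_answer(query: str, tools: list[dict], dispatch: dict, max_iters: int = 3) -> str:
    """Tool loop with a hard iteration cap, mirroring Tutor_AI's 3-round pattern."""
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iters):
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content          # model decided it has enough to answer
        messages.append(msg)
        for call in msg.tool_calls:     # run each requested retrieval tool
            result = dispatch[call.function.name](**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    # Cap reached: force a final answer from whatever has been retrieved so far.
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content
```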
Multi-Type RAG (parallel index pattern)¶
What it is. Multiple specialized RAG indexes co-exist; the router picks one (or fans out) per query. Different from Adaptive RAG: there it's the strategy that varies; here it's the corpus.
Used in:
- MyPollingApp-Docker — pgvector + text-embedding-3-small. Five distinct RAG indexes:
- KB-RAG — knowledge base
- Poll-RAG — historical polls
- Agent-Router-RAG — picks which sub-agent handles the query
- Chat-Memory-RAG — conversational long-term memory
- Vote-Text-RAG — free-text vote rationales
Tradeoffs. Excellent isolation (a poorly-formed FAQ doesn't pollute poll search) but every new corpus is more index plumbing.
TF-IDF RAG¶
What it is. Skip embeddings entirely. Use Term Frequency × Inverse Document Frequency vectors (essentially weighted word counts) and cosine similarity. The classic pre-embedding baseline.
Why anyone still uses it. Zero LLM dependency, zero vector DB, runs on the CPU, perfect when documents and queries share vocabulary (technical / structured text). Useful as a fallback when embedding APIs are down, or when the corpus is tiny.
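The scikit-learn version (the stack PolyMarketAI uses) fits in a dozen lines; the sample documents are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Replace brake pads when thickness drops below 3mm.",
    "P0420 indicates catalytic converter efficiency below threshold.",
]
vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(docs)   # sparse weighted word counts

def tfidf_top_k(query: str, k: int = 3) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]
```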
Used in:
- Automotive-Repair-Diagnosis-AI — in-memory TF-IDF. Custom tokenizer, stop-word removal, cosine similarity. Pinecone is configured but not used; the team decided pgvector + LangChain + TF-IDF was sufficient.
- PolyMarketAI — SQLite + scikit-learn TF-IDF vectorizer + cosine. Falls back to OpenAI embeddings only when needed.
Tradeoffs. No semantic understanding: to TF-IDF, "auto" and "vehicle" are unrelated terms. Use for narrow technical corpora.
BM25 RAG¶
What it is. Best Match 25 — the modern improvement over TF-IDF, used in Lucene/Elasticsearch/Solr. Adds saturation and document-length normalization. Sparse keyword retrieval with no embeddings.
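A hand-rolled scorer makes both additions explicit: k1 controls term-frequency saturation, b controls document-length normalization. This is the textbook Okapi formula, not WP-Plugin's MySQL implementation:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    df = Counter(t for d in tokenized for t in set(d))   # document frequency per term
    N = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturation: the k1 term caps how much repeated occurrences add.
            # Length norm: the b term stops long docs winning by repetition.
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```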
Used in:
- WP-Plugin — MySQL custom-indexed. BM25 over auto-extracted keywords from posts/pages/forms. Voice-front-end app: query rewriting at the keyword level rather than embedding level.
Tradeoffs. Same as TF-IDF — no semantics. But faster to update incrementally (add a doc = update keyword counts; no embedding API call). The right choice for a WordPress plugin where users may not configure an OpenAI key.
Enterprise RAG (multi-technique stack)¶
What it is. Internal name for "we do basically all of the above." Apps that combine HyDE + hybrid search + reranking + query rewriting + multi-step + graph retrieval into one configurable pipeline.
Used in:
- OpenSentinel — pgvector + text-embedding-3-small. Documents 9 techniques: HyDE, hybrid search, graph retrieval, reranking, multi-step (decomposition), query rewriting, plus three more. Used as a backend that other portfolio apps consume — see "OpenSentinel Cross-App Integration" in Integrations_Audit.md.
Tradeoffs. Most code to maintain. Worth it only when downstream apps benefit from a single shared retrieval service.
Conversational / History-Aware RAG¶
What it is. Before retrieving, rewrite the user's latest message into a standalone query using the conversation history. "What about the next one?" → "What is the second item in the Q3 invoice list?" Then retrieve.
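The rewrite is one cheap LLM call before retrieval, reusing the client from the naive sketch; the prompt wording is illustrative:

```python
def standalone_query(history: list[dict], latest: str) -> str:
    """Rewrite a follow-up message into a self-contained retrieval query."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (f"Conversation so far:\n{transcript}\n\n"
                        f"Rewrite this follow-up as a standalone search query:\n{latest}"),
        }],
    )
    return resp.choices[0].message.content
```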
Used in (implicitly, as part of HyDE pipelines): FamilyChat (3-tier), most chat-based GoGreen apps, MyPollingApp (Chat-Memory-RAG sub-index). No app names this pattern explicitly — it's table stakes inside the others.
Self-Querying RAG (not used in portfolio)¶
What it is. The LLM looks at the user's question and writes structured filter conditions for the vector DB itself: {"price": {"<": 50}, "category": "audio"}. The vector search runs filtered.
Why we'd add it. The Sell-Me family already has structured product metadata — self-querying would let users say "show me red ones under $30" and skip embedding the constraints altogether.
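A sketch of what that could look like for the Sell-Me catalog, reusing the client from the naive sketch; the filter schema and field names are hypothetical:

```python
import json

def parse_filters(question: str) -> dict:
    """Ask the model for structured filters; vector search then runs
    restricted to rows matching them. The schema here is hypothetical."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": ('Extract filters as JSON with optional keys '
                        '"max_price" (number) and "color" (string) from: '
                        + question),
        }],
    )
    return json.loads(resp.choices[0].message.content)

# parse_filters("show me red ones under $30")
# -> {"max_price": 30, "color": "red"}
```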
Multi-modal RAG (not used in portfolio yet)¶
What it is. Images and text share a joint embedding space (CLIP, SigLIP, OpenAI multimodal embeddings). Retrieve images by text query, or vice-versa.
Why we don't use it yet. GoGreen-DOC-AI runs OCR (Tesseract + Docling) and treats images as the text they extract. True multi-modal RAG would skip the OCR step. Candidate use case: GoGreen-SmartForms (form classification by visual layout).
Long-context "no RAG" (intentionally chosen by 5 apps)¶
What it is. With 200K+ context windows on Claude Sonnet 4.7 (1M context) and Gemini 1.5 Pro, "just put the whole document in the prompt" is a real engineering option. No embeddings, no vector DB, no chunking.
Apps that intentionally have no RAG:
- AscendOne — desktop productivity app, no document corpus.
- ChoreAndMoreTracker — chore data fits in a row.
- EverythingBeer — static seed data.
- Maximus — e-commerce storefront, products live in MySQL with structured queries.
- PRT — same as Maximus.
Tradeoffs. Token cost grows linearly with corpus size, latency grows quadratically (attention). Fine when the corpus is small enough to fit and you have no need for citation precision. Wrong choice once the corpus exceeds ~500K tokens or queries are sensitive to noise.
Quick selection guide¶
| If you need… | Pick this |
|---|---|
| Cheapest baseline, narrow data | Naive RAG (pgvector + 3-small) |
| Better recall than naive, no infra change | HyDE RAG |
| Don't lose exact terms (SKUs, codes) | Hybrid (vector + BM25) |
| Recall at scale, willing to spend a bit more | Add a reranker |
| Multi-hop reasoning across documents | Graph RAG |
| Hard queries needing decomposition | Agentic RAG |
| Several distinct corpora that shouldn't bleed | Multi-Type RAG |
| Tiny corpus, no embedding budget | TF-IDF or BM25 |
| Corpus < ~200K tokens, citation not critical | No RAG, long-context |